Re: [PATCH v6 0/3] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-07-12 Thread Casey Leedom
| From: Ding Tianhong 
| Sent: Wednesday, July 12, 2017 6:18 PM
| 
| If there are no further suggestions, I will send a new version that removes
| enable_pcie_relaxed_ordering(), thanks.  :)

   Sounds good to me.  (And sorry for forgetting to justify that last message.
I hate working with web-based email agents.)

Casey

Re: [PATCH net-next v1 0/3] Flow Based GTP Tunneling

2017-07-12 Thread Joe Stringer
On 12 July 2017 at 17:44, Jiannan Ouyang  wrote:
> This patch series augments the existing GTP module to support flow
> based GTP tunneling and modifies the openvswitch datapath to support the
> GTP vport type.
>
> A flow based GTP net device enables the following:
> 1) on the RX path, the outer (IP/UDP/GTP) header information can be
> stored in the metadata_dst struct, and embedded into the skb.
> 2) on the TX path, packets are encapsulated following instructions in
> the metadata_dst field of the skb.
>
> A flow based GTP net device can be integrated with Open vSwitch, which
> allows SDN controllers to program GTP tunnels via Open vSwitch.
>
> Open vSwitch changes are based on patch set
> [PATCH] Add GTP vport based on upstream datapath
>
> Example usage with OVS:
>
> ovs-vsctl add-port br0 gtp-vport -- set interface gtp-vport \
> ofport_request=2 type=gtp option:remote_ip=flow options:key=flow
>
> ovs-ofctl add-flow br0 \
> "in_port=2,tun_src=192.168.60.141,tun_id=123, \
> actions=set_field:02:00:00:00:00:00->eth_src, \
> set_field:ff:ff:ff:ff:ff:ff->eth_dst,LOCAL"
>
> ovs-ofctl add-flow br0 \
> "in_port=LOCAL,actions=set_tunnel:888, \
> set_field:192.168.60.141->tun_dst,2"
>
> arp -s 10.1.1.122 02:00:00:00:00:00
>
> Jiannan Ouyang (3):
>   gtp: refactor to support flow-based gtp encap and decap
>   gtp: Support creating flow-based gtp net_device
>   openvswitch: Add GPRS Tunnel Protocol (GTP) vport support

Hi Jiannan,

net-next is closed, so Dave won't accept patches at this time.

Some brief feedback regarding patch #3: the preference these days
is for OVS userspace to use rtnetlink to configure devices in
COLLECT_METADATA mode, then attach those devices as regular
vport-netdev device type to OVS kernel datapath. I think that should
mean that no kernel changes to openvswitch are required for providing
GTP vports. Instead of this patch it would require something similar
to the IFLA_GRE_COLLECT_METADATA flag which GRE has, but for the GTP
devices. The latest OVS master now supports configuring devices in
this way, perhaps you could take a look at OVS tree's
lib/dpif-netlink-rtnl.c to see how other tunnel devices are configured
and see if that makes sense for GTP as well?

Cheers,
Joe


Re: [PATCH v6 0/3] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-07-12 Thread Ding Tianhong


On 2017/7/13 8:52, Casey Leedom wrote:
>   Sorry again for the delay.  This time at least partially caused by a 
> Chelsio-internal Customer Support request to simply disable Relaxed Ordering 
> entirely due to the performance issues with our 100Gb/s product and 
> relatively recent Intel Root Complexes.  Our Customer Support people are 
> tired of asking customers to try turning off Relaxed Ordering. (sigh)
> 
>   So, first off, I've mentioned a couple of times that the current cxgb4 
> driver hardwired the PCIe Capability Device Control[Relaxed Ordering Enable] 
> on.  Here's the code which does it:
> 
> drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:4657:
> 
> static void enable_pcie_relaxed_ordering(struct pci_dev *dev)
> {
> pcie_capability_set_word(dev, PCI_EXP_DEVCTL, 
> PCI_EXP_DEVCTL_RELAX_EN);
> }

We should remove it.

> 
> This is called from the PCI Probe() routine init_one() later in that file.  I 
> just wanted to make sure people knew about this.  Obviously given our current 
> very difficult thread, this would either need to be diked out or changed or a 
> completely different mechanism put in place.
> 
>   Second, just to make sure everyone's on the same page, the above simply 
> allows the device to send TLPs with the Relaxed Ordering Attribute.  It 
> doesn't cause TLPs to suddenly all be sent with RO set.  The use of Relaxed 
> Ordering is selective.  For instance, in our hardware we can configure the RX 
> Path to use RO on Ingress Packet Data delivery to Free List Buffers, but not 
> use RO for delivery of messages noting newly delivered Ingress Packet Data.  
> Doing this allows the destination PCIe target to [potentially] optimize the 
> DMA Writes to it based on local conditions (memory controller channel 
> availability, etc.), but ensure that the message noting newly delivered 
> Ingress Packet Data isn't processed till all of the preceding TLPs with RO 
> set containing Ingress Packet Data have been processed.  (This by the way is 
> the essence of the AMD A1100 ARM SoC bug: its Root Complex isn't obeying that 
> PCIe ordering rule.)
> 
>   Third, as noted above, I'm getting a lot of pressure to get this addressed 
> sooner than later, so I think that we should go with something fairly simple 
> along the lines that you guys are proposing and I'll stop whining about the 
> problem of needing to handle Peer-to-Peer with Relaxed Ordering while not 
> using it for deliveries to the Root Complex.  We can just wait for that 
> kettle of fish to explode on us and deal with the mess then.  (Hhmmm, the 
> mixed metaphor landed in an entirely different place than I originally 
> intended ... :-))
> 

OK, we can fix those cases when we actually hit them; I don't think it is a big problem.

>   If we try to stick as closely to Ding's latest patch set as possible, then 
> we can probably just add the diff to remove the 
> enable_pcie_relaxed_ordering() code in cxgb4_main.c.
> 

If there are no further suggestions, I will send a new version that removes
enable_pcie_relaxed_ordering(), thanks.  :)

Ding
> Casey
> .
> 



Re: [PATCH v6 0/3] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-07-12 Thread Casey Leedom
  Sorry again for the delay.  This time at least partially caused by a 
Chelsio-internal Customer Support request to simply disable Relaxed Ordering 
entirely due to the performance issues with our 100Gb/s product and relatively 
recent Intel Root Complexes.  Our Customer Support people are tired of asking 
customers to try turning off Relaxed Ordering. (sigh)

  So, first off, I've mentioned a couple of times that the current cxgb4 driver 
hardwired the PCIe Capability Device Control[Relaxed Ordering Enable] on.  
Here's the code which does it:

drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:4657:

static void enable_pcie_relaxed_ordering(struct pci_dev *dev)
{
pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_RELAX_EN);
}

This is called from the PCI Probe() routine init_one() later in that file.  I 
just wanted to make sure people knew about this.  Obviously given our current 
very difficult thread, this would either need to be diked out or changed or a 
completely different mechanism put in place.

  Second, just to make sure everyone's on the same page, the above simply 
allows the device to send TLPs with the Relaxed Ordering Attribute.  It doesn't 
cause TLPs to suddenly all be sent with RO set.  The use of Relaxed Ordering is 
selective.  For instance, in our hardware we can configure the RX Path to use 
RO on Ingress Packet Data delivery to Free List Buffers, but not use RO for 
delivery of messages noting newly delivered Ingress Packet Data.  Doing this 
allows the destination PCIe target to [potentially] optimize the DMA Writes to 
it based on local conditions (memory controller channel availability, etc.), 
but ensure that the message noting newly delivered Ingress Packet Data isn't 
processed till all of the preceding TLPs with RO set containing Ingress Packet 
Data have been processed.  (This by the way is the essence of the AMD A1100 ARM 
SoC bug: its Root Complex isn't obeying that PCIe ordering rule.)

  Third, as noted above, I'm getting a lot of pressure to get this addressed 
sooner than later, so I think that we should go with something fairly simple 
along the lines that you guys are proposing and I'll stop whining about the 
problem of needing to handle Peer-to-Peer with Relaxed Ordering while not using 
it for deliveries to the Root Complex.  We can just wait for that kettle of 
fish to explode on us and deal with the mess then.  (Hhmmm, the mixed metaphor 
landed in an entirely different place than I originally intended ... :-))

  If we try to stick as closely to Ding's latest patch set as possible, then we 
can probably just add the diff to remove the enable_pcie_relaxed_ordering() 
code in cxgb4_main.c.

Casey
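
For readers following the thread, here is a minimal userspace sketch of the two points made above: the driver only flips an enable bit in Device Control, and that bit merely permits Relaxed Ordering rather than forcing it on every TLP. It assumes the value 0x0010 for PCI_EXP_DEVCTL_RELAX_EN as defined in include/uapi/linux/pci_regs.h. The helper names (devctl_set, devctl_clear, tlp_uses_ro) are hypothetical stand-ins, not kernel APIs, and nothing here touches real config space.

```c
#include <stdint.h>

/* Value as defined in include/uapi/linux/pci_regs.h. */
#define PCI_EXP_DEVCTL_RELAX_EN 0x0010

/* Hypothetical model of a read-modify-write that sets bits in the
 * 16-bit Device Control register value. */
static uint16_t devctl_set(uint16_t devctl, uint16_t bits)
{
	return (uint16_t)(devctl | bits);
}

/* Hypothetical model of clearing bits in Device Control. */
static uint16_t devctl_clear(uint16_t devctl, uint16_t bits)
{
	return (uint16_t)(devctl & ~bits);
}

/* The enable bit only permits RO.  A given TLP carries the attribute
 * only if the bit is set AND the device chose RO for that transfer
 * (e.g. packet data yes, completion messages no). */
static int tlp_uses_ro(uint16_t devctl, int device_wants_ro)
{
	return (devctl & PCI_EXP_DEVCTL_RELAX_EN) != 0 && device_wants_ro;
}
```

With the bit cleared, tlp_uses_ro() is false regardless of what the device would prefer, which is the effect of dropping enable_pcie_relaxed_ordering() on a platform where the bit is not otherwise set.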

[PATCH net-next v1 1/3] gtp: refactor to support flow-based gtp encap and decap

2017-07-12 Thread Jiannan Ouyang
If flow-based encap/decap is enabled, a separate code path is created for both
packet RX and TX. PDP contexts are not used in flow-based mode since
all metadata is maintained in metadata_dst:

- for RX, pdp lookup and ms check are bypassed, while metadata_dst is
  constructed and attached to the skb.

- for TX, pdp lookup is bypassed. Packets are encapsulated following
  instructions specified in metadata_dst.

Signed-off-by: Jiannan Ouyang 
---
 drivers/net/gtp.c | 162 ++
 1 file changed, 102 insertions(+), 60 deletions(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 1542e83..5a7b504 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -36,6 +37,8 @@
 #include 
 #include 
 
+#define GTP_PDP_HASHSIZE 1024
+
 /* An active session for the subscriber. */
 struct pdp_ctx {
struct hlist_node   hlist_tid;
@@ -59,7 +62,7 @@ struct pdp_ctx {
struct in_addr  peer_addr_ip4;
 
struct sock *sk;
-   struct net_device   *dev;
+   struct net_device   *dev;
 
atomic_t tx_seq;
struct rcu_head rcu_head;
@@ -73,11 +76,15 @@ struct gtp_dev {
struct sock *sk1u;
 
struct net_device   *dev;
+   struct net  *net;
 
unsigned int role;
unsigned int hash_size;
struct hlist_head   *tid_hash;
struct hlist_head   *addr_hash;
+
+   unsigned int collect_md;
+   struct ip_tunnel_info   info;
 };
 
 static unsigned int gtp_net_id __read_mostly;
@@ -184,22 +191,23 @@ static bool gtp_check_ms(struct sk_buff *skb, struct pdp_ctx *pctx,
return false;
 }
 
-static int gtp_rx(struct pdp_ctx *pctx, struct sk_buff *skb,
-   unsigned int hdrlen, unsigned int role)
+static int gtp_rx(struct gtp_dev *gtp, struct sk_buff *skb,
+ unsigned int hdrlen, struct sock *sk,
+ struct metadata_dst *tun_dst)
 {
struct pcpu_sw_netstats *stats;
 
-   if (!gtp_check_ms(skb, pctx, hdrlen, role)) {
-   netdev_dbg(pctx->dev, "No PDP ctx for this MS\n");
-   return 1;
-   }
-
/* Get rid of the GTP + UDP headers. */
if (iptunnel_pull_header(skb, hdrlen, skb->protocol,
-!net_eq(sock_net(pctx->sk), dev_net(pctx->dev))))
+!net_eq(sock_net(sk), dev_net(gtp->dev))))
return -1;
 
-   netdev_dbg(pctx->dev, "forwarding packet from GGSN to uplink\n");
+   netdev_dbg(gtp->dev, "forwarding packet from GGSN to uplink\n");
+
+   if (tun_dst) {
+   skb_dst_set(skb, (struct dst_entry *)tun_dst);
+   netdev_dbg(gtp->dev, "attaching metadata_dst to skb\n");
+   }
 
/* Now that the UDP and the GTP header have been removed, set up the
 * new network header. This is required by the upper layer to
@@ -207,15 +215,16 @@ static int gtp_rx(struct pdp_ctx *pctx, struct sk_buff *skb,
 */
skb_reset_network_header(skb);
 
-   skb->dev = pctx->dev;
+   skb->dev = gtp->dev;
 
-   stats = this_cpu_ptr(pctx->dev->tstats);
+   stats = this_cpu_ptr(gtp->dev->tstats);
u64_stats_update_begin(&stats->syncp);
stats->rx_packets++;
stats->rx_bytes += skb->len;
u64_stats_update_end(&stats->syncp);
 
netif_rx(skb);
+
return 0;
 }
 
@@ -244,7 +253,12 @@ static int gtp0_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
return 1;
}
 
-   return gtp_rx(pctx, skb, hdrlen, gtp->role);
+   if (!gtp_check_ms(skb, pctx, hdrlen, gtp->role)) {
+   netdev_dbg(gtp->dev, "No PDP ctx for this MS\n");
+   return 1;
+   }
+
+   return gtp_rx(gtp, skb, hdrlen, pctx->sk, NULL);
 }
 
 static int gtp1u_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
@@ -253,6 +267,7 @@ static int gtp1u_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
  sizeof(struct gtp1_header);
struct gtp1_header *gtp1;
struct pdp_ctx *pctx;
+   struct metadata_dst *tun_dst = NULL;
 
if (!pskb_may_pull(skb, hdrlen))
return -1;
@@ -280,13 +295,24 @@ static int gtp1u_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
 
gtp1 = (struct gtp1_header *)(skb->data + sizeof(struct udphdr));
 
-   pctx = gtp1_pdp_find(gtp, ntohl(gtp1->tid));
-   if (!pctx) {
-   netdev_dbg(gtp->dev, "No PDP ctx to decap skb=%p\n", skb);
-   return 1;
+   if (ip_tunnel_collect_metadata() || gtp->collect_md) {
+   tun_dst = udp_tun_rx_dst(skb, gtp->sk1u->sk_family, TUNNEL_KEY,
+key32_to_tunnel_id(gtp1->tid), 0);
+   

[PATCH net-next v1 2/3] gtp: Support creating flow-based gtp net_device

2017-07-12 Thread Jiannan Ouyang
Add the gtp_create_flow_based_dev() interface to create flow-based gtp
net_device, which sets gtp->collect_md. Under flow-based mode, UDP sockets are
created and maintained in kernel.

Signed-off-by: Jiannan Ouyang 
---
 drivers/net/gtp.c | 213 +-
 include/net/gtp.h |   8 ++
 2 files changed, 217 insertions(+), 4 deletions(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 5a7b504..09712c9 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -642,9 +642,94 @@ static netdev_tx_t gtp_dev_xmit(struct sk_buff *skb, struct net_device *dev)
return NETDEV_TX_OK;
 }
 
+static int gtp_hashtable_new(struct gtp_dev *gtp, int hsize);
+static void gtp_hashtable_free(struct gtp_dev *gtp);
+static int gtp_encap_enable(struct gtp_dev *gtp, struct nlattr *data[]);
+
+static int gtp_change_mtu(struct net_device *dev, int new_mtu, bool strict)
+{
+   int max_mtu = IP_MAX_MTU - dev->hard_header_len - sizeof(struct iphdr)
+   - sizeof(struct udphdr) - sizeof(struct gtp1_header);
+
+   if (new_mtu < ETH_MIN_MTU)
+   return -EINVAL;
+
+   if (new_mtu > max_mtu) {
+   if (strict)
+   return -EINVAL;
+
+   new_mtu = max_mtu;
+   }
+
+   dev->mtu = new_mtu;
+   return 0;
+}
+
+static int gtp_dev_open(struct net_device *dev)
+{
+   struct gtp_dev *gtp = netdev_priv(dev);
+   struct net *net = gtp->net;
+   struct socket *sock1u;
+   struct socket *sock0;
+   struct udp_tunnel_sock_cfg tunnel_cfg;
+   struct udp_port_cfg udp_conf;
+   int err;
+
+   memset(&udp_conf, 0, sizeof(udp_conf));
+
+   udp_conf.family = AF_INET;
+   udp_conf.local_ip.s_addr = htonl(INADDR_ANY);
+   udp_conf.local_udp_port = htons(GTP1U_PORT);
+
+   err = udp_sock_create(gtp->net, &udp_conf, &sock1u);
+   if (err < 0)
+   return err;
+
+   udp_conf.local_udp_port = htons(GTP0_PORT);
+   err = udp_sock_create(gtp->net, &udp_conf, &sock0);
+   if (err < 0)
+   return err;
+
+   memset(&tunnel_cfg, 0, sizeof(tunnel_cfg));
+   tunnel_cfg.sk_user_data = gtp;
+   tunnel_cfg.encap_rcv = gtp_encap_recv;
+   tunnel_cfg.encap_destroy = gtp_encap_destroy;
+   tunnel_cfg.encap_type = UDP_ENCAP_GTP0;
+   setup_udp_tunnel_sock(net, sock0, &tunnel_cfg);
+
+   tunnel_cfg.encap_type = UDP_ENCAP_GTP1U;
+
+   setup_udp_tunnel_sock(net, sock1u, &tunnel_cfg);
+
+   sock_hold(sock0->sk);
+   sock_hold(sock1u->sk);
+
+   gtp->sk0 = sock0->sk;
+   gtp->sk1u = sock1u->sk;
+
+   return 0;
+}
+
+static int gtp_dev_stop(struct net_device *dev)
+{
+   struct gtp_dev *gtp = netdev_priv(dev);
+   struct sock *sk = gtp->sk1u;
+
+   udp_tunnel_sock_release(gtp->sk0->sk_socket);
+   udp_tunnel_sock_release(gtp->sk1u->sk_socket);
+
+   udp_sk(sk)->encap_type = 0;
+   rcu_assign_sk_user_data(sk, NULL);
+   sock_put(sk);
+
+   return 0;
+}
+
 static const struct net_device_ops gtp_netdev_ops = {
.ndo_init   = gtp_dev_init,
.ndo_uninit = gtp_dev_uninit,
+   .ndo_open   = gtp_dev_open,
+   .ndo_stop   = gtp_dev_stop,
.ndo_start_xmit = gtp_dev_xmit,
.ndo_get_stats64= ip_tunnel_get_stats64,
 };
@@ -672,10 +757,6 @@ static void gtp_link_setup(struct net_device *dev)
  sizeof(struct gtp0_header);
 }
 
-static int gtp_hashtable_new(struct gtp_dev *gtp, int hsize);
-static void gtp_hashtable_free(struct gtp_dev *gtp);
-static int gtp_encap_enable(struct gtp_dev *gtp, struct nlattr *data[]);
-
 static int gtp_newlink(struct net *src_net, struct net_device *dev,
   struct nlattr *tb[], struct nlattr *data[],
   struct netlink_ext_ack *extack)
@@ -780,6 +861,130 @@ static struct rtnl_link_ops gtp_link_ops __read_mostly = {
.fill_info  = gtp_fill_info,
 };
 
+static void init_tnl_info(struct ip_tunnel_info *info, __u16 dst_port)
+{
+   memset(info, 0, sizeof(*info));
+   info->key.tp_dst = htons(dst_port);
+}
+
+static struct gtp_dev *gtp_find_flow_based_dev(
+   struct net *net)
+{
+   struct gtp_net *gn = net_generic(net, gtp_net_id);
+   struct gtp_dev *gtp, *t = NULL;
+
+   list_for_each_entry(gtp, &gn->gtp_dev_list, list) {
+   if (gtp->collect_md)
+   t = gtp;
+   }
+
+   return t;
+}
+
+static int gtp_configure(struct net *net, struct net_device *dev,
+const struct ip_tunnel_info *info)
+{
+   struct gtp_net *gn = net_generic(net, gtp_net_id);
+   struct gtp_dev *gtp = netdev_priv(dev);
+   int err;
+
+   gtp->net = net;
+   gtp->dev = dev;
+
+   if (gtp_find_flow_based_dev(net))
+   return -EBUSY;
+
+   dev->netdev_ops = &gtp_netdev_ops;
+   dev->priv_destructor= 

[PATCH net-next v1 3/3] openvswitch: Add GPRS Tunnel Protocol (GTP) vport support

2017-07-12 Thread Jiannan Ouyang
Add OVS_VPORT_TYPE_GTP type and vport-gtp support.

Signed-off-by: Jiannan Ouyang 
---
 include/uapi/linux/openvswitch.h |   1 +
 net/openvswitch/Kconfig  |  10 +++
 net/openvswitch/Makefile |   1 +
 net/openvswitch/vport-gtp.c  | 144 +++
 4 files changed, 156 insertions(+)
 create mode 100644 net/openvswitch/vport-gtp.c

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 156ee4c..82b87b2 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -213,6 +213,7 @@ enum ovs_vport_type {
OVS_VPORT_TYPE_GRE,  /* GRE tunnel. */
OVS_VPORT_TYPE_VXLAN,/* VXLAN tunnel. */
OVS_VPORT_TYPE_GENEVE,   /* Geneve tunnel. */
+   OVS_VPORT_TYPE_GTP,  /* GTP tunnel. */
__OVS_VPORT_TYPE_MAX
 };
 
diff --git a/net/openvswitch/Kconfig b/net/openvswitch/Kconfig
index ce94729..d30d0ff 100644
--- a/net/openvswitch/Kconfig
+++ b/net/openvswitch/Kconfig
@@ -71,3 +71,13 @@ config OPENVSWITCH_GENEVE
  If you say Y here, then the Open vSwitch will be able create geneve vport.
 
  Say N to exclude this support and reduce the binary size.
+
+config OPENVSWITCH_GTP
+   tristate "Open vSwitch GTP tunneling support"
+   depends on OPENVSWITCH
+   depends on GTP
+   default OPENVSWITCH
+   ---help---
+ If you say Y here, then the Open vSwitch will be able to create gtp vport.
+
+ Say N to exclude this support and reduce the binary size.
diff --git a/net/openvswitch/Makefile b/net/openvswitch/Makefile
index 60f8090..d77fcc0 100644
--- a/net/openvswitch/Makefile
+++ b/net/openvswitch/Makefile
@@ -22,3 +22,4 @@ endif
 obj-$(CONFIG_OPENVSWITCH_VXLAN)+= vport-vxlan.o
 obj-$(CONFIG_OPENVSWITCH_GENEVE)+= vport-geneve.o
 obj-$(CONFIG_OPENVSWITCH_GRE)  += vport-gre.o
+obj-$(CONFIG_OPENVSWITCH_GTP)  += vport-gtp.o
diff --git a/net/openvswitch/vport-gtp.c b/net/openvswitch/vport-gtp.c
new file mode 100644
index 0000000..ed736ef
--- /dev/null
+++ b/net/openvswitch/vport-gtp.c
@@ -0,0 +1,144 @@
+/*
+ * Copyright (c) 2017 Facebook, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "datapath.h"
+#include "vport.h"
+#include "vport-netdev.h"
+
+static struct vport_ops ovs_gtp_vport_ops;
+/**
+ * struct gtp_port - Keeps track of open UDP ports
+ * @port_no: destination port.
+ */
+struct gtp_port {
+   u16 port_no;
+};
+
+static inline struct gtp_port *gtp_vport(const struct vport *vport)
+{
+   return vport_priv(vport);
+}
+
+static int gtp_get_options(const struct vport *vport,
+  struct sk_buff *skb)
+{
+   struct gtp_port *gtp_port = gtp_vport(vport);
+
+   if (nla_put_u16(skb, OVS_TUNNEL_ATTR_DST_PORT, gtp_port->port_no))
+   return -EMSGSIZE;
+   return 0;
+}
+
+static struct vport *gtp_tnl_create(const struct vport_parms *parms)
+{
+   struct net *net = ovs_dp_get_net(parms->dp);
+   struct nlattr *options = parms->options;
+   struct gtp_port *gtp_port;
+   struct net_device *dev;
+   struct vport *vport;
+   struct nlattr *a;
+   u16 dst_port;
+   int err;
+
+   if (!options) {
+   err = -EINVAL;
+   goto error;
+   }
+
+   a = nla_find_nested(options, OVS_TUNNEL_ATTR_DST_PORT);
+   if (a && nla_len(a) == sizeof(u16)) {
+   dst_port = nla_get_u16(a);
+   } else {
+   /* Require destination port from userspace. */
+   err = -EINVAL;
+   goto error;
+   }
+
+   vport = ovs_vport_alloc(sizeof(struct gtp_port),
+   &ovs_gtp_vport_ops, parms);
+   if (IS_ERR(vport))
+   return vport;
+
+   gtp_port = gtp_vport(vport);
+   gtp_port->port_no = dst_port;
+
+   rtnl_lock();
+   dev = gtp_create_flow_based_dev(net, parms->name,
+   NET_NAME_USER, dst_port);
+   if (IS_ERR(dev)) {
+   rtnl_unlock();
+   ovs_vport_free(vport);
+   return ERR_CAST(dev);
+   }
+
+   err = dev_change_flags(dev, dev->flags | IFF_UP);
+   if (err < 0) {
+   rtnl_delete_link(dev);
+   rtnl_unlock();
+   ovs_vport_free(vport);
+   goto error;
+   }
+
+   rtnl_unlock();
+   return vport;
+error:
+   return ERR_PTR(err);
+}
+
+static struct vport *gtp_create(const struct vport_parms *parms)
+{
+   struct vport *vport;
+
+   vport = 

[PATCH net-next v1 0/3] Flow Based GTP Tunneling

2017-07-12 Thread Jiannan Ouyang
This patch series augments the existing GTP module to support flow
based GTP tunneling and modifies the openvswitch datapath to support the
GTP vport type.

A flow based GTP net device enables the following:
1) on the RX path, the outer (IP/UDP/GTP) header information can be
stored in the metadata_dst struct, and embedded into the skb.
2) on the TX path, packets are encapsulated following instructions in
the metadata_dst field of the skb.

A flow based GTP net device can be integrated with Open vSwitch, which
allows SDN controllers to program GTP tunnels via Open vSwitch. 

Open vSwitch changes are based on patch set
[PATCH] Add GTP vport based on upstream datapath

Example usage with OVS:

ovs-vsctl add-port br0 gtp-vport -- set interface gtp-vport \
ofport_request=2 type=gtp option:remote_ip=flow options:key=flow

ovs-ofctl add-flow br0 \
"in_port=2,tun_src=192.168.60.141,tun_id=123, \
actions=set_field:02:00:00:00:00:00->eth_src, \
set_field:ff:ff:ff:ff:ff:ff->eth_dst,LOCAL"

ovs-ofctl add-flow br0 \
"in_port=LOCAL,actions=set_tunnel:888, \
set_field:192.168.60.141->tun_dst,2"

arp -s 10.1.1.122 02:00:00:00:00:00

Jiannan Ouyang (3):
  gtp: refactor to support flow-based gtp encap and decap
  gtp: Support creating flow-based gtp net_device
  openvswitch: Add GPRS Tunnel Protocol (GTP) vport support

 drivers/net/gtp.c| 375 ---
 include/net/gtp.h|   8 +
 include/uapi/linux/openvswitch.h |   1 +
 net/openvswitch/Kconfig  |  10 ++
 net/openvswitch/Makefile |   1 +
 net/openvswitch/vport-gtp.c  | 144 +++
 6 files changed, 475 insertions(+), 64 deletions(-)
 create mode 100644 net/openvswitch/vport-gtp.c

-- 
2.9.3



[GIT] Networking

2017-07-12 Thread David Miller

Nothing super serious in here except perhaps the brcmfmac fix.

1) Fix 64-bit division in mlx5 IPSEC offload support, from Ilan Tayari and
   Arnd Bergmann.

2) Fix race in statistics gathering in bnxt_en driver, from Michael Chan.

3) Can't use a mutex in RCU reader protected section on tap driver,
   from Cong WANG.

4) Fix mdb leak in bridging code, from Eduardo Valentin.

5) Fix free of wrong pointer variable in nfp driver, from Dan Carpenter.

6) Buffer overflow in brcmfmac driver, from Arend van Spriel.

7) ioremap_nocache() return value needs to be checked in smsc911x
   driver, from Alexey Khoroshilov.

Please pull, thanks a lot.

The following changes since commit f263fbb8d60824993c1b64385056a3cfdbb21d45:

  Merge tag 'pci-v4.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci (2017-07-08 15:51:57 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 

for you to fetch changes up to d93b07f8a689cde962d4f97668a74ab76f55734d:

  net: stmmac: revert "support future possible different internal phy mode" (2017-07-12 14:41:56 -0700)


Ahmad Fatoum (1):
  net: Fix minor code bug in timestamping.txt

Alexey Khoroshilov (1):
  smsc911x: Add check for ioremap_nocache() return code

Arend van Spriel (1):
  brcmfmac: fix possible buffer overflow in brcmf_cfg80211_mgmt_tx()

Arnd Bergmann (1):
  net/mlx5: IPSec, fix 64-bit division correctly

Bert Kenward (1):
  sfc: don't read beyond unicast address list

Christophe Jaillet (4):
  cisco: enic: Fic an error handling path in 'vnic_dev_init_devcmd2()'
  net: stmmac: Fix error handling path in 'alloc_dma_rx_desc_resources()'
  net: stmmac: Fix error handling path in 'alloc_dma_tx_desc_resources()'
  net: stmmac: Make 'alloc_dma_[rt]x_desc_resources()' look even closer

Dan Carpenter (2):
  nfp: freeing the wrong variable
  net: ipmr: ipmr_get_table() returns NULL

David S. Miller (5):
  Merge tag 'mlx5-fixes-2017-07-09' of https://git.kernel.org/.../saeed/linux
  Merge branch 'bnxt_en-Bug-fixes'
  Merge branch 'stmmac-dma-resources-fixes'
  Merge branch 'mlxsw-spectrum-Various-fixes'
  Merge branch 'net-doc-fixes'

Eduardo Valentin (1):
  bridge: mdb: fix leak on complete_info ptr on fail path

Guilherme G. Piccoli (1):
  cxgb4: fix BUG() on interrupt deallocating path of ULD

Huy Nguyen (1):
  net/mlx5e: Initialize CEE's getpermhwaddr address buffer to 0xff

Ido Schimmel (4):
  mlxsw: spectrum_router: Add missing rollback
  mlxsw: spectrum_router: Fix use-after-free in route replace
  mlxsw: spectrum_switchdev: Remove unused variable
  mlxsw: spectrum_switchdev: Check status of memory allocation

Ilan Tayari (6):
  net/mlx5: Add missing include in lib/gid.c
  net/mlx5: IPSec, Fix 64-bit division on 32-bit builds
  net/mlx5: FPGA, make mlx5_fpga_device_brb static
  net/mlx5: FPGA, Fix datatype mismatch
  net/mlx5: Build wq.o even if MLX5_CORE_EN is not selected
  net/mlx5: Add Makefiles for subdirectories

Kalderon, Michal (1):
  qed: Fix printk option passed when printing ipv6 addresses

LABBE Corentin (1):
  net: stmmac: revert "support future possible different internal phy mode"

Lin Yun Sheng (1):
  net: hns: Bugfix for Tx timeout handling in hns driver

Michael Chan (3):
  bnxt_en: Fix race conditions in .ndo_get_stats64().
  bnxt_en: Fix bug in ethtool -L.
  bnxt_en: Fix SRIOV on big-endian architecture.

WANG Cong (1):
  tap: convert a mutex to a spinlock

Yonghong Song (1):
  samples/bpf: fix a build issue

stephen hemminger (2):
  socket: add documentation for missing elements
  datagram: fix kernel-doc comments

 Documentation/networking/timestamping.txt |  6 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 42 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |  4 +++-
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c |  3 ++-
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c |  2 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c   | 16 +++-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c| 42 +-
 drivers/net/ethernet/cisco/enic/vnic_dev.c|  9 +
 drivers/net/ethernet/hisilicon/hns/hns_enet.c | 16 +---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile  |  4 ++--
 drivers/net/ethernet/mellanox/mlx5/core/accel/Makefile|  1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/Makefile |  1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c|  2 ++
 

[Patch net] netpoll: shut up a kernel warning on refcount

2017-07-12 Thread Cong Wang
When we convert atomic_t to refcount_t, a new kernel warning
on "increment on 0" is introduced in the netpoll code,
zap_completion_queue(). In fact for this special case, we know
the refcount is 0 and we just have to set it to 1 to satisfy
the following dev_kfree_skb_any(), so we can just use
refcount_set(..., 1) instead.

Fixes: 633547973ffc ("net: convert sk_buff.users from atomic_t to refcount_t")
Reported-by: Dave Jones 
Cc: Reshetova, Elena 
Signed-off-by: Cong Wang 
---
 net/core/netpoll.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index d3408a6..8357f16 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -277,7 +277,7 @@ static void zap_completion_queue(void)
struct sk_buff *skb = clist;
clist = clist->next;
if (!skb_irq_freeable(skb)) {
-   refcount_inc(&skb->users);
+   refcount_set(&skb->users, 1);
dev_kfree_skb_any(skb); /* put this one back */
} else {
__kfree_skb(skb);
-- 
2.5.5
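
To make the reasoning in the commit message concrete, here is a simplified userspace model. It assumes, based on the warning text quoted in the netconsole thread, that the checked refcount_inc() refuses to move a counter off zero and warns instead, while refcount_set() stores the value unconditionally. The struct and function names are hypothetical, the model is not atomic, and it is not the kernel implementation.

```c
#include <stdbool.h>

/* Hypothetical, non-atomic stand-in for the kernel's refcount_t. */
struct model_refcount {
	unsigned int refs;
	bool warned;	/* set where the real code would emit a WARN */
};

/* Incrementing from 0 is treated as a use-after-free: warn and leave
 * the counter untouched. */
static void model_refcount_inc(struct model_refcount *r)
{
	if (r->refs == 0) {
		r->warned = true;
		return;
	}
	r->refs++;
}

/* Unconditional store, used when the caller knows the current value. */
static void model_refcount_set(struct model_refcount *r, unsigned int n)
{
	r->refs = n;
}

/* Returns true when the count reaches zero, i.e. the object can be
 * freed.  Decrementing an already-zero counter also warns. */
static bool model_refcount_dec_and_test(struct model_refcount *r)
{
	if (r->refs == 0) {
		r->warned = true;
		return false;
	}
	return --r->refs == 0;
}
```

In this model, an skb taken off the completion queue starts with refs == 0. Calling model_refcount_inc() on it warns and leaves the count at 0, whereas model_refcount_set(..., 1) gives the following put exactly one reference to drop.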



Re: netconsole refcount warning

2017-07-12 Thread Cong Wang
On Wed, Jul 12, 2017 at 3:30 PM, Cong Wang  wrote:
> On Sun, Jul 9, 2017 at 4:57 PM, Dave Jones  wrote:
>> The new refcount debugging code spews this twice during boot on my router..
>>
>>
>> refcount_t: increment on 0; use-after-free.
>> [ cut here ]
>> WARNING: CPU: 1 PID: 17 at lib/refcount.c:152 refcount_inc+0x2b/0x30
>> CPU: 1 PID: 17 Comm: ksoftirqd/1 Not tainted 4.12.0-firewall+ #8
>> task: 8801d4441ac0 task.stack: 8801d445
>> RIP: 0010:refcount_inc+0x2b/0x30
>> RSP: 0018:8801d4456da8 EFLAGS: 00010046
>> RAX: 002c RBX: 8801d4c3cf40 RCX: 
>> RDX: 002c RSI: 0003 RDI: ed003a88adab
>> RBP: 8801d4456da8 R08: 0003 R09: fbfff4afcb57
>> R10:  R11: fbfff4afcb58 R12: 8801d4c3c540
>> R13: 0082 R14: 8801ce9c7ff8 R15: 8801ce9c8aa0
>> FS:  () GS:8801d6a0() knlGS:
>> CS:  0010 DS:  ES:  CR0: 80050033
>> CR2: 7fa2b803156e CR3: 0001c405d000 CR4: 000406e0
>> Call Trace:
>>  zap_completion_queue+0xad/0x1a0
>
>
> Sigh... it is on purpose:
>
> commit 8a455b087c9629b3ae3b521b4f1ed16672f978cc
> Author: Jarek Poplawski 
> Date:   Thu Mar 20 16:07:27 2008 -0700
>
> netpoll: zap_completion_queue: adjust skb->users counter
>
> zap_completion_queue() retrieves skbs from completion_queue where they have
> zero skb->users counter.  Before dev_kfree_skb_any() it should be non-zero
> yet, so it's increased now.
>
> Reported-and-tested-by: Andrew Morton 
> Signed-off-by: Jarek Poplawski 
> Signed-off-by: Andrew Morton 
> Signed-off-by: David S. Miller 
>
> We need to review it now. :-/

I think we should explicitly set it to 1 with refcount_set() since
we know it was 0 for sure.

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index d3408a6..8357f16 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -277,7 +277,7 @@ static void zap_completion_queue(void)
struct sk_buff *skb = clist;
clist = clist->next;
if (!skb_irq_freeable(skb)) {
-   refcount_inc(&skb->users);
+   refcount_set(&skb->users, 1);
dev_kfree_skb_any(skb); /* put this one back */
} else {
__kfree_skb(skb);


Re: netconsole refcount warning

2017-07-12 Thread Cong Wang
On Sun, Jul 9, 2017 at 4:57 PM, Dave Jones  wrote:
> The new refcount debugging code spews this twice during boot on my router..
>
>
> refcount_t: increment on 0; use-after-free.
> [ cut here ]
> WARNING: CPU: 1 PID: 17 at lib/refcount.c:152 refcount_inc+0x2b/0x30
> CPU: 1 PID: 17 Comm: ksoftirqd/1 Not tainted 4.12.0-firewall+ #8
> task: 8801d4441ac0 task.stack: 8801d445
> RIP: 0010:refcount_inc+0x2b/0x30
> RSP: 0018:8801d4456da8 EFLAGS: 00010046
> RAX: 002c RBX: 8801d4c3cf40 RCX: 
> RDX: 002c RSI: 0003 RDI: ed003a88adab
> RBP: 8801d4456da8 R08: 0003 R09: fbfff4afcb57
> R10:  R11: fbfff4afcb58 R12: 8801d4c3c540
> R13: 0082 R14: 8801ce9c7ff8 R15: 8801ce9c8aa0
> FS:  () GS:8801d6a0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 7fa2b803156e CR3: 0001c405d000 CR4: 000406e0
> Call Trace:
>  zap_completion_queue+0xad/0x1a0


Sigh... it is on purpose:

commit 8a455b087c9629b3ae3b521b4f1ed16672f978cc
Author: Jarek Poplawski 
Date:   Thu Mar 20 16:07:27 2008 -0700

netpoll: zap_completion_queue: adjust skb->users counter

zap_completion_queue() retrieves skbs from completion_queue where they have
zero skb->users counter.  Before dev_kfree_skb_any() it should be non-zero
yet, so it's increased now.

Reported-and-tested-by: Andrew Morton 
Signed-off-by: Jarek Poplawski 
Signed-off-by: Andrew Morton 
Signed-off-by: David S. Miller 

We need to review it now. :-/


Re: nf_conntrack: Infoleak via CTA_ID and CTA_EXPECT_ID

2017-07-12 Thread Florian Westphal
Richard Weinberger  wrote:
> Am 01.07.2017 um 12:35 schrieb Florian Westphal:
> > The compare on removal is not needed afaics, and it's also not used when
> > doing lookup to begin with, so we can just recompute it?
> 
> Isn't this way too much overhead?

I don't think so.  This computation only occurs when we dump events
to userspace.

> I personally favor Pablo's per-cpu counter approach.
> That way the IDs are unique again and we get rid of the info leak without
> much effort.

I have not seen these patches so can't really comment.


Re: [iovisor-dev] [PATCH v3 net-next 02/12] bpf/verifier: rework value tracking

2017-07-12 Thread Nadav Amit
Edward Cree  wrote:

> On 07/07/17 18:45, Nadav Amit wrote:
>> For me changes such as:
>> 
>>> if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
>>> -   dst_reg->min_value -= min_val;
>>> +   dst_reg->min_value -= max_val;
>> 
>> are purely cryptic. What happened here? Was there a bug before and if so
>> what are its implications? Why can’t it be in a separate patch?
> In this specific case, there was a bug before: if (say) src and dst were
> both unknown bytes (so range 0 to 255), it would compute the new min and max
> to be 0, so it would think the result is known to be 0.  But that's wrong,
> because it could be anything from -255 to +255.  The bug's implications are
> that it could be used to construct an out-of-range offset to (say) a map
> pointer which the verifier would think was in-range and thus accept.

This sounds like a serious bug that may need to be backported to stable
versions, no? In this case I would assume it should be in a separate patch
so it could be applied separately.
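
The range bug discussed above can be reproduced with plain interval arithmetic. The sketch below is a simplified model, not the verifier code: struct range and both helpers are hypothetical, and the max-side update in the buggy version is assumed to mirror the min-side line shown in the quoted diff.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical closed interval [min, max] for a tracked value. */
struct range {
	int64_t min;
	int64_t max;
};

/* Pre-fix behaviour as described in the thread: min -= min, max -= max. */
static struct range range_sub_buggy(struct range dst, struct range src)
{
	dst.min -= src.min;
	dst.max -= src.max;
	return dst;
}

/*
 * Sound interval subtraction: the smallest possible result is
 * dst.min - src.max, and the largest is dst.max - src.min.
 */
static struct range range_sub(struct range dst, struct range src)
{
	dst.min -= src.max;
	dst.max -= src.min;
	return dst;
}
```

For two unknown bytes, each in [0, 255], the buggy version yields [0, 0] while the sound version yields [-255, 255], which matches the numbers given in the explanation.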

> It might be possible to put it in a separate patch, but in general (not
> necessarily the case here) isolated fixes to range handling break some of
> the existing regression tests.  That is why I ended up doing patch #4,
> because I couldn't find a small fix for the patch #1 test without breaking
> others.  Essentially, this patch started out as just the tnum tracking to
> replace imm and align, and then rolled in everything I had to do to get the
> regression tests to pass again.  So some of those things are fixes that
> could go in earlier patches, but because of the order I wrote it in I'd have
> to disentangle them.  I can do that if it's necessary, but I'm not sure it'd
> really make the patch that much more readable so I'd rather avoid that work
> if I can get away with it…

I clearly understand and can relate to it. Still, your patch really stands
out:

git log --stat kernel/bpf/ | grep -G '^ kernel/bpf' | tr -s " " \
>  | sort -g -r  -k 3 | head -n 10

 kernel/bpf/verifier.c | 1075 -
 kernel/bpf/bpf_lru_list.c | 567 ++
 kernel/bpf/core.c | 536 
 kernel/bpf/lpm_trie.c | 503 ++
 kernel/bpf/verifier.c | 441 +-
 kernel/bpf/inode.c | 387 +++
 kernel/bpf/hashtab.c | 362 +++
 kernel/bpf/verifier.c | 329 +++---
 kernel/bpf/hashtab.c | 275 ++-
 kernel/bpf/verifier.c | 266 +++---

>> I also think that changes such as:
>>> -   s64 min_value;
>>> -   u64 max_value;
>> [snip]
>>> +   s64 min_value; /* minimum possible (s64)value */
>>> +   u64 max_value; /* maximum possible (u64)value */
>> Should have been avoided. Personally, I find this comment redundant (to say
>> the least). It does not help to reduce the diff size.
> The comment is meaningful, though perhaps badly phrased.  It's an attempt to
> define the semantics of these fields (which previously was unclear); e.g. the
> first one means "minimum value when interpreted as signed", i.e. the (s64) in
> the comment is a cast.
> Apparently those weren't the semantics the original author intended, but I'm
> not sure the original semantics were well-defined and I certainly don't
> understand them well enough to define them, hence why I defined my own here
> (and then redefined them in patch #4).

Makes more sense now.

>> In this regard, I think that refactoring should have been done first and not
>> together with the logic changes. As another example, changing UNKNOWN_VALUE
>> to SCALAR_VALUE should have been a separate, easy to understand patch.
> But SCALAR_VALUE is the union UNKNOWN_VALUE *or* CONST_IMM, and merging those
> together means all of the ripping-out of evaluate_reg_alu() and friends and
> thus depends on much of the rest of the patch.
>>> The latter is also needed for correctness in computing reg->range;
>>> if 'pkt_ptr + offset' could conceivably overflow, then the result
>>> could be < pkt_end without being a valid pointer into the packet.
>>> We thus rely on the assumption that the packet pointer will never be
>>> within MAX_PACKET_OFF of the top of the address space.  (This
>>> assumption is, again, carried over from the existing verifier.)
>> I understand the limitations (I think). I agree that CONST being spillable
>> is not directly related. As for the possible packet offsets/range:
>> intentionally or not you do make some changes that push the 64k packet size
>> limit even deeper into the code. While the packet size should be limited to
>> avoid overflow, IIUC the requirement is that:
>> 
>>  64 > log(n_insn) + 

Re: [PATCH] net: stmmac: revert "support future possible different internal phy mode"

2017-07-12 Thread David Miller
From: Corentin Labbe 
Date: Wed, 12 Jul 2017 09:32:34 +0200

> Since internal phy-mode is reserved for non-xMII protocol we cannot use
> it with dwmac-sun8i.
> Furthermore, all DT patches which come with this patch were cleaned, so
> the current state is broken.
> This reverts commit 1c2fa5f84683 ("net: stmmac: support future possible different internal phy mode")
> 
> Fixes: 1c2fa5f84683 ("net: stmmac: support future possible different internal phy mode")
> Signed-off-by: Corentin Labbe 

Applied, thanks.


Re: [PATCH net] sfc: don't read beyond unicast address list

2017-07-12 Thread David Miller
From: Bert Kenward 
Date: Wed, 12 Jul 2017 17:19:41 +0100

> If we have more than 32 unicast MAC addresses assigned to an interface
> we will read beyond the end of the address table in the driver when
> adding filters. The next 256 entries store multicast addresses, so we
> will end up attempting to insert duplicate filters, which is mostly
> harmless. If we add more than 288 unicast addresses we will then read
> past the multicast address table, which is likely to be more exciting.
> 
> Fixes: 12fb0da45c9a ("sfc: clean fallbacks between promisc/normal in efx_ef10_filter_sync_rx_mode")
> Signed-off-by: Bert Kenward 

Applied and queued up for -stable, thanks.
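
As a rough illustration of the overrun described above (not the actual sfc patch, which is not shown in this thread), a bounded copy might look like the following. The table layout, sync_unicast() and the uc_promisc fallback flag are assumptions made for this sketch; only the 32-entry unicast limit comes from the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define UC_MAX  32	/* unicast slots, per the description above */
#define ETH_LEN 6	/* bytes in a MAC address */

/* Hypothetical driver-side address table. */
struct addr_table {
	unsigned char uc[UC_MAX][ETH_LEN];
	size_t uc_count;
	bool uc_promisc;	/* assumed "too many addresses" fallback */
};

/*
 * Copy at most UC_MAX addresses from a flat array of n MAC addresses.
 * When n exceeds the table size, record that fact instead of letting
 * later code index past the end of the unicast table.
 */
static void sync_unicast(struct addr_table *t,
			 const unsigned char *addrs, size_t n)
{
	size_t i;

	t->uc_promisc = n > UC_MAX;
	t->uc_count = n > UC_MAX ? UC_MAX : n;
	for (i = 0; i < t->uc_count; i++)
		memcpy(t->uc[i], addrs + i * ETH_LEN, ETH_LEN);
}
```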


Re: [PATCH net 0/2] minor net kernel-doc fixes

2017-07-12 Thread David Miller
From: Stephen Hemminger 
Date: Wed, 12 Jul 2017 09:29:05 -0700

> Fix a couple of small errors in kernel-doc for networking

Series applied, thanks Stephen.


Re: [PATCH] igb: fix unused igb_deliver_wake_packet() warning when CONFIG_PM=n

2017-07-12 Thread David Miller
From: Dave Hansen 
Date: Wed, 12 Jul 2017 14:09:25 -0700

> 
> From: Dave Hansen 
> 
> I'm seeing warnings on kernel configurations where CONFIG_PM is
> disabled.  It happens in 4.12, at least:
> 
> drivers/ethernet/intel/igb/igb_main.c:7988:13: warning: 'igb_deliver_wake_packet' defined but not used [-Wunused-function]
> 
> This is because igb_deliver_wake_packet() is defined outside of
> the #ifdef, but is used only a single time within the #ifdef in
> igb_resume().  Fix it by moving igb_deliver_wake_packet() next to
> igb_resume() inside the #ifdef.
> 
> The diff ends up looking a bit funky here.  It *looks* like
> igb_suspend() is getting moved, but that's an artifact of how
> 'diff' sees the changes.
> 
> Cc: Jeff Kirsher 
> Cc: intel-wired-...@lists.osuosl.org
> Cc: netdev@vger.kernel.org
> Signed-off-by: Dave Hansen 

I'll let Jeff queue this up into his tree.


Re: [PATCH] smsc911x: Add check for ioremap_nocache() return code

2017-07-12 Thread David Miller
From: Alexey Khoroshilov 
Date: Wed, 12 Jul 2017 23:58:56 +0300

> There is no check for return code of ioremap_nocache()
> in smsc911x_drv_probe(). The patch adds one.
> 
> Found by Linux Driver Verification project (linuxtesting.org).
> 
> Signed-off-by: Alexey Khoroshilov 

Applied, thanks.


Re: nf_conntrack: Infoleak via CTA_ID and CTA_EXPECT_ID

2017-07-12 Thread Richard Weinberger
Florian,

Am 01.07.2017 um 12:35 schrieb Florian Westphal:
>>> Perhaps we can place that in a new extension (it's not needed in any
>>> fastpath ops)?
>>
>> To get rid of the infoleak we have to re-introduce the id field in struct nf_conn
>> and struct nf_conntrack_expect.
> 
> Why will this not work?

You are right, when we compute the ID from the whole object, it should be fine.

>> Otherwise we have nothing to compare against in the conntrack/expect remove case.
> 
> Not following, sorry.  The id is not used anywhere except when we send
> info to userspace.
> 
> The compare on removal is not needed afaics, and it's also not used when
> doing lookup to begin with, so we can just recompute it?

Isn't this way too much overhead?

I personally favor Pablo's per-cpu counter approach.
That way the IDs are unique again and we get rid of the info leak without
much effort.

Thanks,
//richard


Re: [PATCH] igb: fix unused igb_deliver_wake_packet() warning when CONFIG_PM=n

2017-07-12 Thread Fabio Estevam
On Wed, Jul 12, 2017 at 6:09 PM, Dave Hansen
 wrote:
>
> From: Dave Hansen 
>
> I'm seeing warnings on kernel configurations where CONFIG_PM is
> disabled.  It happens in 4.12, at least:
>
> drivers/ethernet/intel/igb/igb_main.c:7988:13: warning: 'igb_deliver_wake_packet' defined but not used [-Wunused-function]
>
> This is because igb_deliver_wake_packet() is defined outside of
> the #ifdef, but is used only a single time within the #ifdef in
> igb_resume().  Fix it by moving igb_deliver_wake_packet() next to
> igb_resume() inside the #ifdef.
>
> The diff ends up looking a bit funky here.  It *looks* like
> igb_suspend() is getting moved, but that's an artifact of how
> 'diff' sees the changes.
>
> Cc: Jeff Kirsher 
> Cc: intel-wired-...@lists.osuosl.org
> Cc: netdev@vger.kernel.org
> Signed-off-by: Dave Hansen 

Seems to be fixed by the following commit in linux-next:

commit 000ba1f2ebf0d6f93b9ae6cfbe5417e66f1b8e8c
Author: Arnd Bergmann 
Date:   Thu Apr 27 21:09:52 2017 +0200

igb: mark PM functions as __maybe_unused

The new wake function is only used by the suspend/resume handlers that
are defined inside of an #ifdef, which can cause this harmless
warning:

drivers/net/ethernet/intel/igb/igb_main.c:7988:13: warning:
'igb_deliver_wake_packet' defined but not used [-Wunused-function]

Removing the #ifdef, instead using a __maybe_unused annotation
simplifies the code and avoids the warning.

Fixes: b90fa8763560 ("igb: Enable reading of wake up packet")
Signed-off-by: Arnd Bergmann 
Tested-by: Aaron Brown 
Signed-off-by: Jeff Kirsher 
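
The annotation approach can be sketched in plain C as follows. This is a simplified illustration rather than the igb code: MOCK_CONFIG_PM, mock_maybe_unused and both functions are hypothetical, and the sketch annotates the helper itself, whereas the commit above annotates the PM functions and drops the #ifdef. The attribute is the GCC/Clang one that the kernel's __maybe_unused expands to.

```c
#include <assert.h>

/* Define this to model a build with power management enabled. */
/* #define MOCK_CONFIG_PM 1 */

/* Stand-in for the kernel's __maybe_unused annotation. */
#define mock_maybe_unused __attribute__((__unused__))

/*
 * Only called from the resume path below. The attribute keeps
 * -Wunused-function quiet when that path is compiled out.
 */
static mock_maybe_unused int deliver_wake_packet(void)
{
	return 42;
}

/* Returns the helper's value when PM is compiled in, -1 otherwise. */
static int mock_resume(void)
{
#ifdef MOCK_CONFIG_PM
	return deliver_wake_packet();
#else
	return -1;
#endif
}
```

Without the attribute, building this with -Wall and MOCK_CONFIG_PM undefined produces the same class of "defined but not used" warning quoted in the report.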


[PATCH] igb: fix unused igb_deliver_wake_packet() warning when CONFIG_PM=n

2017-07-12 Thread Dave Hansen

From: Dave Hansen 

I'm seeing warnings on kernel configurations where CONFIG_PM is
disabled.  It happens in 4.12, at least:

drivers/ethernet/intel/igb/igb_main.c:7988:13: warning: 'igb_deliver_wake_packet' defined but not used [-Wunused-function]

This is because igb_deliver_wake_packet() is defined outside of
the #ifdef, but is used only a single time within the #ifdef in
igb_resume().  Fix it by moving igb_deliver_wake_packet() next to
igb_resume() inside the #ifdef.

The diff ends up looking a bit funky here.  It *looks* like
igb_suspend() is getting moved, but that's an artifact of how
'diff' sees the changes.

Cc: Jeff Kirsher 
Cc: intel-wired-...@lists.osuosl.org
Cc: netdev@vger.kernel.org
Signed-off-by: Dave Hansen 
---

 b/drivers/net/ethernet/intel/igb/igb_main.c |   46 ++--
 1 file changed, 23 insertions(+), 23 deletions(-)

diff -puN drivers/net/ethernet/intel/igb/igb_main.c~undef-igb_deliver_wake_packet drivers/net/ethernet/intel/igb/igb_main.c
--- a/drivers/net/ethernet/intel/igb/igb_main.c~undef-igb_deliver_wake_packet   2017-07-12 14:08:37.205721093 -0700
+++ b/drivers/net/ethernet/intel/igb/igb_main.c 2017-07-12 14:08:37.210721093 -0700
@@ -7985,6 +7985,29 @@ static int __igb_shutdown(struct pci_dev
return 0;
 }
 
+#ifdef CONFIG_PM
+#ifdef CONFIG_PM_SLEEP
+static int igb_suspend(struct device *dev)
+{
+   int retval;
+   bool wake;
+   struct pci_dev *pdev = to_pci_dev(dev);
+
+   retval = __igb_shutdown(pdev, &wake, 0);
+   if (retval)
+   return retval;
+
+   if (wake) {
+   pci_prepare_to_sleep(pdev);
+   } else {
+   pci_wake_from_d3(pdev, false);
+   pci_set_power_state(pdev, PCI_D3hot);
+   }
+
+   return 0;
+}
+#endif /* CONFIG_PM_SLEEP */
+
 static void igb_deliver_wake_packet(struct net_device *netdev)
 {
struct igb_adapter *adapter = netdev_priv(netdev);
@@ -8015,29 +8038,6 @@ static void igb_deliver_wake_packet(stru
netif_rx(skb);
 }
 
-#ifdef CONFIG_PM
-#ifdef CONFIG_PM_SLEEP
-static int igb_suspend(struct device *dev)
-{
-   int retval;
-   bool wake;
-   struct pci_dev *pdev = to_pci_dev(dev);
-
-   retval = __igb_shutdown(pdev, &wake, 0);
-   if (retval)
-   return retval;
-
-   if (wake) {
-   pci_prepare_to_sleep(pdev);
-   } else {
-   pci_wake_from_d3(pdev, false);
-   pci_set_power_state(pdev, PCI_D3hot);
-   }
-
-   return 0;
-}
-#endif /* CONFIG_PM_SLEEP */
-
 static int igb_resume(struct device *dev)
 {
struct pci_dev *pdev = to_pci_dev(dev);
_


Re: [PATCH 1/1] bridge: mdb: fix leak on complete_info ptr on fail path

2017-07-12 Thread Eduardo Valentin
On Tue, Jul 11, 2017 at 08:02:33PM -0700, David Miller wrote:
> From: Eduardo Valentin 
> Date: Tue, 11 Jul 2017 14:55:12 -0700
> 
> > We currently get the following kmemleak report:
> > unreferenced object 0x8800039d9820 (size 32):
> >   comm "softirq", pid 0, jiffies 4295212383 (age 792.416s)
> >   hex dump (first 32 bytes):
> > 00 0c e0 03 00 88 ff ff ff 02 00 00 00 00 00 00  
> > 00 00 00 01 ff 11 00 02 86 dd 00 00 ff ff ff ff  
> >   backtrace:
> > [] kmemleak_alloc+0x4a/0xa0
> > [] kmem_cache_alloc_trace+0xb8/0x1c0
> > [] __br_mdb_notify+0x2a3/0x300 [bridge]
> > [] br_mdb_notify+0x6e/0x70 [bridge]
> > [] br_multicast_add_group+0x109/0x150 [bridge]
> > [] br_ip6_multicast_add_group+0x58/0x60 [bridge]
> > [] br_multicast_rcv+0x1d5/0xdb0 [bridge]
> > [] br_handle_frame_finish+0xcf/0x510 [bridge]
> > [] br_nf_hook_thresh.part.27+0xb/0x10 [br_netfilter]
> > [] br_nf_hook_thresh+0x48/0xb0 [br_netfilter]
> > > [] br_nf_pre_routing_finish_ipv6+0x109/0x1d0 [br_netfilter]
> > [] br_nf_pre_routing_ipv6+0xd0/0x14c [br_netfilter]
> > [] br_nf_pre_routing+0x197/0x3d0 [br_netfilter]
> > [] nf_iterate+0x52/0x60
> > [] nf_hook_slow+0x5c/0xb0
> > [] br_handle_frame+0x1a4/0x2c0 [bridge]
> > 
> > This happens when switchdev_port_obj_add() fails. This patch
> > frees complete_info object in the fail path.
> 
> Applied, thanks.
> 

Thanks!

> I'm so glad I pushed back on your original patch :-)

man, me too !! :-)

> 
> > Cc: stable  # v4.9+
> 
> Please do not add stable tags to networking patches, I queue up and
> submit networking -stable changes myself upon request which I am doing
> in this case as well.

Oh, I see. I won't copy stable next time, and I will ask you to queue when
applicable.

> 
> Thanks.

-- 
All the best,
Eduardo Valentin


RFC7510 MPLS-over-UDP support in iproute2

2017-07-12 Thread Blake Willis
Greetings,

I noticed that MPLS-over-IP & MPLS-over-UDP support have recently been committed
to the kernel, & that MPLS-over-IP support was committed to iproute2 last week.

Please consider this a humble request for RFC7510 MPLS-over-UDP (UDP port 6635) 
support in iproute2.

Perhaps I'm missing something & this should be doable with the current code?  
The previous iproute2 example on netdev@ ("ip fou add port 6635 ipproto 137") 
seems to indicate that it would be creating an "mpls-over-ip-over-udp" tunnel, 
which wouldn't be compatible with RFC7510, which just adds a simple UDP header 
to MPLS packets (MPLS dataplane packets aren't IP packets & thus don't have an 
IP protocol number, e.g. it uses ethertype 0x8847 for unicast & 0x8848 for 
multicast).

Or perhaps RFC7510 support would be better off as part of the lwtunnel 
framework?

Thanks & best regards,
---
 Blake Willis
 Network Engineering Consultant
 Scalable System Design LLC
 blake at 2112 dot net

"I think what a lot of people don't appreciate is that technology does not 
automatically improve. It only improves if a lot of really strong engineering 
talent is applied to the problem, that it improves. And there are many examples 
in history where civilizations have reached a certain technology level, and 
then have fallen well below that, and then recovered only millennia later."

  -- Elon Musk


[PATCH] smsc911x: Add check for ioremap_nocache() return code

2017-07-12 Thread Alexey Khoroshilov
There is no check for return code of ioremap_nocache()
in smsc911x_drv_probe(). The patch adds one.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov 
---
 drivers/net/ethernet/smsc/smsc911x.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/smsc/smsc911x.c b/drivers/net/ethernet/smsc/smsc911x.c
index ea1bbc355b4d..0b6a39b003a4 100644
--- a/drivers/net/ethernet/smsc/smsc911x.c
+++ b/drivers/net/ethernet/smsc/smsc911x.c
@@ -2467,6 +2467,10 @@ static int smsc911x_drv_probe(struct platform_device *pdev)
pdata = netdev_priv(dev);
dev->irq = irq;
pdata->ioaddr = ioremap_nocache(res->start, res_size);
+   if (!pdata->ioaddr) {
+   retval = -ENOMEM;
+   goto out_ioremap_fail;
+   }
 
pdata->dev = dev;
pdata->msg_enable = ((1 << debug) - 1);
@@ -2572,6 +2576,7 @@ static int smsc911x_drv_probe(struct platform_device *pdev)
smsc911x_free_resources(pdev);
 out_request_resources_fail:
iounmap(pdata->ioaddr);
+out_ioremap_fail:
free_netdev(dev);
 out_release_io_1:
release_mem_region(res->start, resource_size(res));
-- 
2.7.4
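
The label placement in the patch follows the usual goto-unwind pattern: each error label undoes only the steps that have already succeeded, so a failed mapping must jump past the unmap. A minimal userspace sketch of that ordering is shown here. All names are hypothetical, with malloc()/free() standing in for the netdev allocation and the I/O mapping.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical device state for the sketch. */
struct mock_dev {
	void *priv;	/* stands in for the netdev allocation */
	void *ioaddr;	/* stands in for the mapped I/O region */
};

/*
 * fail_map simulates the mapping step returning NULL;
 * fail_register simulates a later step failing after the mapping.
 */
static int mock_probe(struct mock_dev *d, bool fail_map, bool fail_register)
{
	int retval;

	d->priv = malloc(64);
	if (!d->priv)
		return -ENOMEM;

	d->ioaddr = fail_map ? NULL : malloc(256);
	if (!d->ioaddr) {
		retval = -ENOMEM;
		goto out_map_fail;	/* nothing mapped: skip the unmap */
	}

	if (fail_register) {
		retval = -EIO;
		goto out_unmap;		/* mapping exists: undo it first */
	}
	return 0;

out_unmap:
	free(d->ioaddr);
	d->ioaddr = NULL;
out_map_fail:
	free(d->priv);
	d->priv = NULL;
	return retval;
}

/* Tear down a successfully probed device. */
static void mock_remove(struct mock_dev *d)
{
	free(d->ioaddr);
	free(d->priv);
	d->ioaddr = NULL;
	d->priv = NULL;
}
```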



Re: [PATCH v2 03/22] net: broadcom: stop using rtc deprecated functions

2017-07-12 Thread Michael Chan
On Wed, Jul 12, 2017 at 1:04 AM, Benjamin Gaignard
 wrote:
> rtc_time_to_tm() and rtc_tm_to_time() are deprecated because they
> rely on 32bits variables and that will make rtc break in y2038/2106.
> Replace those two functions with safer 64bits ones.
>
> Signed-off-by: Benjamin Gaignard 
> CC: Michael Chan 
> CC: netdev@vger.kernel.org
> CC: linux-ker...@vger.kernel.org

Acked-by: Michael Chan 

Thanks.
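
For reference, the two years come from 32-bit second counters: a signed counter ends at INT32_MAX seconds after the Unix epoch, an unsigned one at UINT32_MAX. The small userspace check below makes that concrete. utc_year() is a hypothetical helper, and the sketch assumes a platform where time_t is 64 bits wide.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/*
 * Calendar year (UTC) for a seconds-since-epoch value.
 * Assumes time_t can hold the value (64-bit time_t);
 * returns -1 if the conversion fails.
 */
static int utc_year(int64_t secs)
{
	time_t t = (time_t)secs;
	struct tm *tm = gmtime(&t);

	return tm ? tm->tm_year + 1900 : -1;
}
```

With a 64-bit time_t, utc_year(INT32_MAX) gives 2038 and utc_year(UINT32_MAX) gives 2106.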


Re: [iovisor-dev] [PATCH v3 net-next 02/12] bpf/verifier: rework value tracking

2017-07-12 Thread Edward Cree
On 07/07/17 18:45, Nadav Amit wrote:
> For me changes such as:
>
>>  if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
>> -dst_reg->min_value -= min_val;
>> +dst_reg->min_value -= max_val;
>
> are purely cryptic. What happened here? Was there a bug before and if so
> what are its implications? Why can’t it be in a separate patch?
In this specific case, there was a bug before: if (say) src and dst were
 both unknown bytes (so range 0 to 255), it would compute the new min and max
 to be 0, so it would think the result is known to be 0.  But that's wrong,
 because it could be anything from -255 to +255.  The bug's implications are
 that it could be used to construct an out-of-range offset to (say) a map
 pointer which the verifier would think was in-range and thus accept.
It might be possible to put it in a separate patch, but in general (not
 necessarily the case here) isolated fixes to range handling break some of
 the existing regression tests.  That is why I ended up doing patch #4,
 because I couldn't find a small fix for the patch #1 test without breaking
 others.  Essentially, this patch started out as just the tnum tracking to
 replace imm and align, and then rolled in everything I had to do to get the
 regression tests to pass again.  So some of those things are fixes that
 could go in earlier patches, but because of the order I wrote it in I'd have
 to disentangle them.  I can do that if it's necessary, but I'm not sure it'd
 really make the patch that much more readable so I'd rather avoid that work
 if I can get away with it...
> I also think that changes such as:
>> -s64 min_value;
>> -u64 max_value;
> [snip]
>> +s64 min_value; /* minimum possible (s64)value */
>> +u64 max_value; /* maximum possible (u64)value */
> Should have been avoided. Personally, I find this comment redundant (to say
> the least). It does not help to reduce the diff size.
The comment is meaningful, though perhaps badly phrased.  It's an attempt to
 define the semantics of these fields (which previously was unclear); e.g. the
 first one means "minimum value when interpreted as signed", i.e. the (s64) in
 the comment is a cast.
Apparently those weren't the semantics the original author intended, but I'm
 not sure the original semantics were well-defined and I certainly don't
 understand them well enough to define them, hence why I defined my own here
 (and then redefined them in patch #4).
> In this regard, I think that refactoring should have been done first and not
> together with the logic changes. As another example, changing UNKNOWN_VALUE
> to SCALAR_VALUE should have been a separate, easy to understand patch.
But SCALAR_VALUE is the union UNKNOWN_VALUE *or* CONST_IMM, and merging those
 together means all of the ripping-out of evaluate_reg_alu() and friends and
 thus depends on much of the rest of the patch.
>> The latter is also needed for correctness in computing reg->range;
>> if 'pkt_ptr + offset' could conceivably overflow, then the result
>> could be < pkt_end without being a valid pointer into the packet.
>> We thus rely on the assumption that the packet pointer will never be
>> within MAX_PACKET_OFF of the top of the address space.  (This
>> assumption is, again, carried over from the existing verifier.)
> I understand the limitations (I think). I agree that CONST being spillable
> is not directly related. As for the possible packet offsets/range:
> intentionally or not you do make some changes that push the 64k packet size
> limit even deeper into the code. While the packet size should be limited to
> avoid overflow, IIUC the requirement is that:
>
>   64 > log(n_insn) + log(MAX_PACKET_OFF) + 1
I don't think that's right, unless you also make each addition to a packet-
 pointer require a max_value <= MAX_PACKET_OFF.  It's also a very loose bound
 because it assumes every instruction is such an add.
I think it makes far more sense to do it the way I have done, where the bounds
 are tracked all the way through the arithmetic and then only checked against
 MAX_PACKET_OFF when doing the access (and when doing a test against a
 PTR_TO_PACKET_END, i.e. find_good_pkt_pointers(), though for some reason I
 only added that check in patch #4).
That way we can allow things like (for the sake of example) adding $BIG_NUMBER
 to a packet pointer and then subtracting it again.
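
A toy model of that idea, for illustration only: the offset range is carried through each addition and compared with the limit only at access time. The names below are hypothetical and the real verifier state is more involved. The 0xffff limit is assumed to correspond to MAX_PACKET_OFF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MOCK_MAX_PACKET_OFF 0xffff	/* assumed packet offset limit */

/* Hypothetical tracked range of a packet-pointer offset. */
struct off_range {
	int64_t min;
	int64_t max;
};

/* Adding a known constant shifts both bounds; nothing is checked here. */
static struct off_range off_add(struct off_range r, int64_t k)
{
	r.min += k;
	r.max += k;
	return r;
}

/* The limit is enforced only when the pointer is actually used. */
static bool access_ok(struct off_range r)
{
	return r.min >= 0 && r.max <= MOCK_MAX_PACKET_OFF;
}
```

Adding a large constant makes an access invalid at that point, but subtracting it again restores a range that passes the check.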
> Such an assertion may be statically checked (using BUILD_BUG_ON), but I don’t
> think should propagate into the entire code, especially in a non consistent
> way. For example:
>
>> struct bpf_reg_state {
>>  enum bpf_reg_type type;
>>  union {
>> -/* valid when type == CONST_IMM | PTR_TO_STACK | UNKNOWN_VALUE */
>> -s64 imm;
>> -
>> -/* valid when type == PTR_TO_PACKET* */
>> -struct {
>> -u16 off;
>> -u16 range;
>> -};
>> +/* valid when type == PTR_TO_PACKET 

Re: [PATCH v1 net-next 4/5] drop_monitor: let drop stat support net ns

2017-07-12 Thread kbuild test robot
Hi martin,

[auto build test WARNING on net-next/master]

url:
https://github.com/0day-ci/linux/commits/martinbj2008-gmail-com/drop_monitor-import-netnamespace-framework/20170712-205015


coccinelle warnings: (new ones prefixed by >>)

>> net/core/drop_monitor.c:555:36-37: Unneeded semicolon

Please review and possibly fold the followup patch.

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


[PATCH] drop_monitor: fix semicolon.cocci warnings

2017-07-12 Thread kbuild test robot
net/core/drop_monitor.c:555:36-37: Unneeded semicolon


 Remove unneeded semicolon.

Generated by: scripts/coccinelle/misc/semicolon.cocci

Fixes: d5bf05101a5c ("drop_monitor: let drop stat support net ns")
CC: martin Zhang 
Signed-off-by: Fengguang Wu 
---

 drop_monitor.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -552,7 +552,7 @@ static void __net_exit dm_net_exit(struc
data = (struct ns_pcpu_dm_data *)per_cpu_ptr(pcpu_data, cpu);
if (data->skb)
kfree_skb(data->skb);
-   del_timer_sync(&data->send_timer);;
+   del_timer_sync(&data->send_timer);
}
 }
 


Re: [PATCH v3 net-next 3/4] tls: kernel TLS support

2017-07-12 Thread Dave Watson
On 07/12/17 09:20 AM, Steffen Klassert wrote:
> On Tue, Jul 11, 2017 at 11:53:11AM -0700, Dave Watson wrote:
> > On 07/11/17 08:29 AM, Steffen Klassert wrote:
> > > Sorry for replying to old mail...
> > > > +int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx)
> > > > +{
> > > 
> > > ...
> > > 
> > > > +
> > > > +   if (!sw_ctx->aead_send) {
> > > > +   sw_ctx->aead_send = crypto_alloc_aead("gcm(aes)", 0, 0);
> > > > +   if (IS_ERR(sw_ctx->aead_send)) {
> > > > +   rc = PTR_ERR(sw_ctx->aead_send);
> > > > +   sw_ctx->aead_send = NULL;
> > > > +   goto free_rec_seq;
> > > > +   }
> > > > +   }
> > > > +
> > > 
> > > When I look on how you allocate the aead transformation, it seems
> > > that you should either register an asynchronous callback with
> > > aead_request_set_callback(), or request for a synchronous algorithm.
> > > 
> > > Otherwise you will crash on an asynchronous crypto return, no?
> > 
> > The intention is for it to be synchronous, and gather directly from
> > userspace buffers.  It looks like calling
> > crypto_alloc_aead("gcm(aes)", 0, CRYPTO_ALG_ASYNC) is the correct way
> > to request synchronous algorithms only?
> 
> Yes, but then you lose the aes-ni based algorithms because they are
> asynchronous. If you want to have good crypto performance, it is
> better to implement the asynchronous callbacks.

Right, the trick is we want both aesni, and to guarantee that we are
done using the input buffers before sendmsg() returns.  For now I can
set a callback, and wait on a completion.  The initial use case of
userspace openssl integration shouldn't hit the aesni async case
anyway (!irq_fpu_usable())


Re: [iproute PATCH] ip netns: Make sure netns name is sane

2017-07-12 Thread Phil Sutter
On Wed, Jul 12, 2017 at 09:38:34AM -0700, Stephen Hemminger wrote:
> On Mon, 10 Jul 2017 13:19:12 +0200
> Phil Sutter  wrote:
> 
> > +static bool is_basename(const char *name)
> > +{
> > +   char *name_dup = strdup(name);
> > +   bool rc = true;
> > +
> > +   if (!name_dup)
> > +   return false;
> > +
> > +   if (strcmp(basename(name_dup), name))
> > +   rc = false;
> > +
> > +   free(name_dup);
> > +   return rc;
> > +}
> 
> Looks like natural place to use strdupa.
> 
> static bool is_basename(const char *name)
> {
>   return strcmp(basename(strdupa(name)), name) == 0;
> }

Good point! Anyway, my patch fails to cover 'ip netns del' command
(apart from the '..' issue), so I'd suggest to instead apply Matteo's
version (Message-ID 20170710120831.9355-1-mcr...@redhat.com).

Thanks, Phil
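
For illustration, a name check that also covers the '..' case could be written without basename() at all. This is a hypothetical sketch: it is neither the patch above nor Matteo's version, and netns_name_ok() is an invented name. It swaps the basename() comparison for a direct test on the string.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Accept a name only if it can serve as a single path component:
 * non-empty, no '/' anywhere, and not "." or "..".
 */
static bool netns_name_ok(const char *name)
{
	if (name[0] == '\0')
		return false;
	if (strchr(name, '/'))
		return false;
	if (!strcmp(name, ".") || !strcmp(name, ".."))
		return false;
	return true;
}
```

Unlike a comparison against basename(), this also rejects "/" and "..", both of which basename() returns unchanged.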


Re: [PATCH v1 net-next 1/5] drop_monitor: import netnamespace framework

2017-07-12 Thread Cong Wang
On Wed, Jul 12, 2017 at 10:08 AM, 张军伟(基础平台部)
 wrote:
> About skb->sk:
> it is used as a fallback when skb->dev is empty, such as for netlink messages.
>
> +   if (skb->dev)
> +   net = dev_net(skb->dev);
> +   else if (skb->sk)
> +   net = sock_net(skb->sk);
> +   else
> +   return;

Check udp_set_dev_scratch().

Again, as Neil mentioned, the idea is arguable; it is actually harder to trace
skb's with your patch when they cross netns'es.


Re: [PATCH v1 net-next 1/5] drop_monitor: import netnamespace framework

2017-07-12 Thread Cong Wang
On Wed, Jul 12, 2017 at 6:37 AM, Neil Horman  wrote:
> On Wed, Jul 12, 2017 at 06:40:49PM +0800, martinbj2...@gmail.com wrote:
>> Dropwatch is a very useful tool for diagnosing network problems,
>> and it has been a great help to us.
>> Dropwatch does not work inside a container (net namespace).
>> It is a pity, so let it support net ns.
>>
> Sorry, I'm having a hard time wrapping my head around this.  Why exactly is it
> that dropwatch won't work in a namespaced environment?  IIRC, the kfree
> tracepoints are namespace agnostic, and so running dropwatch anywhere should
> result in seeing drops in all namespaces.  I grant that perhaps it would be nice
> to filter on a namespace, but it should all 'just work' for some definition of
> the term, no?

Agreed.

And I doubt Martin's implementation which uses skb->sk to retrieve net
works for RX packets, since skb->sk is set very late (except with early demux)
on RX side but we can drop them at anytime...


Re: [iproute PATCH] ip netns: Make sure netns name is sane

2017-07-12 Thread Stephen Hemminger
On Mon, 10 Jul 2017 13:19:12 +0200
Phil Sutter  wrote:

> +static bool is_basename(const char *name)
> +{
> + char *name_dup = strdup(name);
> + bool rc = true;
> +
> + if (!name_dup)
> + return false;
> +
> + if (strcmp(basename(name_dup), name))
> + rc = false;
> +
> + free(name_dup);
> + return rc;
> +}

Looks like natural place to use strdupa.

static bool is_basename(const char *name)
{
return strcmp(basename(strdupa(name)), name) == 0;
}


[RFC 3/3] net: macb: Use sram for rx buffers

2017-07-12 Thread Alexander Dahl
The default way for the driver is to use system memory for RX/TX DMA
buffers and rings. For the AT91SAM9G20 this is SDRAM which is connected
through the EBI bus, together with other memories like NAND-Flash or
external SRAM. If a memory access to external SRAM using the NWAIT
signal takes too long, the EMAC on the SoC throws receive overrun (ROVR)
errors which means it can not put incoming packets into SDRAM (through
DMA). Those errors add up in /proc/net/dev

To circumvent those "dropped" ethernet frames, we put the RX buffers and
rings into the small internal SRAM of the SoC, which are also usable for
DMA but directly connected through the AHB without the path through the
EBI. This way there are no lost ethernet frames anymore. (If there's too
much load however packets can still be dropped by the kernel.)

Signed-off-by: Alexander Dahl 
---
 drivers/net/ethernet/cadence/macb.c | 66 ++---
 drivers/net/ethernet/cadence/macb.h |  2 ++
 2 files changed, 57 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index 91f7492..8dacd9c 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -40,9 +41,9 @@
 #define MACB_RX_BUFFER_SIZE128
 #define RX_BUFFER_MULTIPLE 64  /* bytes */
 
-#define DEFAULT_RX_RING_SIZE   512 /* must be power of 2 */
+#define DEFAULT_RX_RING_SIZE   128 /* must be power of 2 */
 #define MIN_RX_RING_SIZE   64
-#define MAX_RX_RING_SIZE   8192
+#define MAX_RX_RING_SIZE   128
 #define RX_RING_BYTES(bp)  (macb_dma_desc_get_size(bp) \
 * (bp)->rx_ring_size)
 
@@ -1660,9 +1661,14 @@ static void gem_free_rx_buffers(struct macb *bp)
 static void macb_free_rx_buffers(struct macb *bp)
 {
if (bp->rx_buffers) {
-   dma_free_coherent(&bp->pdev->dev,
- bp->rx_ring_size * bp->rx_buffer_size,
- bp->rx_buffers, bp->rx_buffers_dma);
+   if (bp->sram_pool)
+   gen_pool_free(bp->sram_pool,
+ (unsigned long)bp->rx_buffers,
+ bp->rx_ring_size * bp->rx_buffer_size);
+   else
+   dma_free_coherent(&bp->pdev->dev,
+ bp->rx_ring_size * bp->rx_buffer_size,
+ bp->rx_buffers, bp->rx_buffers_dma);
bp->rx_buffers = NULL;
}
 }
@@ -1674,8 +1680,12 @@ static void macb_free_consistent(struct macb *bp)
 
bp->macbgem_ops.mog_free_rx_buffers(bp);
if (bp->rx_ring) {
-   dma_free_coherent(&bp->pdev->dev, RX_RING_BYTES(bp),
- bp->rx_ring, bp->rx_ring_dma);
+   if (bp->sram_pool)
+   gen_pool_free(bp->sram_pool, (unsigned long)bp->rx_ring,
+ RX_RING_BYTES(bp));
+   else
+   dma_free_coherent(&bp->pdev->dev, RX_RING_BYTES(bp),
+ bp->rx_ring, bp->rx_ring_dma);
bp->rx_ring = NULL;
}
 
@@ -1690,6 +1700,28 @@ static void macb_free_consistent(struct macb *bp)
}
 }
 
+static void macb_init_sram(struct macb *bp)
+{
+   struct device_node *node;
+   struct platform_device *pdev = NULL;
+
+   for_each_compatible_node(node, NULL, "mmio-sram") {
+   pdev = of_find_device_by_node(node);
+   if (pdev) {
+   of_node_put(node);
+   break;
+   }
+   }
+
+   if (!pdev) {
+   netdev_warn(bp->dev, "Failed to find sram device!\n");
+   bp->sram_pool = NULL;
+   return;
+   }
+
+   bp->sram_pool = gen_pool_get(&pdev->dev, NULL);
+}
+
 static int gem_alloc_rx_buffers(struct macb *bp)
 {
int size;
@@ -1710,14 +1742,20 @@ static int macb_alloc_rx_buffers(struct macb *bp)
int size;
 
size = bp->rx_ring_size * bp->rx_buffer_size;
-   bp->rx_buffers = dma_alloc_coherent(&bp->pdev->dev, size,
-   &bp->rx_buffers_dma, GFP_KERNEL);
+   if (bp->sram_pool)
+   bp->rx_buffers = gen_pool_dma_alloc(bp->sram_pool, size,
+   &bp->rx_buffers_dma);
+   else
+   bp->rx_buffers = dma_alloc_coherent(&bp->pdev->dev, size,
+   &bp->rx_buffers_dma,
+   GFP_KERNEL);
if (!bp->rx_buffers)
return -ENOMEM;
 
netdev_dbg(bp->dev,
   "Allocated RX buffers of %d bytes at %08lx (mapped %p)\n",
   size, (unsigned long)bp->rx_buffers_dma, 

[RFC 1/3] net: macb: Add register descriptions

2017-07-12 Thread Alexander Dahl
Analogous to the already present long register names for GEM, add those for
MACB. Taken from the AT91SAM9G20 complete datasheet.

Signed-off-by: Alexander Dahl 
---
 drivers/net/ethernet/cadence/macb.h | 66 ++---
 1 file changed, 33 insertions(+), 33 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index ec037b0..8547d92 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -31,41 +31,41 @@
 #define MACB_IDR   0x002c /* Interrupt Disable */
 #define MACB_IMR   0x0030 /* Interrupt Mask */
 #define MACB_MAN   0x0034 /* PHY Maintenance */
-#define MACB_PTR   0x0038
-#define MACB_PFR   0x003c
-#define MACB_FTO   0x0040
-#define MACB_SCF   0x0044
-#define MACB_MCF   0x0048
-#define MACB_FRO   0x004c
-#define MACB_FCSE  0x0050
-#define MACB_ALE   0x0054
-#define MACB_DTF   0x0058
-#define MACB_LCOL  0x005c
-#define MACB_EXCOL 0x0060
-#define MACB_TUND  0x0064
-#define MACB_CSE   0x0068
-#define MACB_RRE   0x006c
-#define MACB_ROVR  0x0070
-#define MACB_RSE   0x0074
-#define MACB_ELE   0x0078
-#define MACB_RJA   0x007c
-#define MACB_USF   0x0080
-#define MACB_STE   0x0084
-#define MACB_RLE   0x0088
+#define MACB_PTR   0x0038 /* Pause Time */
+#define MACB_PFR   0x003c /* Pause Frames Received */
+#define MACB_FTO   0x0040 /* Frames Transmitted Ok */
+#define MACB_SCF   0x0044 /* Single Collision Frames */
+#define MACB_MCF   0x0048 /* Multiple Collision Frames */
+#define MACB_FRO   0x004c /* Frames Received Ok */
+#define MACB_FCSE  0x0050 /* Frame Check Sequence Errors */
+#define MACB_ALE   0x0054 /* Alignment Errors */
+#define MACB_DTF   0x0058 /* Deferred Transmission Frames */
+#define MACB_LCOL  0x005c /* Late Collisions */
+#define MACB_EXCOL 0x0060 /* Excessive Collisions */
+#define MACB_TUND  0x0064 /* Transmit Underrun Errors */
+#define MACB_CSE   0x0068 /* Carrier Sense Errors */
+#define MACB_RRE   0x006c /* Receive Resource Errors */
+#define MACB_ROVR  0x0070 /* Receive Overrun Errors */
+#define MACB_RSE   0x0074 /* Receive Symbol Errors */
+#define MACB_ELE   0x0078 /* Excessive Length Errors */
+#define MACB_RJA   0x007c /* Receive Jabbers */
+#define MACB_USF   0x0080 /* Undersize Frames */
+#define MACB_STE   0x0084 /* SQE Test Errors */
+#define MACB_RLE   0x0088 /* Received Length Field Mismatch */
 #define MACB_TPF   0x008c
-#define MACB_HRB   0x0090
-#define MACB_HRT   0x0094
-#define MACB_SA1B  0x0098
-#define MACB_SA1T  0x009c
-#define MACB_SA2B  0x00a0
-#define MACB_SA2T  0x00a4
-#define MACB_SA3B  0x00a8
-#define MACB_SA3T  0x00ac
-#define MACB_SA4B  0x00b0
-#define MACB_SA4T  0x00b4
-#define MACB_TID   0x00b8
+#define MACB_HRB   0x0090 /* Hash Register Bottom [31:0] */
+#define MACB_HRT   0x0094 /* Hash Register Top [63:32] */
+#define MACB_SA1B  0x0098 /* Specific Address 1 Bottom */
+#define MACB_SA1T  0x009c /* Specific Address 1 Top */
+#define MACB_SA2B  0x00a0 /* Specific Address 2 Bottom */
+#define MACB_SA2T  0x00a4 /* Specific Address 2 Top */
+#define MACB_SA3B  0x00a8 /* Specific Address 3 Bottom */
+#define MACB_SA3T  0x00ac /* Specific Address 3 Top */
+#define MACB_SA4B  0x00b0 /* Specific Address 4 Bottom */
+#define MACB_SA4T  0x00b4 /* Specific Address 4 Top */
+#define MACB_TID   0x00b8 /* Type ID Checking */
 #define MACB_TPQ   0x00bc
-#define MACB_USRIO 0x00c0
+#define MACB_USRIO 0x00c0 /* User Input/Output */
 #define MACB_WOL   0x00c4
 #define MACB_MID   0x00fc
 #define MACB_TBQPH 0x04C8
-- 
2.1.4



[RFC 0/3] net: macb: Use SRAM on SoC for RX rings and buffers

2017-07-12 Thread Alexander Dahl
Hei hei,

this is a small patch series for a problem we encountered with a board
based on an AT91SAM9G20 SoC. I talked about it a few days ago on the
#armlinux IRC channel with Alexandre Belloni and Florian Fainelli. The
current state of those patches is 'proof of concept', but if this is
useful for someone I would like to get it upstream.

The first two patches are just adding some descriptive comments for
register and bit definitions, taken from the at91sam9g20 hardware
manual. Those may be of general interest, even without the third
patch.

The third patch is the main one. I tried to describe in the commit
message as clearly as possible what problem is to be solved. If you
don't understand the reason please ask, I will happily change or
extend the message.

Regarding the third patch: It's just a proof of concept so far, I
changed some hardcoded values which probably should not be that low
for the normal use case with RX buffers/rings in SD-RAM/DDR-RAM. It
would be good if this were somehow configurable at boot time, maybe
in the device tree. However I would need an idea how to do this and
some help, that's why this series is just RFC.

I tested this with heavy load on the EBI bus (external SRAM), network
seems to work as usual and I get no receive overrun errors anymore.

An interesting question for me would be where this MAC is used and
whether internal SRAM is available on those SoCs and in what sizes
(sam9g20 has 32k). A quick look in arch/arm/boot/dts just showed me
at91sam9 and sama5 platforms using the macb driver.

Greets
Alex
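
A rough check of the ring sizes chosen in patch 3/3 can be done with a
few lines of C. This is only a sketch: it assumes a legacy MACB descriptor
is 8 bytes (two 32-bit words, not stated in this thread) and that the whole
32k of internal SRAM is available to the driver. rx_footprint() and the
SKETCH_* names are made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: one legacy MACB descriptor = address word + control word. */
#define SKETCH_DESC_SIZE      8u
/* From patch 3/3: MACB_RX_BUFFER_SIZE. */
#define SKETCH_RX_BUFFER_SIZE 128u
/* From the cover letter: internal SRAM size on the sam9g20. */
#define SKETCH_SRAM_BYTES     (32u * 1024u)

/* Bytes needed for the RX descriptor ring plus all RX buffers. */
static size_t rx_footprint(size_t ring_size)
{
    /* The driver comments require the ring size to be a power of two. */
    assert(ring_size != 0 && (ring_size & (ring_size - 1)) == 0);
    return ring_size * SKETCH_DESC_SIZE + ring_size * SKETCH_RX_BUFFER_SIZE;
}

/* True if ring and buffers together fit into the assumed SRAM budget. */
static bool rx_fits_in_sram(size_t ring_size)
{
    return rx_footprint(ring_size) <= SKETCH_SRAM_BYTES;
}
```

Under these assumptions a 128-entry ring with 128-byte buffers needs
17408 bytes, while the previous default of 512 entries would need 69632
bytes, which does not fit into 32k.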



[RFC 2/3] net: macb: Add buffer descriptor names

2017-07-12 Thread Alexander Dahl
Documentation of the EMAC buffer descriptor bitfields. Taken from the
AT91SAM9G20 complete datasheet.

Signed-off-by: Alexander Dahl 
---
 drivers/net/ethernet/cadence/macb.h | 50 ++---
 1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index 8547d92..567c72d 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -566,39 +566,39 @@ struct macb_dma_desc_64 {
 #define MACB_RX_WADDR_OFFSET   2
 #define MACB_RX_WADDR_SIZE 30
 
-#define MACB_RX_FRMLEN_OFFSET  0
+#define MACB_RX_FRMLEN_OFFSET  0  /* Length of frame */
 #define MACB_RX_FRMLEN_SIZE    12
-#define MACB_RX_OFFSET_OFFSET  12
+#define MACB_RX_OFFSET_OFFSET  12 /* Receive buffer offset */
 #define MACB_RX_OFFSET_SIZE    2
-#define MACB_RX_SOF_OFFSET 14
+#define MACB_RX_SOF_OFFSET 14 /* Start of frame */
 #define MACB_RX_SOF_SIZE   1
-#define MACB_RX_EOF_OFFSET 15
+#define MACB_RX_EOF_OFFSET 15 /* End of frame */
 #define MACB_RX_EOF_SIZE   1
-#define MACB_RX_CFI_OFFSET 16
+#define MACB_RX_CFI_OFFSET 16 /* Concatenation format indicator */
 #define MACB_RX_CFI_SIZE   1
-#define MACB_RX_VLAN_PRI_OFFSET    17
+#define MACB_RX_VLAN_PRI_OFFSET    17 /* VLAN priority */
 #define MACB_RX_VLAN_PRI_SIZE  3
-#define MACB_RX_PRI_TAG_OFFSET 20
+#define MACB_RX_PRI_TAG_OFFSET 20 /* Priority tag detected */
 #define MACB_RX_PRI_TAG_SIZE   1
-#define MACB_RX_VLAN_TAG_OFFSET    21
+#define MACB_RX_VLAN_TAG_OFFSET    21 /* VLAN tag detected */
 #define MACB_RX_VLAN_TAG_SIZE  1
-#define MACB_RX_TYPEID_MATCH_OFFSET    22
+#define MACB_RX_TYPEID_MATCH_OFFSET    22 /* Type ID match */
 #define MACB_RX_TYPEID_MATCH_SIZE  1
-#define MACB_RX_SA4_MATCH_OFFSET   23
+#define MACB_RX_SA4_MATCH_OFFSET   23 /* Specific address register 4 match */
 #define MACB_RX_SA4_MATCH_SIZE 1
-#define MACB_RX_SA3_MATCH_OFFSET   24
+#define MACB_RX_SA3_MATCH_OFFSET   24 /* Specific address register 3 match */
 #define MACB_RX_SA3_MATCH_SIZE 1
-#define MACB_RX_SA2_MATCH_OFFSET   25
+#define MACB_RX_SA2_MATCH_OFFSET   25 /* Specific address register 2 match */
 #define MACB_RX_SA2_MATCH_SIZE 1
-#define MACB_RX_SA1_MATCH_OFFSET   26
+#define MACB_RX_SA1_MATCH_OFFSET   26 /* Specific address register 1 match */
 #define MACB_RX_SA1_MATCH_SIZE 1
-#define MACB_RX_EXT_MATCH_OFFSET   28
+#define MACB_RX_EXT_MATCH_OFFSET   28 /* External address match */
 #define MACB_RX_EXT_MATCH_SIZE 1
-#define MACB_RX_UHASH_MATCH_OFFSET 29
+#define MACB_RX_UHASH_MATCH_OFFSET 29 /* Unicast hash match */
 #define MACB_RX_UHASH_MATCH_SIZE   1
-#define MACB_RX_MHASH_MATCH_OFFSET 30
+#define MACB_RX_MHASH_MATCH_OFFSET 30 /* Multicast hash match */
 #define MACB_RX_MHASH_MATCH_SIZE   1
-#define MACB_RX_BROADCAST_OFFSET   31
+#define MACB_RX_BROADCAST_OFFSET   31 /* Global all ones broadcast addr detected */
 #define MACB_RX_BROADCAST_SIZE 1
 
 #define MACB_RX_FRMLEN_MASK    0xFFF
@@ -612,11 +612,11 @@ struct macb_dma_desc_64 {
 #define GEM_RX_CSUM_OFFSET 22
 #define GEM_RX_CSUM_SIZE   2
 
-#define MACB_TX_FRMLEN_OFFSET  0
+#define MACB_TX_FRMLEN_OFFSET  0  /* Length of buffer */
 #define MACB_TX_FRMLEN_SIZE    11
-#define MACB_TX_LAST_OFFSET    15
+#define MACB_TX_LAST_OFFSET    15 /* Last buffer */
 #define MACB_TX_LAST_SIZE  1
-#define MACB_TX_NOCRC_OFFSET   16
+#define MACB_TX_NOCRC_OFFSET   16 /* No CRC */
 #define MACB_TX_NOCRC_SIZE 1
 #define MACB_MSS_MFS_OFFSET    16
 #define MACB_MSS_MFS_SIZE  14
@@ -624,15 +624,15 @@ struct macb_dma_desc_64 {
 #define MACB_TX_LSO_SIZE   2
 #define MACB_TX_TCP_SEQ_SRC_OFFSET 19
 #define MACB_TX_TCP_SEQ_SRC_SIZE   1
-#define MACB_TX_BUF_EXHAUSTED_OFFSET   27
+#define MACB_TX_BUF_EXHAUSTED_OFFSET   27 /* Buffers exhausted in mid frame */
 #define MACB_TX_BUF_EXHAUSTED_SIZE 1
-#define 
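
The OFFSET/SIZE pairs documented in this patch describe where each field
sits in the 32-bit descriptor control word. As a hedged illustration of how
such pairs are typically consumed, here is a user-space sketch. The
sketch_bfext() helper and the SKETCH_RX_FIELD() macro are hypothetical names
for this example and are not taken from the driver; only the offsets and
sizes are copied from the hunk.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets and sizes copied from the macb.h hunk in this patch. */
#define MACB_RX_FRMLEN_OFFSET    0
#define MACB_RX_FRMLEN_SIZE      12
#define MACB_RX_SOF_OFFSET       14
#define MACB_RX_SOF_SIZE         1
#define MACB_RX_BROADCAST_OFFSET 31
#define MACB_RX_BROADCAST_SIZE   1

/* Hypothetical helper: extract `size` bits starting at bit `offset`. */
static uint32_t sketch_bfext(uint32_t word, unsigned offset, unsigned size)
{
    uint32_t mask;

    assert(size >= 1 && offset < 32 && offset + size <= 32);
    /* Avoid the undefined shift by 32 for a full-width field. */
    mask = (size == 32) ? UINT32_MAX : ((UINT32_C(1) << size) - 1);
    return (word >> offset) & mask;
}

/* Hypothetical convenience macro: look up a field by its short name. */
#define SKETCH_RX_FIELD(name, word) \
    sketch_bfext((word), MACB_RX_##name##_OFFSET, MACB_RX_##name##_SIZE)
```

For example, SKETCH_RX_FIELD(FRMLEN, ctrl) yields the frame length and
SKETCH_RX_FIELD(BROADCAST, ctrl) tells whether the broadcast bit is set.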

[PATCH net 1/2] socket: add documentation for missing elements

2017-07-12 Thread Stephen Hemminger
Fill in missing kernel-doc for missing elements in struct sock.

Signed-off-by: Stephen Hemminger 
---
 include/net/sock.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/net/sock.h b/include/net/sock.h
index 8c85791fc196..f69c8c2782df 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -246,6 +246,7 @@ struct sock_common {
   *@sk_policy: flow policy
   *@sk_receive_queue: incoming packets
   *@sk_wmem_alloc: transmit queue bytes committed
+  *@sk_tsq_flags: TCP Small Queues flags
   *@sk_write_queue: Packet sending queue
   *@sk_omem_alloc: "o" is "option" or "other"
   *@sk_wmem_queued: persistent queue size
@@ -257,6 +258,7 @@ struct sock_common {
   *@sk_pacing_status: Pacing status (requested, handled by sch_fq)
   *@sk_max_pacing_rate: Maximum pacing rate (%SO_MAX_PACING_RATE)
   *@sk_sndbuf: size of send buffer in bytes
+  *@__sk_flags_offset: empty field used to determine location of bitfield
   *@sk_padding: unused element for alignment
   *@sk_no_check_tx: %SO_NO_CHECK setting, set checksum in TX packets
   *@sk_no_check_rx: allow zero checksum in RX packets
@@ -277,6 +279,7 @@ struct sock_common {
   *@sk_drops: raw/udp drops counter
   *@sk_ack_backlog: current listen backlog
   *@sk_max_ack_backlog: listen backlog set in listen()
+  *@sk_uid: user id of owner
   *@sk_priority: %SO_PRIORITY setting
   *@sk_type: socket type (%SOCK_STREAM, etc)
   *@sk_protocol: which protocol this socket belongs in this network family
-- 
2.11.0



[PATCH net 2/2] datagram: fix kernel-doc comments

2017-07-12 Thread Stephen Hemminger
An underscore in the kernel-doc comment section has special meaning
and misuse generates errors.

./net/core/datagram.c:207: ERROR: Unknown target name: "msg".
./net/core/datagram.c:379: ERROR: Unknown target name: "msg".
./net/core/datagram.c:816: ERROR: Unknown target name: "t".

Signed-off-by: Stephen Hemminger 
---
 net/core/datagram.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/core/datagram.c b/net/core/datagram.c
index 6877c43cc92d..ee5647bd91b3 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -203,7 +203,7 @@ struct sk_buff *__skb_try_recv_from_queue(struct sock *sk,
 /**
  * __skb_try_recv_datagram - Receive a datagram skbuff
  * @sk: socket
- * @flags: MSG_ flags
+ * @flags: MSG\_ flags
  * @destructor: invoked under the receive lock on successful dequeue
  * @peeked: returns non-zero if this packet has been seen before
  * @off: an offset in bytes to peek skb from. Returns an offset
@@ -375,7 +375,7 @@ EXPORT_SYMBOL(__sk_queue_drop_skb);
  * skb_kill_datagram - Free a datagram skbuff forcibly
  * @sk: socket
  * @skb: datagram skbuff
- * @flags: MSG_ flags
+ * @flags: MSG\_ flags
  *
  * This function frees a datagram skbuff that was received by
  * skb_recv_datagram.  The flags argument must match the one
@@ -809,7 +809,7 @@ EXPORT_SYMBOL(skb_copy_and_csum_datagram_msg);
  * sequenced packet sockets providing the socket receive queue
  * is only ever holding data ready to receive.
  *
- * Note: when you _don't_ use this routine for this protocol,
+ * Note: when you *don't* use this routine for this protocol,
  * and you use a different write policy from sock_writeable()
  * then please supply your own write_space callback.
  */
-- 
2.11.0



[PATCH net 0/2] minor net kernel-doc fixes

2017-07-12 Thread Stephen Hemminger
Fix a couple of small errors in kernel-doc for networking

Stephen Hemminger (2):
  socket: add documentation for missing elements
  datagram: fix kernel-doc comments

 include/net/sock.h  | 3 +++
 net/core/datagram.c | 6 +++---
 2 files changed, 6 insertions(+), 3 deletions(-)

-- 
2.11.0



[PATCH net] sfc: don't read beyond unicast address list

2017-07-12 Thread Bert Kenward
If we have more than 32 unicast MAC addresses assigned to an interface
we will read beyond the end of the address table in the driver when
adding filters. The next 256 entries store multicast addresses, so we
will end up attempting to insert duplicate filters, which is mostly
harmless. If we add more than 288 unicast addresses we will then read
past the multicast address table, which is likely to be more exciting.

Fixes: 12fb0da45c9a ("sfc: clean fallbacks between promisc/normal in efx_ef10_filter_sync_rx_mode")
Signed-off-by: Bert Kenward 
---
 drivers/net/ethernet/sfc/ef10.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index 761c518b2f92..13f72f5b18d2 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -5034,12 +5034,9 @@ static void efx_ef10_filter_uc_addr_list(struct efx_nic *efx)
struct efx_ef10_filter_table *table = efx->filter_state;
struct net_device *net_dev = efx->net_dev;
struct netdev_hw_addr *uc;
-   int addr_count;
unsigned int i;
 
-   addr_count = netdev_uc_count(net_dev);
table->uc_promisc = !!(net_dev->flags & IFF_PROMISC);
-   table->dev_uc_count = 1 + addr_count;
ether_addr_copy(table->dev_uc_list[0].addr, net_dev->dev_addr);
i = 1;
netdev_for_each_uc_addr(uc, net_dev) {
@@ -5050,6 +5047,8 @@ static void efx_ef10_filter_uc_addr_list(struct efx_nic *efx)
ether_addr_copy(table->dev_uc_list[i].addr, uc->addr);
i++;
}
+
+   table->dev_uc_count = i;
 }
 
 static void efx_ef10_filter_mc_addr_list(struct efx_nic *efx)
@@ -5057,12 +5056,11 @@ static void efx_ef10_filter_mc_addr_list(struct efx_nic *efx)
struct efx_ef10_filter_table *table = efx->filter_state;
struct net_device *net_dev = efx->net_dev;
struct netdev_hw_addr *mc;
-   unsigned int i, addr_count;
+   unsigned int i;
 
table->mc_overflow = false;
table->mc_promisc = !!(net_dev->flags & (IFF_PROMISC | IFF_ALLMULTI));
 
-   addr_count = netdev_mc_count(net_dev);
i = 0;
netdev_for_each_mc_addr(mc, net_dev) {
if (i >= EFX_EF10_FILTER_DEV_MC_MAX) {
-- 
2.7.5
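
The idea of the fix is that the stored count comes from the number of
entries actually copied, not from the length of the device's address list.
A minimal user-space sketch of that pattern follows. The struct, the integer
stand-ins for MAC addresses and the promiscuous fallback on overflow are
assumptions for illustration (the overflow branch is not visible in the
quoted hunk). Only the "count = i after the loop" step mirrors the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative capacity; the commit message speaks of 32 unicast entries. */
#define SKETCH_UC_MAX 32

struct sketch_table {
    int uc_list[SKETCH_UC_MAX]; /* stand-in for MAC address entries */
    size_t uc_count;            /* number of valid entries in uc_list */
    int uc_promisc;             /* assumed fallback flag on overflow */
};

/*
 * Copy the device address plus as many extra addresses as fit.
 * The count is set from the loop index afterwards, so it can never
 * exceed the table size, mirroring "table->dev_uc_count = i".
 */
static void sketch_fill_uc(struct sketch_table *t, int dev_addr,
                           const int *addrs, size_t n_addrs)
{
    size_t i = 1;

    assert(t != NULL && (addrs != NULL || n_addrs == 0));
    t->uc_promisc = 0;
    t->uc_list[0] = dev_addr;
    for (size_t k = 0; k < n_addrs; k++) {
        if (i >= SKETCH_UC_MAX) {
            t->uc_promisc = 1; /* too many addresses: stop copying */
            break;
        }
        t->uc_list[i++] = addrs[k];
    }
    t->uc_count = i;
}
```

Had the count been set to 1 + n_addrs before the loop, a later reader of
uc_list would walk past the end of the array whenever n_addrs is 32 or more.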



Re: [PATCH net 20/20] net/hinic: Add ethtool and stats

2017-07-12 Thread Andrew Lunn
On Wed, Jul 12, 2017 at 10:17:26PM +0800, Aviad Krawczyk wrote:

Hi Aviad

> +
> +static void hinic_tx_timeout(struct net_device *netdev)
> +{
> + struct hinic_dev *nic_dev = netdev_priv(netdev);
> +
> + netif_err(nic_dev, drv, netdev, "Tx timeout\n");
> +}
> +
> +#ifdef CONFIG_NET_POLL_CONTROLLER
> +static void hinic_netpoll(struct net_device *netdev)
> +{
> + struct hinic_dev *nic_dev = netdev_priv(netdev);
> + struct hinic_hwdev *hwdev = nic_dev->hwdev;
> + int i, num_qps = hinic_hwdev_num_qps(hwdev);
> +
> + for (i = 0; i < num_qps; i++) {
> + struct hinic_txq *txq = &nic_dev->txqs[i];
> + struct hinic_rxq *rxq = &nic_dev->rxqs[i];
> +
> + napi_schedule(&txq->napi);
> + napi_schedule(&rxq->napi);
> + }
> +}
> +#endif

This has nothing to do with ethtool support. Separate patch please.

 Andrew


Re: [PATCH net 02/20] nic/hinic: Initialize hw device components

2017-07-12 Thread Andrew Lunn
> +/**
> + * get_dev_cap - get device capabilities
> + * @hwdev: the NIC HW device to get capabilities for
> + *
> + * Return 0 - Success, negative - Failure
> + **/
> +static int get_dev_cap(struct hinic_hwdev *hwdev)
> +{
> + struct hinic_pfhwdev *pfhwdev;
> + struct hinic_hwif *hwif = hwdev->hwif;
> + struct pci_dev *pdev = hwif->pdev;
> + int err;
> +
> + switch (HINIC_FUNC_TYPE(hwif)) {
> + case HINIC_PPF:
> + case HINIC_PF:
> + pfhwdev = container_of(hwdev, struct hinic_pfhwdev, hwdev);
> +
> + err = get_cap_from_fw(pfhwdev);
> + if (err) {
> + dev_err(&pdev->dev, "Failed to get capability from FW\n");
> + return err;
> + }
> + break;
> +
> + default:
> + pr_err("Unsupported PCI Function type\n");

Hi Aviad

more pr_err(). Please go through all the patches and use dev_err(), or
netif_err() if appropriate.

Thanks
Andrew


Re: [PATCH net 01/20] net/hinic: Initialize hw interface

2017-07-12 Thread Andrew Lunn
> +
> +#define HINIC_DRV_NAME   "HiNIC"
> +#define HINIC_DRV_VERSION        "1.0"

Hi Aviad

Please don't add a driver version. There was a discussion about this
recently, how pointless it is.

> +/**
> + * hinic_init_hwdev - Initialize the NIC HW
> + * @hwdev: the NIC HW device that is returned from the initialization
> + * @pdev: the NIC pci device
> + *
> + * Return 0 - Success, negative - Failure
> + *
> + * Initialize the NIC HW device and return a pointer to it in the first arg
> + **/
> +int hinic_init_hwdev(struct hinic_hwdev **hwdev, struct pci_dev *pdev)
> +{
> + struct hinic_pfhwdev *pfhwdev;
> + struct hinic_hwif *hwif;
> + int err;
> +
> + hwif = kzalloc(sizeof(*hwif), GFP_KERNEL);

Using the devm_ functions makes your cleanup code simpler when
handling memory.

> +/**
> + * nic_dev_init - Initialize the NIC device
> + * @pdev: the NIC pci device
> + *
> + * Return 0 - Success, negative - Failure
> + **/
> +static int nic_dev_init(struct pci_dev *pdev)
> +{
> + struct hinic_dev *nic_dev;
> + struct net_device *netdev;
> + struct hinic_hwdev *hwdev;
> + int err, num_qps;
> +
> + err = hinic_init_hwdev(&hwdev, pdev);
> + if (err) {
> + dev_err(&pdev->dev, "Failed to initialize HW device\n");
> + return err;
> + }
> +
> + num_qps = hinic_hwdev_num_qps(hwdev);
> + if (num_qps <= 0) {
> + dev_err(&pdev->dev, "Invalid number of QPS\n");
> + err = -EINVAL;
> + goto num_qps_err;
> + }
> +
> + netdev = alloc_etherdev_mq(sizeof(*nic_dev), num_qps);
> + if (!netdev) {
> + pr_err("Failed to allocate Ethernet device\n");

Above you used dev_err, here you used pr_err(). Please be consistent.

> + err = -ENOMEM;
> + goto alloc_etherdev_err;
> + }
> +
> + netdev->netdev_ops = &hinic_netdev_ops;
> +
> + nic_dev = (struct hinic_dev *)netdev_priv(netdev);
> + nic_dev->hwdev = hwdev;
> + nic_dev->netdev = netdev;
> + nic_dev->msg_enable = MSG_ENABLE_DEFAULT;
> +
> + pci_set_drvdata(pdev, netdev);
> +
> + netif_carrier_off(netdev);
> +
> + err = register_netdev(netdev);
> + if (err) {
> + netif_err(nic_dev, probe, netdev, "Failed to register netdev\n");

Probably not a good idea to use netif_err() if register_netdev() just
failed. dev_err() would be better.



Re: [PATCH V2] brcmfmac: fix possible buffer overflow in brcmf_cfg80211_mgmt_tx()

2017-07-12 Thread David Miller
From: Arend van Spriel 
Date: Wed, 12 Jul 2017 13:49:23 +0200

> On 7/7/2017 10:09 PM, Arend van Spriel wrote:
>> The lower level nl80211 code in cfg80211 ensures that "len" is between
>> 25 and NL80211_ATTR_FRAME (2304).  We subtract DOT11_MGMT_HDR_LEN (24)
>> from "len" so that's a max of 2280.  However, the action_frame->data[]
>> buffer is only BRCMF_FIL_ACTION_FRAME_SIZE (1800) bytes long so this
>> memcpy() can overflow.
>>
>>  memcpy(action_frame->data, &buf[DOT11_MGMT_HDR_LEN],
>> le16_to_cpu(action_frame->len));
>>
>> Cc: sta...@vger.kernel.org # 3.9.x
>> Fixes: 18e2f61db3b70 ("brcmfmac: P2P action frame tx.")
>> Reported-by: "freenerguo(郭大兴)" 
>> Signed-off-by: Arend van Spriel 
>> ---
>>   V2:
>>- added Fixes: tag and Cc: for stable kernels.
>>- Cc: patch to netdev list.
>> ---
>> Hi David,
>>
>> Here is the patch as Linus sent it to us and secur...@kernel.org. I
>> removed the lower bound check as that is already done in cfg80211.
>> Now I signed off on the patch although formally I suppose Linus should
>> sign it off. Putting it out there so people can respond as deemed
>> necessary.
>>
>> The reason for submitting it to your tree is the fact that Kalle is
>> on vacation for next 10 days or so which was indicated to me by
>> Johannes.
>> The patch applies to the master branch of your net repository. For
>> reference V1 of this patch can be found here [1].
> 
> Hi Dave,
> 
> Not sure if you missed this one. It is addressing a reported security
> issue and intended for the net repository, not net-next which is
> obviously closed [2].

Thanks for pointing this out to me, I'll take care of it.
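
To make the numbers in the quoted commit message concrete, here is a
user-space sketch of the kind of bounds check that prevents the overflow.
It is not the actual brcmfmac change: the struct, the function name and the
choice of -EINVAL are assumptions, and unlike the patch it keeps a
lower-bound check because nothing validates "len" before this standalone
function is called.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Values quoted in the commit message. */
#define DOT11_MGMT_HDR_LEN          24
#define BRCMF_FIL_ACTION_FRAME_SIZE 1800

struct sketch_action_frame {
    unsigned char data[BRCMF_FIL_ACTION_FRAME_SIZE];
    size_t len;
};

/*
 * Copy the payload that follows the management header into the frame.
 * Returns 0 on success, or -EINVAL if the frame carries no payload or
 * the payload would not fit into data[].
 */
static int sketch_copy_action_frame(struct sketch_action_frame *af,
                                    const unsigned char *buf, size_t len)
{
    assert(af != NULL && buf != NULL);
    if (len <= DOT11_MGMT_HDR_LEN)
        return -EINVAL;
    if (len - DOT11_MGMT_HDR_LEN > BRCMF_FIL_ACTION_FRAME_SIZE)
        return -EINVAL; /* reject instead of overflowing data[] */

    af->len = len - DOT11_MGMT_HDR_LEN;
    memcpy(af->data, &buf[DOT11_MGMT_HDR_LEN], af->len);
    return 0;
}
```

With the limits from the commit message, a 2304-byte frame would carry a
2280-byte payload and is rejected, while anything up to 24 + 1800 bytes is
copied.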



Re: [PATCH net] net: hns: Bugfix for Tx timeout handling in hns driver

2017-07-12 Thread David Miller
From: Lin Yun Sheng 
Date: Wed, 12 Jul 2017 19:09:59 +0800

> When hns port type is not debug mode, netif_tx_disable is called
> when there is a tx timeout, which requires system reboot to return
> to normal state. This patch fixes this problem by resetting the net
> dev.
> 
> Fixes: b5996f11ea54 ("net: add Hisilicon Network Subsystem basic ethernet support")
> Signed-off-by: Lin Yun Sheng 

Applied.


Re: [PATCH v1 net-next 1/5] drop_monitor: import netnamespace framework

2017-07-12 Thread David Miller

You must provide a proper "[PATCH vx net-next 0/N]" header posting with
a patch series, which describes at a high level what the patch series
as a whole is doing, how it is doing it, and why it is doing it that way.

Second, net-next is closed:

http://vger.kernel.org/~davem/net-next.html

So you should resubmit this when it is open again.

Thanks.


Re: [PATCH net] nfp: freeing the wrong variable

2017-07-12 Thread David Miller
From: Dan Carpenter 
Date: Wed, 12 Jul 2017 10:42:06 +0300

> We accidentally free a NULL pointer and leak the pointer we want to
> free.  Also you can tell from the label name what was intended.  :)
> 
> Fixes: abfcdc1de9bf ("nfp: add a stats handler for flower offloads")
> Signed-off-by: Dan Carpenter 

Applied.


Re: [PATCH net] net: ipmr: ipmr_get_table() returns NULL

2017-07-12 Thread David Miller
From: Dan Carpenter 
Date: Wed, 12 Jul 2017 10:56:47 +0300

> The ipmr_get_table() function doesn't return error pointers; it returns
> NULL on error.
> 
> Fixes: 4f75ba6982bc ("net: ipmr: Add ipmr_rtm_getroute")
> Signed-off-by: Dan Carpenter 

Applied.
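
The distinction matters because an IS_ERR()-style test does not catch
NULL. The sketch below imitates the kernel's error-pointer convention in
user space to show this. All sketch_* names are hypothetical, and
sketch_get_table() merely stands in for a lookup that returns NULL on
failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* User-space imitation of the kernel's error-pointer range. */
#define SKETCH_MAX_ERRNO 4095

/* Error pointers occupy the top SKETCH_MAX_ERRNO addresses. */
static bool sketch_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Encode a negative errno value as a pointer. */
static void *sketch_err_ptr(long err)
{
    assert(err < 0 && err >= -SKETCH_MAX_ERRNO);
    return (void *)(intptr_t)err;
}

/* Hypothetical lookup that returns NULL, never an error pointer. */
static int sketch_table = 42;
static int *sketch_get_table(int id)
{
    return id == 0 ? &sketch_table : NULL;
}

/* Correct caller: test for NULL, since that is what failure looks like. */
static int sketch_lookup(int id)
{
    int *t = sketch_get_table(id);

    if (!t)
        return -1;
    return *t;
}
```

Since sketch_is_err(NULL) is false, a caller that only checked for error
pointers would go on to dereference NULL when the lookup fails.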


Re: [patch net 0/4] mlxsw: spectrum: Various fixes

2017-07-12 Thread David Miller
From: Jiri Pirko 
Date: Wed, 12 Jul 2017 09:12:51 +0200

> First patch adds a missing rollback in error path. Second patch prevents
> a use-after-free during IPv4 route replace. Last two patches fix warnings
> from static checkers.

Series applied, thanks.


Re: [PATCH net 00/20] Huawei HiNIC Ethernet Driver

2017-07-12 Thread David Miller

The net-next tree is closed:

http://vger.kernel.org/~davem/net-next.html

Please resubmit this when the net-next tree is open again.

Thank you.


Re: [PATCH net-next 3/4] net-next: mediatek: add support for MediaTek MT7622 SoC

2017-07-12 Thread Andrew Lunn
Hi Sean

>  static void mtk_phy_link_adjust(struct net_device *dev)
>  {
>   struct mtk_mac *mac = netdev_priv(dev);
> @@ -269,6 +311,7 @@ static int mtk_phy_connect(struct net_device *dev)
>   if (!np)
>   return -ENODEV;
>  
> + mac->ge_mode = 0;
>   switch (of_get_phy_mode(np)) {
>   case PHY_INTERFACE_MODE_TRGMII:
>   mac->trgmii = true;
> @@ -276,7 +319,15 @@ static int mtk_phy_connect(struct net_device *dev)
>   case PHY_INTERFACE_MODE_RGMII_RXID:
>   case PHY_INTERFACE_MODE_RGMII_ID:
>   case PHY_INTERFACE_MODE_RGMII:
> - mac->ge_mode = 0;
> + break;
> + case PHY_INTERFACE_MODE_SGMII:
> + if (MTK_HAS_CAPS(eth->soc->caps, MTK_SGMII))
> + mtk_gmac_sgmii_hw_setup(eth, mac->id);
> + break;

> + case PHY_INTERFACE_MODE_INTERNAL:
> + if (MTK_HAS_CAPS(eth->soc->caps, MTK_GMAC1_ESW) && !mac->id)
> + /* Setup the path through ESW internal switch */
> + mtk_w32(eth, MTK_MUX_TO_ESW, MTK_MAC_MISC);

This bit is interesting. Generally, there is no PHY at all between the
MAC and the switch. So I don't think this is correct. Please can you
take this out for the moment, until you actually add support for the
switch. We can discuss it then, when we see the bigger picture.

Andrew


Re: [PATCH net-next 2/4] net-next: mediatek: add platform data to adapt into various hardware

2017-07-12 Thread Andrew Lunn
> +static int mtk_clk_enable(struct mtk_eth *eth)
> +{
> + int clk, ret;
> +
> + for (clk = 0; clk < MTK_CLK_MAX ; clk++) {
> + if (eth->clks[clk]) {
> + ret = clk_prepare_enable(eth->clks[clk]);
> + if (ret)
> + goto err_disable_clks;
> + }
> + }
> +
> + return 0;
> +
> +err_disable_clks:
> + while (--clk >= 0) {
> + if (eth->clks[clk])
> + clk_disable_unprepare(eth->clks[clk]);
> + }
> +
> + return ret;
> +}

> +
>  static int mtk_hw_init(struct mtk_eth *eth)
>  {
>   int i, val;
> @@ -1847,10 +1881,8 @@ static int mtk_hw_init(struct mtk_eth *eth)
>   pm_runtime_enable(eth->dev);
>   pm_runtime_get_sync(eth->dev);
>  
> - clk_prepare_enable(eth->clks[MTK_CLK_ETHIF]);
> - clk_prepare_enable(eth->clks[MTK_CLK_ESW]);
> - clk_prepare_enable(eth->clks[MTK_CLK_GP1]);
> - clk_prepare_enable(eth->clks[MTK_CLK_GP2]);
> + mtk_clk_enable(eth);
> +

mtk_clk_enable() returns an error code. It is probably a good idea to
use it, especially if it could be -EPROBE_DEFER.

Andrew
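
As a user-space illustration of the point above, the following sketch
models the enable-with-rollback loop from the quoted patch and a caller that
propagates the result. The sketch_* types and helpers are stand-ins invented
for this example; they are not the clk framework API. 517 is the
kernel-internal value of EPROBE_DEFER, defined locally because user-space
errno.h does not provide it.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_CLK_MAX      4
#define SKETCH_EPROBE_DEFER 517 /* kernel-internal errno, defined locally */

struct sketch_clk {
    int enabled;   /* 1 while the clock is on */
    int fail_with; /* non-zero: error code that enabling returns */
};

/* Stand-in for clk_prepare_enable(). */
static int sketch_clk_enable_one(struct sketch_clk *c)
{
    if (c->fail_with)
        return c->fail_with;
    c->enabled = 1;
    return 0;
}

/* Stand-in for clk_disable_unprepare(). */
static void sketch_clk_disable_one(struct sketch_clk *c)
{
    c->enabled = 0;
}

/* Enable all present clocks; on failure undo those already enabled. */
static int sketch_clk_enable(struct sketch_clk *clks[SKETCH_CLK_MAX])
{
    int clk, ret = 0;

    assert(clks != NULL);
    for (clk = 0; clk < SKETCH_CLK_MAX; clk++) {
        if (!clks[clk])
            continue;
        ret = sketch_clk_enable_one(clks[clk]);
        if (ret)
            goto err_disable_clks;
    }
    return 0;

err_disable_clks:
    while (--clk >= 0) {
        if (clks[clk])
            sketch_clk_disable_one(clks[clk]);
    }
    return ret;
}

/* Caller that propagates the error instead of dropping it. */
static int sketch_hw_init(struct sketch_clk *clks[SKETCH_CLK_MAX])
{
    int ret = sketch_clk_enable(clks);

    if (ret)
        return ret; /* may be a deferral request; let the caller retry */
    /* ... further hardware setup would follow here ... */
    return 0;
}
```

If sketch_hw_init() ignored the return value, a deferred clock would leave
the remaining setup running against hardware that is not clocked.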


[PATCH net 05/20] net/hinic: Add management messages

2017-07-12 Thread Aviad Krawczyk
Add the management messages for sending to api cmd and add the
asynchronous event handler for the completion of the messages.

Signed-off-by: Aviad Krawczyk 
Signed-off-by: Zhaochen 
---
 .../net/ethernet/huawei/hinic/hinic_hw_api_cmd.c   |  35 ++
 .../net/ethernet/huawei/hinic/hinic_hw_api_cmd.h   |   3 +
 drivers/net/ethernet/huawei/hinic/hinic_hw_if.h|   5 +
 drivers/net/ethernet/huawei/hinic/hinic_hw_mgmt.c  | 471 -
 drivers/net/ethernet/huawei/hinic/hinic_hw_mgmt.h  |  59 +++
 5 files changed, 569 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.c b/drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.c
index 1cd91ae..5d32c91 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.c
@@ -55,6 +55,41 @@ enum api_cmd_xor_chk_level {
 };
 
 /**
+ * api_cmd - API CMD command
+ * @chain: chain for the command
+ * @dest: destination node on the card that will receive the command
+ * @cmd: command data
+ * @size: the command size
+ *
+ * Return 0 - Success, negative - Failure
+ **/
+static int api_cmd(struct hinic_api_cmd_chain *chain,
+  enum hinic_node_id dest, void *cmd, u16 cmd_size)
+{
+   /* should be implemented */
+   return -EINVAL;
+}
+
+/**
+ * hinic_api_cmd_write - Write API CMD command
+ * @chain: chain for write command
+ * @dest: destination node on the card that will receive the command
+ * @cmd: command data
+ * @size: the command size
+ *
+ * Return 0 - Success, negative - Failure
+ **/
+int hinic_api_cmd_write(struct hinic_api_cmd_chain *chain,
+   enum hinic_node_id dest, void *cmd, u16 size)
+{
+   /* Verify the chain type */
+   if (chain->chain_type == HINIC_API_CMD_WRITE_TO_MGMT_CPU)
+   return api_cmd(chain, dest, cmd, size);
+
+   return -EINVAL;
+}
+
+/**
  * api_cmd_hw_restart - restart the chain in the HW
  * @chain: the API CMD specific chain to restart
  *
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.h b/drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.h
index 0596b55..32aff9f 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.h
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.h
@@ -132,6 +132,9 @@ struct hinic_api_cmd_chain {
struct hinic_api_cmd_cell   *curr_node;
 };
 
+int hinic_api_cmd_write(struct hinic_api_cmd_chain *chain,
+   enum hinic_node_id dest, void *cmd, u16 size);
+
 int hinic_api_cmd_init(struct hinic_hwif *hwif,
   struct hinic_api_cmd_chain **chain);
 
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_if.h b/drivers/net/ethernet/huawei/hinic/hinic_hw_if.h
index 4f83446..fa61cef 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_if.h
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_if.h
@@ -93,6 +93,7 @@
 #define HINIC_HWIF_NUM_IRQS(hwif)  ((hwif)->attr.num_irqs)
 #define HINIC_HWIF_GLOB_IDX(hwif)  ((hwif)->attr.func_global_idx)
 #define HINIC_HWIF_PCI_INTF(hwif)  ((hwif)->attr.pci_intf_idx)
+#define HINIC_HWIF_PF_IDX(hwif)    ((hwif)->attr.pf_idx)
 
 #define HINIC_FUNC_TYPE(hwif)  ((hwif)->attr.func_type)
 #define HINIC_IS_PF(hwif)  (HINIC_FUNC_TYPE(hwif) == HINIC_PF)
@@ -127,6 +128,10 @@ enum hinic_mod_type {
HINIC_MOD_MAX   = 15
 };
 
+enum hinic_node_id {
+   HINIC_NODE_ID_MGMT = 21,
+};
+
 struct hinic_func_attr {
u16 func_global_idx;
u8  pf_idx;
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_mgmt.c b/drivers/net/ethernet/huawei/hinic/hinic_hw_mgmt.c
index e864448..2759054 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_mgmt.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_mgmt.c
@@ -18,6 +18,12 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
 
 #include "hinic_hw_if.h"
 #include "hinic_hw_eqs.h"
@@ -25,9 +31,269 @@
 #include "hinic_hw_mgmt.h"
 #include "hinic_hw_dev.h"
 
+#define SYNC_MSG_ID_MASK   0x1FF
+
+#define SYNC_MSG_ID(pf_to_mgmt)    ((pf_to_mgmt)->sync_msg_id)
+
+#define SYNC_MSG_ID_INC(pf_to_mgmt)    (SYNC_MSG_ID(pf_to_mgmt) = \
+   ((SYNC_MSG_ID(pf_to_mgmt) + 1) & \
+SYNC_MSG_ID_MASK))
+
+#define MSG_SZ_IS_VALID(in_size)   ((in_size) <= MAX_MSG_SZ)
+
+#define MGMT_MSG_SIZE_MIN  20
+#define MGMT_MSG_SIZE_STEP 16
+#define MGMT_MSG_RSVD_FOR_DEV  8
+
+#define SEGMENT_LEN    48
+
+#define MAX_PF_MGMT_BUF_SIZE   2048
+
+/* Data should be SEG LEN size aligned */
+#define MAX_MSG_SZ 2016
+
+#define MSG_NOT_RESP   0x
+
+#define MGMT_MSG_TIMEOUT   1000
+
 #define mgmt_to_pfhwdev(pf_mgmt)   \
  

[PATCH net 01/20] net/hinic: Initialize hw interface

2017-07-12 Thread Aviad Krawczyk
Initialize hw interface as part of the nic initialization for accessing hw.

Signed-off-by: Aviad Krawczyk 
Signed-off-by: Zhaochen 
---
 Documentation/networking/hinic.txt | 125 
 drivers/net/ethernet/Kconfig   |   1 +
 drivers/net/ethernet/Makefile  |   1 +
 drivers/net/ethernet/huawei/Kconfig|  19 ++
 drivers/net/ethernet/huawei/Makefile   |   5 +
 drivers/net/ethernet/huawei/hinic/Kconfig  |  13 ++
 drivers/net/ethernet/huawei/hinic/Makefile |   3 +
 drivers/net/ethernet/huawei/hinic/hinic_dev.h  |  34 
 drivers/net/ethernet/huawei/hinic/hinic_hw_csr.h   |  36 
 drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c   | 220 +
 drivers/net/ethernet/huawei/hinic/hinic_hw_dev.h   |  42 
 drivers/net/ethernet/huawei/hinic/hinic_hw_if.c| 208 +++
 drivers/net/ethernet/huawei/hinic/hinic_hw_if.h| 160 +++
 drivers/net/ethernet/huawei/hinic/hinic_main.c | 212 
 .../net/ethernet/huawei/hinic/hinic_pci_id_tbl.h   |  27 +++
 15 files changed, 1106 insertions(+)
 create mode 100644 Documentation/networking/hinic.txt
 create mode 100644 drivers/net/ethernet/huawei/Kconfig
 create mode 100644 drivers/net/ethernet/huawei/Makefile
 create mode 100644 drivers/net/ethernet/huawei/hinic/Kconfig
 create mode 100644 drivers/net/ethernet/huawei/hinic/Makefile
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_dev.h
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_hw_csr.h
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_hw_dev.h
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_hw_if.c
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_hw_if.h
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_main.c
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_pci_id_tbl.h

diff --git a/Documentation/networking/hinic.txt 
b/Documentation/networking/hinic.txt
new file mode 100644
index 0000000..c826660
--- /dev/null
+++ b/Documentation/networking/hinic.txt
@@ -0,0 +1,125 @@
+Linux Kernel Driver for Huawei Intelligent NIC(HiNIC) family
+============================================================
+
+Overview:
+=========
+HiNIC is a network interface card for the Data Center Area.
+
+The driver supports a range of link-speed devices (10GbE, 25GbE, 40GbE, etc.).
+The driver also supports a negotiated and extendable feature set.
+
+Some HiNIC devices support SR-IOV. This driver is used for Physical Function
+(PF).
+
+HiNIC devices support an MSI-X interrupt vector for each Tx/Rx queue and
+adaptive interrupt moderation.
+
+HiNIC devices also support various offload features such as checksum offload,
+TCP Transmit Segmentation Offload(TSO), Receive-Side Scaling(RSS) and
+LRO(Large Receive Offload).
+
+
+Supported PCI vendor ID/device IDs:
+===================================
+
+19e5:1822 - HiNIC PF
+
+
+Driver Architecture and Source Code:
+====================================
+
+hinic_dev - Implement a Logical Network device that is independent from
+specific HW details about HW data structure formats.
+
+hinic_hwdev - Implement the HW details of the device and include the components
+for accessing the PCI NIC.
+
+hinic_hwdev contains the following components:
+==============================================
+
+HW Interface:
+=============
+
+The interface for accessing the PCI device (DMA memory and PCI BARs).
+(hinic_hw_if.c, hinic_hw_if.h)
+
+Configuration Status Registers Area that describes the HW Registers on the
+configuration and status BAR0. (hinic_hw_csr.h)
+
+MGMT components:
+================
+
+Asynchronous Event Queues(AEQs) - The event queues for receiving messages from
+the MGMT modules on the cards. (hinic_hw_eqs.c, hinic_hw_eqs.h)
+
+Application Programmable Interface commands(API CMD) - Interface for sending
+MGMT commands to the card. (hinic_hw_api_cmd.c, hinic_hw_api_cmd.h)
+
+Management (MGMT) - the PF to MGMT channel that uses API CMD for sending MGMT
+commands to the card and receives notifications from the MGMT modules on the
+card by AEQs. Also set the addresses of the IO CMDQs in HW.
+(hinic_hw_mgmt.c, hinic_hw_mgmt.h)
+
+IO components:
+==============
+
+Completion Event Queues(CEQs) - The completion Event Queues that describe IO
+tasks that are finished. (hinic_hw_eqs.c, hinic_hw_eqs.h)
+
+Work Queues(WQ) - Contain the memory and operations for use by CMD queues and
+the Queue Pairs. The WQ is a Memory Block in a Page. The Block contains
+pointers to Memory Areas that are the Memory for the Work Queue Elements(WQEs).
+(hinic_hw_wq.c, hinic_hw_wq.h)
+
+Command Queues(CMDQ) - The queues for sending commands for IO management;
+they are also used to set the QP addresses in HW. The command completion events are
+accumulated on the CEQ that is 

[PATCH net 00/20] Huawei HiNIC Ethernet Driver

2017-07-12 Thread Aviad Krawczyk
This patch set adds the HiNIC Ethernet driver for the HiNIC family of
PCIe network interface cards.

Huawei's PCIe HiNIC card is a new Ethernet card, hence the need for a
new driver.

The current driver supports the Physical Function (PF) only. Support
for the Virtual Function and more features will follow once this basic
PF driver has been accepted.

Aviad Krawczyk (20):
  net/hinic: Initialize hw interface
  net/hinic: Initialize hw device components
  net/hinic: Initialize api cmd resources
  net/hinic: Initialize api cmd hw
  net/hinic: Add management messages
  net/hinic: Add api cmd commands
  net/hinic: Add aeqs
  net/hinic: Add port management commands
  net/hinic: Add Rx mode and link event handler
  net/hinic: Add logical Txq and Rxq
  net/hinic: Add wq
  net/hinic: Add qp resources
  net/hinic: Set qp context
  net/hinic: Initialize cmdq
  net/hinic: Add ceqs
  net/hinic: Add cmdq commands
  net/hinic: Add cmdq completion handler
  net/hinic: Add Rx handler
  net/hinic: Add Tx operation
  net/hinic: Add ethtool and stats

 Documentation/networking/hinic.txt |  125 +++
 MAINTAINERS|7 +
 drivers/net/ethernet/Kconfig   |1 +
 drivers/net/ethernet/Makefile  |1 +
 drivers/net/ethernet/huawei/Kconfig|   19 +
 drivers/net/ethernet/huawei/Makefile   |5 +
 drivers/net/ethernet/huawei/hinic/Kconfig  |   13 +
 drivers/net/ethernet/huawei/hinic/Makefile |6 +
 drivers/net/ethernet/huawei/hinic/hinic_common.c   |   80 ++
 drivers/net/ethernet/huawei/hinic/hinic_common.h   |   38 +
 drivers/net/ethernet/huawei/hinic/hinic_dev.h  |   65 ++
 .../net/ethernet/huawei/hinic/hinic_hw_api_cmd.c   |  990 +
 .../net/ethernet/huawei/hinic/hinic_hw_api_cmd.h   |  208 
 drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.c  |  942 
 drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.h  |  302 +
 drivers/net/ethernet/huawei/hinic/hinic_hw_csr.h   |  149 +++
 drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c   | 1065 ++
 drivers/net/ethernet/huawei/hinic/hinic_hw_dev.h   |  239 
 drivers/net/ethernet/huawei/hinic/hinic_hw_eqs.c   |  879 +++
 drivers/net/ethernet/huawei/hinic/hinic_hw_eqs.h   |  265 +
 drivers/net/ethernet/huawei/hinic/hinic_hw_if.c|  353 ++
 drivers/net/ethernet/huawei/hinic/hinic_hw_if.h|  272 +
 drivers/net/ethernet/huawei/hinic/hinic_hw_io.c|  534 +
 drivers/net/ethernet/huawei/hinic/hinic_hw_io.h|   92 ++
 drivers/net/ethernet/huawei/hinic/hinic_hw_mgmt.c  |  628 +++
 drivers/net/ethernet/huawei/hinic/hinic_hw_mgmt.h  |  153 +++
 drivers/net/ethernet/huawei/hinic/hinic_hw_qp.c|  871 +++
 drivers/net/ethernet/huawei/hinic/hinic_hw_qp.h|  394 +++
 .../net/ethernet/huawei/hinic/hinic_hw_qp_ctxt.h   |  214 
 drivers/net/ethernet/huawei/hinic/hinic_hw_wq.c|  867 +++
 drivers/net/ethernet/huawei/hinic/hinic_hw_wq.h|  111 ++
 drivers/net/ethernet/huawei/hinic/hinic_main.c | 1153 
 .../net/ethernet/huawei/hinic/hinic_pci_id_tbl.h   |   27 +
 drivers/net/ethernet/huawei/hinic/hinic_port.c |  403 +++
 drivers/net/ethernet/huawei/hinic/hinic_port.h |  198 
 drivers/net/ethernet/huawei/hinic/hinic_rx.c   |  513 +
 drivers/net/ethernet/huawei/hinic/hinic_rx.h   |   55 +
 drivers/net/ethernet/huawei/hinic/hinic_tx.c   |  513 +
 drivers/net/ethernet/huawei/hinic/hinic_tx.h   |   62 ++
 39 files changed, 12812 insertions(+)
 create mode 100644 Documentation/networking/hinic.txt
 create mode 100644 drivers/net/ethernet/huawei/Kconfig
 create mode 100644 drivers/net/ethernet/huawei/Makefile
 create mode 100644 drivers/net/ethernet/huawei/hinic/Kconfig
 create mode 100644 drivers/net/ethernet/huawei/hinic/Makefile
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_common.c
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_common.h
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_dev.h
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.c
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.h
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.c
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.h
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_hw_csr.h
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_hw_dev.h
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_hw_eqs.c
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_hw_eqs.h
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_hw_if.c
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_hw_if.h
 create mode 100644 

[PATCH net 04/20] net/hinic: Initialize api cmd hw

2017-07-12 Thread Aviad Krawczyk
Inform the hardware of the api cmd resources and initialize the api cmd hw.

Signed-off-by: Aviad Krawczyk 
Signed-off-by: Zhaochen 
---
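The restart handshake in api_cmd_hw_restart() sets the RESTART bit with a
read-modify-write and then polls until the hardware clears it or a timeout
expires. A minimal user-space sketch of that pattern follows. It assumes a
hypothetical 1-bit RESTART field at bit 1 and a simulated register; fake_hw,
fake_read_reg, fake_write_reg, chain_restart and the REQ_* macros are
illustrative names, not driver symbols, and the wait is bounded by a poll
count instead of jiffies and msleep(20).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical field layout: a 1-bit RESTART field at bit 1. */
#define REQ_RESTART_SHIFT	1
#define REQ_RESTART_MASK	0x1u

#define REQ_SET(val, member) \
	(((uint32_t)(val) & REQ_##member##_MASK) << REQ_##member##_SHIFT)
#define REQ_GET(val, member) \
	(((val) >> REQ_##member##_SHIFT) & REQ_##member##_MASK)
#define REQ_CLEAR(val, member) \
	((val) & ~(REQ_##member##_MASK << REQ_##member##_SHIFT))

/* Simulated device: keeps RESTART set for busy_reads reads, then clears it. */
struct fake_hw {
	uint32_t req_reg;
	int busy_reads;
};

static uint32_t fake_read_reg(struct fake_hw *hw)
{
	if (REQ_GET(hw->req_reg, RESTART)) {
		if (hw->busy_reads == 0)
			hw->req_reg = REQ_CLEAR(hw->req_reg, RESTART);
		else
			hw->busy_reads--;
	}
	return hw->req_reg;
}

static void fake_write_reg(struct fake_hw *hw, uint32_t val)
{
	hw->req_reg = val;
}

/* Set RESTART, then poll at most max_polls times for the device to clear it. */
static int chain_restart(struct fake_hw *hw, int max_polls)
{
	uint32_t val = fake_read_reg(hw);
	int i;

	/* Read-modify-write: only the RESTART field changes. */
	val = REQ_CLEAR(val, RESTART);
	val |= REQ_SET(1, RESTART);
	fake_write_reg(hw, val);

	for (i = 0; i < max_polls; i++) {
		val = fake_read_reg(hw);
		if (!REQ_GET(val, RESTART))
			return 0;
	}

	return -ETIMEDOUT;
}

/* Example: one device that finishes in time and one that does not. */
void restart_example(void)
{
	struct fake_hw fast = { .req_reg = 0x10, .busy_reads = 2 };
	struct fake_hw slow = { .req_reg = 0x00, .busy_reads = 10 };

	assert(chain_restart(&fast, 5) == 0);
	assert(fast.req_reg == 0x10);	/* unrelated bits are preserved */
	assert(chain_restart(&slow, 3) == -ETIMEDOUT);
}
```

Initializing the result to the timeout error and overwriting it only on
success, as the patch does, keeps the loop free of a separate "timed out"
flag; the sketch returns directly instead, which is equivalent here.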
 .../net/ethernet/huawei/hinic/hinic_hw_api_cmd.c   | 174 -
 .../net/ethernet/huawei/hinic/hinic_hw_api_cmd.h   |  38 +
 drivers/net/ethernet/huawei/hinic/hinic_hw_csr.h   |  25 +++
 3 files changed, 236 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.c 
b/drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.c
index dc80fa7..1cd91ae 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.c
@@ -15,6 +15,7 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include 
 #include 
 #include 
 #include 
@@ -22,8 +23,12 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 #include 
 
+#include "hinic_hw_csr.h"
 #include "hinic_hw_if.h"
 #include "hinic_hw_api_cmd.h"
 
@@ -36,8 +41,157 @@
(((cell_size) >= API_CMD_CELL_SIZE_MIN) ? \
 (1 << (fls(cell_size - 1))) : API_CMD_CELL_SIZE_MIN)
 
+#define API_CMD_CELL_SIZE_VAL(size)    \
+   ilog2((size) >> API_CMD_CELL_SIZE_SHIFT)
+
 #define API_CMD_BUF_SIZE   2048
 
+#define API_CMD_TIMEOUT   1000
+
+enum api_cmd_xor_chk_level {
+   XOR_CHK_DIS = 0,
+
+   XOR_CHK_ALL = 3,
+};
+
+/**
+ * api_cmd_hw_restart - restart the chain in the HW
+ * @chain: the API CMD specific chain to restart
+ *
+ * Return 0 - Success, negative - Failure
+ **/
+static int api_cmd_hw_restart(struct hinic_api_cmd_chain *chain)
+{
+   struct hinic_hwif *hwif = chain->hwif;
+   unsigned long end;
+   u32 reg_addr, val;
+   int err = -ETIMEDOUT;
+
+   /* Read Modify Write */
+   reg_addr = HINIC_CSR_API_CMD_CHAIN_REQ_ADDR(chain->chain_type);
+   val = hinic_hwif_read_reg(hwif, reg_addr);
+
+   val = HINIC_API_CMD_CHAIN_REQ_CLEAR(val, RESTART);
+   val |= HINIC_API_CMD_CHAIN_REQ_SET(1, RESTART);
+
+   hinic_hwif_write_reg(hwif, reg_addr, val);
+
+   end = jiffies + msecs_to_jiffies(API_CMD_TIMEOUT);
+   do {
+   val = hinic_hwif_read_reg(hwif, reg_addr);
+
+   if (!HINIC_API_CMD_CHAIN_REQ_GET(val, RESTART)) {
+   err = 0;
+   break;
+   }
+
+   msleep(20);
+   } while (time_before(jiffies, end));
+
+   return err;
+}
+
+/**
+ * api_cmd_ctrl_init - set the control register of a chain
+ * @chain: the API CMD specific chain to set control register for
+ **/
+static void api_cmd_ctrl_init(struct hinic_api_cmd_chain *chain)
+{
+   struct hinic_hwif *hwif = chain->hwif;
+   u32 reg_addr, ctrl;
+   u16 cell_size;
+
+   /* Read Modify Write */
+   reg_addr = HINIC_CSR_API_CMD_CHAIN_CTRL_ADDR(chain->chain_type);
+
+   cell_size = API_CMD_CELL_SIZE_VAL(chain->cell_size);
+
+   ctrl = hinic_hwif_read_reg(hwif, reg_addr);
+
+   ctrl =  HINIC_API_CMD_CHAIN_CTRL_CLEAR(ctrl, RESTART_WB_STAT) &
+   HINIC_API_CMD_CHAIN_CTRL_CLEAR(ctrl, XOR_ERR)   &
+   HINIC_API_CMD_CHAIN_CTRL_CLEAR(ctrl, AEQE_EN)   &
+   HINIC_API_CMD_CHAIN_CTRL_CLEAR(ctrl, XOR_CHK_EN) &
+   HINIC_API_CMD_CHAIN_CTRL_CLEAR(ctrl, CELL_SIZE);
+
+   ctrl |= HINIC_API_CMD_CHAIN_CTRL_SET(1, XOR_ERR) |
+   HINIC_API_CMD_CHAIN_CTRL_SET(XOR_CHK_ALL, XOR_CHK_EN) |
+   HINIC_API_CMD_CHAIN_CTRL_SET(cell_size, CELL_SIZE);
+
+   hinic_hwif_write_reg(hwif, reg_addr, ctrl);
+}
+
+/**
+ * api_cmd_set_status_addr - set the status address of a chain in the HW
+ * @chain: the API CMD specific chain to set in HW status address for
+ **/
+static void api_cmd_set_status_addr(struct hinic_api_cmd_chain *chain)
+{
+   struct hinic_hwif *hwif = chain->hwif;
+   u32 addr, val;
+
+   addr = HINIC_CSR_API_CMD_STATUS_HI_ADDR(chain->chain_type);
+   val = upper_32_bits(chain->wb_status_paddr);
+   hinic_hwif_write_reg(hwif, addr, val);
+
+   addr = HINIC_CSR_API_CMD_STATUS_LO_ADDR(chain->chain_type);
+   val = lower_32_bits(chain->wb_status_paddr);
+   hinic_hwif_write_reg(hwif, addr, val);
+}
+
+/**
+ * api_cmd_set_num_cells - set the number cells of a chain in the HW
+ * @chain: the API CMD specific chain to set in HW the number of cells for
+ **/
+static void api_cmd_set_num_cells(struct hinic_api_cmd_chain *chain)
+{
+   struct hinic_hwif *hwif = chain->hwif;
+   u32 addr, val;
+
+   addr = HINIC_CSR_API_CMD_CHAIN_NUM_CELLS_ADDR(chain->chain_type);
+   val  = chain->num_cells;
+   hinic_hwif_write_reg(hwif, addr, val);
+}
+
+/**
+ * api_cmd_head_init - set the head of a chain in the HW
+ * @chain: the API CMD specific chain to set in HW the head for
+ **/
+static void api_cmd_head_init(struct hinic_api_cmd_chain *chain)
+{
+   struct hinic_hwif 

[PATCH net 07/20] net/hinic: Add aeqs

2017-07-12 Thread Aviad Krawczyk
Handle aeq elements that are accumulated on the aeq by calling the
registered handler for the specific event.

Signed-off-by: Aviad Krawczyk 
Signed-off-by: Zhaochen 
---
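The GET_EQ_NUM_PAGES and GET_EQ_ELEMENT macros in this patch split an element
index into a page number and a byte offset within that page, and the CSR
macros compute per-queue register addresses from a base and a stride. A small
sketch of that arithmetic follows. The struct eq_layout type, the function
names and the example sizes are assumptions for illustration; only the 0xE08
base and 0x80 stride are taken from the patch. As in the macro, the offset
calculation assumes the number of elements per page is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical queue geometry, all sizes in bytes except q_len. */
struct eq_layout {
	uint32_t q_len;		/* number of elements in the queue */
	uint32_t elem_size;	/* bytes per element */
	uint32_t page_size;	/* bytes per page */
};

/* Pages needed to hold q_len elements, rounded up to whole pages. */
static uint32_t eq_num_pages(const struct eq_layout *eq)
{
	uint32_t bytes = eq->q_len * eq->elem_size;

	return (bytes + eq->page_size - 1) / eq->page_size;
}

static uint32_t eq_elems_per_page(const struct eq_layout *eq)
{
	return eq->page_size / eq->elem_size;
}

/* Split idx into a page number and a byte offset inside that page. */
static void eq_locate(const struct eq_layout *eq, uint32_t idx,
		      uint32_t *page, uint32_t *offset)
{
	uint32_t per_page = eq_elems_per_page(eq);

	*page = idx / per_page;
	/* Masking is only valid when per_page is a power of two. */
	*offset = (idx & (per_page - 1)) * eq->elem_size;
}

/* AEQ consumer index register: base 0xE08, stride 0x80 per queue. */
static uint32_t aeq_cons_idx_addr(uint32_t q_id)
{
	return 0xE08 + q_id * 0x80;
}

/* Example: 256 elements of 64 bytes on 4 KiB pages. */
void eq_layout_example(void)
{
	struct eq_layout eq = { .q_len = 256, .elem_size = 64,
				.page_size = 4096 };
	uint32_t page, offset;

	assert(eq_num_pages(&eq) == 4);
	assert(eq_elems_per_page(&eq) == 64);

	eq_locate(&eq, 130, &page, &offset);
	assert(page == 2 && offset == 128);

	assert(aeq_cons_idx_addr(2) == 0xF08);
}
```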
 drivers/net/ethernet/huawei/hinic/hinic_hw_csr.h |  49 +++
 drivers/net/ethernet/huawei/hinic/hinic_hw_eqs.c | 456 ++-
 drivers/net/ethernet/huawei/hinic/hinic_hw_eqs.h |  81 
 drivers/net/ethernet/huawei/hinic/hinic_hw_if.c  |  92 +
 drivers/net/ethernet/huawei/hinic/hinic_hw_if.h  |  46 +++
 5 files changed, 722 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_csr.h 
b/drivers/net/ethernet/huawei/hinic/hinic_hw_csr.h
index f2774e3..ae84719 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_csr.h
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_csr.h
@@ -65,4 +65,53 @@
 #define HINIC_CSR_API_CMD_STATUS_ADDR(idx) \
(HINIC_CSR_API_CMD_BASE + 0x30 + (idx) * HINIC_CSR_API_CMD_STRIDE)
 
+/* MSI-X registers */
+#define HINIC_CSR_MSIX_CTRL_BASE   0x2000
+#define HINIC_CSR_MSIX_CNT_BASE    0x2004
+
+#define HINIC_CSR_MSIX_STRIDE  0x8
+
+#define HINIC_CSR_MSIX_CTRL_ADDR(idx)  \
+   (HINIC_CSR_MSIX_CTRL_BASE + (idx) * HINIC_CSR_MSIX_STRIDE)
+
+#define HINIC_CSR_MSIX_CNT_ADDR(idx)   \
+   (HINIC_CSR_MSIX_CNT_BASE + (idx) * HINIC_CSR_MSIX_STRIDE)
+
+/* EQ registers */
+#define HINIC_AEQ_MTT_OFF_BASE_ADDR0x200
+
+#define HINIC_EQ_MTT_OFF_STRIDE    0x40
+
+#define HINIC_CSR_AEQ_MTT_OFF(id)  \
+   (HINIC_AEQ_MTT_OFF_BASE_ADDR + (id) * HINIC_EQ_MTT_OFF_STRIDE)
+
+#define HINIC_CSR_EQ_PAGE_OFF_STRIDE   8
+
+#define HINIC_CSR_AEQ_HI_PHYS_ADDR_REG(q_id, pg_num)   \
+   (HINIC_CSR_AEQ_MTT_OFF(q_id) + \
+(pg_num) * HINIC_CSR_EQ_PAGE_OFF_STRIDE)
+
+#define HINIC_CSR_AEQ_LO_PHYS_ADDR_REG(q_id, pg_num)   \
+   (HINIC_CSR_AEQ_MTT_OFF(q_id) + \
+(pg_num) * HINIC_CSR_EQ_PAGE_OFF_STRIDE + 4)
+
+#define HINIC_AEQ_CTRL_0_ADDR_BASE 0xE00
+#define HINIC_AEQ_CTRL_1_ADDR_BASE 0xE04
+#define HINIC_AEQ_CONS_IDX_ADDR_BASE   0xE08
+#define HINIC_AEQ_PROD_IDX_ADDR_BASE   0xE0C
+
+#define HINIC_EQ_OFF_STRIDE    0x80
+
+#define HINIC_CSR_AEQ_CTRL_0_ADDR(idx) \
+   (HINIC_AEQ_CTRL_0_ADDR_BASE + (idx) * HINIC_EQ_OFF_STRIDE)
+
+#define HINIC_CSR_AEQ_CTRL_1_ADDR(idx) \
+   (HINIC_AEQ_CTRL_1_ADDR_BASE + (idx) * HINIC_EQ_OFF_STRIDE)
+
+#define HINIC_CSR_AEQ_CONS_IDX_ADDR(idx)   \
+   (HINIC_AEQ_CONS_IDX_ADDR_BASE + (idx) * HINIC_EQ_OFF_STRIDE)
+
+#define HINIC_CSR_AEQ_PROD_IDX_ADDR(idx)   \
+   (HINIC_AEQ_PROD_IDX_ADDR_BASE + (idx) * HINIC_EQ_OFF_STRIDE)
+
 #endif
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_eqs.c 
b/drivers/net/ethernet/huawei/hinic/hinic_hw_eqs.c
index da63da2..8a4c95f 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_eqs.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_eqs.c
@@ -13,17 +13,76 @@
  *
  */
 
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
 
+#include "hinic_hw_csr.h"
 #include "hinic_hw_if.h"
 #include "hinic_hw_eqs.h"
 
 #define HINIC_EQS_WQ_NAME  "hinic_eqs"
 
+#define GET_EQ_NUM_PAGES(eq, pg_size)  \
+   (ALIGN((eq)->q_len * (eq)->elem_size, pg_size) / (pg_size))
+
+#define GET_EQ_NUM_ELEMS_IN_PG(eq, pg_size)    ((pg_size) / (eq)->elem_size)
+
+#define EQ_CONS_IDX_REG_ADDR(eq)   HINIC_CSR_AEQ_CONS_IDX_ADDR((eq)->q_id)
+#define EQ_PROD_IDX_REG_ADDR(eq)   HINIC_CSR_AEQ_PROD_IDX_ADDR((eq)->q_id)
+
+#define EQ_HI_PHYS_ADDR_REG(eq, pg_num)    \
+   HINIC_CSR_AEQ_HI_PHYS_ADDR_REG((eq)->q_id, pg_num)
+
+#define EQ_LO_PHYS_ADDR_REG(eq, pg_num)    \
+   HINIC_CSR_AEQ_LO_PHYS_ADDR_REG((eq)->q_id, pg_num)
+
+#define GET_EQ_ELEMENT(eq, idx)    \
+   ((eq)->virt_addr[(idx) / (eq)->num_elem_in_pg] + \
+(((idx) & ((eq)->num_elem_in_pg - 1)) * (eq)->elem_size))
+
+#define GET_AEQ_ELEM(eq, idx)  ((struct hinic_aeq_elem *) \
+   GET_EQ_ELEMENT(eq, idx))
+
+#define GET_CURR_AEQ_ELEM(eq)  GET_AEQ_ELEM(eq, (eq)->cons_idx)
+
+#define PAGE_IN_4K(page_size)  ((page_size) >> 12)
+#define EQ_SET_HW_PAGE_SIZE_VAL(eq)    (ilog2(PAGE_IN_4K((eq)->page_size)))
+
+#define ELEMENT_SIZE_IN_32B(eq)    (((eq)->elem_size) >> 5)
+#define EQ_SET_HW_ELEM_SIZE_VAL(eq)    (ilog2(ELEMENT_SIZE_IN_32B(eq)))
+
+#define EQ_MAX_PAGES   8
+
+#define aeq_to_aeqs(eq)\
+   

[PATCH net 11/20] net/hinic: Add wq

2017-07-12 Thread Aviad Krawczyk
Create work queues for use by the queue pairs for Tx and Rx operations.

Signed-off-by: Aviad Krawczyk 
Signed-off-by: Zhaochen 
---
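init_qp() in this patch allocates the SQ work queue, then the RQ work queue,
and unwinds through a goto label when the second allocation fails. A
user-space sketch of the same unwinding pattern follows. Here wq_alloc,
wq_free, struct queue_pair and the fail_countdown fault-injection counter are
hypothetical stand-ins for hinic_wq_allocate(), hinic_wq_free() and the
driver structures.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct queue_pair {
	void *sq_wq;
	void *rq_wq;
};

static int live_allocs;		/* allocations not yet freed */
static int fail_countdown = -1;	/* successes left before forced failure; <0 = never fail */

static void *wq_alloc(size_t size)
{
	void *p;

	if (fail_countdown == 0)
		return NULL;
	if (fail_countdown > 0)
		fail_countdown--;

	p = malloc(size);
	if (p)
		live_allocs++;
	return p;
}

static void wq_free(void *p)
{
	if (p) {
		free(p);
		live_allocs--;
	}
}

/* Allocate both queues; on failure release whatever was already taken. */
static int qp_init(struct queue_pair *qp)
{
	int err;

	qp->sq_wq = wq_alloc(64);
	if (!qp->sq_wq)
		return -ENOMEM;

	qp->rq_wq = wq_alloc(64);
	if (!qp->rq_wq) {
		err = -ENOMEM;
		goto rq_alloc_err;
	}

	return 0;

rq_alloc_err:
	wq_free(qp->sq_wq);
	qp->sq_wq = NULL;
	return err;
}

/* Release in reverse order of allocation. */
static void qp_destroy(struct queue_pair *qp)
{
	wq_free(qp->rq_wq);
	wq_free(qp->sq_wq);
	qp->rq_wq = NULL;
	qp->sq_wq = NULL;
}

/* Example: a forced failure on the second allocation leaks nothing. */
void qp_unwind_example(void)
{
	struct queue_pair qp = { 0 };

	fail_countdown = 1;		/* first alloc succeeds, second fails */
	assert(qp_init(&qp) == -ENOMEM);
	assert(live_allocs == 0);

	fail_countdown = -1;		/* no injected failures */
	assert(qp_init(&qp) == 0);
	assert(live_allocs == 2);

	qp_destroy(&qp);
	assert(live_allocs == 0);
}
```

Placing the labels in reverse order of acquisition lets each failure site
jump to exactly the cleanup it needs, which is the same shape used by the
sq_wq_err and rq_wq_err labels in hinic_io_create_qps().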
 drivers/net/ethernet/huawei/hinic/Makefile  |   4 +-
 drivers/net/ethernet/huawei/hinic/hinic_hw_io.c |  65 ++-
 drivers/net/ethernet/huawei/hinic/hinic_hw_io.h |   6 +
 drivers/net/ethernet/huawei/hinic/hinic_hw_qp.h |  17 +
 drivers/net/ethernet/huawei/hinic/hinic_hw_wq.c | 506 
 drivers/net/ethernet/huawei/hinic/hinic_hw_wq.h |  84 
 6 files changed, 677 insertions(+), 5 deletions(-)
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_hw_wq.c
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_hw_wq.h

diff --git a/drivers/net/ethernet/huawei/hinic/Makefile 
b/drivers/net/ethernet/huawei/hinic/Makefile
index ce0787c..519382b 100644
--- a/drivers/net/ethernet/huawei/hinic/Makefile
+++ b/drivers/net/ethernet/huawei/hinic/Makefile
@@ -1,5 +1,5 @@
 obj-$(CONFIG_HINIC) += hinic.o
 
 hinic-y := hinic_main.o hinic_tx.o hinic_rx.o hinic_port.o hinic_hw_dev.o \
-  hinic_hw_io.o hinic_hw_mgmt.o hinic_hw_api_cmd.o hinic_hw_eqs.o \
-  hinic_hw_if.o
\ No newline at end of file
+  hinic_hw_io.o hinic_hw_wq.o hinic_hw_mgmt.o hinic_hw_api_cmd.o \
+  hinic_hw_eqs.o hinic_hw_if.o
\ No newline at end of file
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_io.c 
b/drivers/net/ethernet/huawei/hinic/hinic_hw_io.c
index aa03127..1453b08 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_io.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_io.c
@@ -22,6 +22,7 @@
 #include 
 
 #include "hinic_hw_if.h"
+#include "hinic_hw_wq.h"
 #include "hinic_hw_qp.h"
 #include "hinic_hw_io.h"
 
@@ -40,8 +41,31 @@ static int init_qp(struct hinic_func_to_io *func_to_io,
   struct msix_entry *sq_msix_entry,
   struct msix_entry *rq_msix_entry)
 {
-   /* should be implemented */
+   int err;
+
+   qp->q_id = q_id;
+
+   err = hinic_wq_allocate(&func_to_io->wqs, &func_to_io->sq_wq[q_id],
+   HINIC_SQ_WQEBB_SIZE, HINIC_SQ_PAGE_SIZE,
+   HINIC_SQ_DEPTH, HINIC_SQ_WQE_MAX_SIZE);
+   if (err) {
+   pr_err("Failed to allocate WQ for SQ\n");
+   return err;
+   }
+
+   err = hinic_wq_allocate(&func_to_io->wqs, &func_to_io->rq_wq[q_id],
+   HINIC_RQ_WQEBB_SIZE, HINIC_RQ_PAGE_SIZE,
+   HINIC_RQ_DEPTH, HINIC_RQ_WQE_SIZE);
+   if (err) {
+   pr_err("Failed to allocate WQ for RQ\n");
+   goto rq_alloc_err;
+   }
+
return 0;
+
+rq_alloc_err:
+   hinic_wq_free(&func_to_io->wqs, &func_to_io->sq_wq[q_id]);
+   return err;
 }
 
 /**
@@ -52,7 +76,10 @@ static int init_qp(struct hinic_func_to_io *func_to_io,
 static void destroy_qp(struct hinic_func_to_io *func_to_io,
   struct hinic_qp *qp)
 {
-   /* should be implemented */
+   int q_id = qp->q_id;
+
+   hinic_wq_free(&func_to_io->wqs, &func_to_io->rq_wq[q_id]);
+   hinic_wq_free(&func_to_io->wqs, &func_to_io->sq_wq[q_id]);
 }
 
 /**
@@ -70,7 +97,7 @@ int hinic_io_create_qps(struct hinic_func_to_io *func_to_io,
struct msix_entry *sq_msix_entries,
struct msix_entry *rq_msix_entries)
 {
-   size_t qps_size;
+   size_t qps_size, wq_size;
int i, j, err;
 
qps_size = num_qps * sizeof(*func_to_io->qps);
@@ -78,6 +105,20 @@ int hinic_io_create_qps(struct hinic_func_to_io *func_to_io,
if (!func_to_io->qps)
return -ENOMEM;
 
+   wq_size = num_qps * sizeof(*func_to_io->sq_wq);
+   func_to_io->sq_wq = kzalloc(wq_size, GFP_KERNEL);
+   if (!func_to_io->sq_wq) {
+   err = -ENOMEM;
+   goto sq_wq_err;
+   }
+
+   wq_size = num_qps * sizeof(*func_to_io->rq_wq);
+   func_to_io->rq_wq = kzalloc(wq_size, GFP_KERNEL);
+   if (!func_to_io->rq_wq) {
+   err = -ENOMEM;
+   goto rq_wq_err;
+   }
+
for (i = 0; i < num_qps; i++) {
 err = init_qp(func_to_io, &func_to_io->qps[i], i,
   &sq_msix_entries[i], &rq_msix_entries[i]);
@@ -93,6 +134,12 @@ int hinic_io_create_qps(struct hinic_func_to_io *func_to_io,
for (j = 0; j < i; j++)
 destroy_qp(func_to_io, &func_to_io->qps[j]);
 
+   kfree(func_to_io->rq_wq);
+
+rq_wq_err:
+   kfree(func_to_io->sq_wq);
+
+sq_wq_err:
kfree(func_to_io->qps);
return err;
 }
@@ -109,6 +156,9 @@ void hinic_io_destroy_qps(struct hinic_func_to_io 
*func_to_io, int num_qps)
for (i = 0; i < num_qps; i++)
 destroy_qp(func_to_io, &func_to_io->qps[i]);
 
+   kfree(func_to_io->rq_wq);
+   kfree(func_to_io->sq_wq);
+
kfree(func_to_io->qps);
 }
 
@@ -126,10 +176,18 @@ int hinic_io_init(struct hinic_func_to_io *func_to_io,
  

[PATCH net 10/20] net/hinic: Add logical Txq and Rxq

2017-07-12 Thread Aviad Krawczyk
Create the logical queues of the nic.

Signed-off-by: Aviad Krawczyk 
Signed-off-by: Zhaochen 
---
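get_base_qpn() in this patch treats a management command as successful only
when three things hold: the transport returned no error, the reply has the
expected length, and the firmware status field is zero. A sketch of that
check in isolation follows. The field names status, func_idx and qpn appear
in the patch, but the field types, their order and the helper name
check_base_qpn_reply are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical reply layout; only the field names come from the patch. */
struct cmd_base_qpn {
	uint8_t  status;
	uint16_t func_idx;
	uint16_t qpn;
};

/*
 * Validate a reply: err is the transport result, out_size the number of
 * bytes the management side reported back. On success the base queue pair
 * number is stored in *base_qpn.
 */
static int check_base_qpn_reply(int err, uint16_t out_size,
				const struct cmd_base_qpn *reply,
				uint16_t *base_qpn)
{
	if (err || out_size != sizeof(*reply) || reply->status)
		return -EFAULT;

	*base_qpn = reply->qpn;
	return 0;
}

/* Example: one good reply and the three ways a reply can be rejected. */
void base_qpn_example(void)
{
	struct cmd_base_qpn ok = { .status = 0, .func_idx = 3, .qpn = 128 };
	struct cmd_base_qpn bad = { .status = 1, .func_idx = 3, .qpn = 128 };
	uint16_t qpn = 0;

	assert(check_base_qpn_reply(0, sizeof(ok), &ok, &qpn) == 0);
	assert(qpn == 128);

	assert(check_base_qpn_reply(-EIO, sizeof(ok), &ok, &qpn) == -EFAULT);
	assert(check_base_qpn_reply(0, sizeof(ok) - 1, &ok, &qpn) == -EFAULT);
	assert(check_base_qpn_reply(0, sizeof(bad), &bad, &qpn) == -EFAULT);
}
```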
 drivers/net/ethernet/huawei/hinic/Makefile   |   5 +-
 drivers/net/ethernet/huawei/hinic/hinic_dev.h|   5 +
 drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c | 133 ++
 drivers/net/ethernet/huawei/hinic/hinic_hw_dev.h |  20 +++
 drivers/net/ethernet/huawei/hinic/hinic_hw_io.c  | 142 +++
 drivers/net/ethernet/huawei/hinic/hinic_hw_io.h  |  46 ++
 drivers/net/ethernet/huawei/hinic/hinic_hw_qp.h  |  32 +
 drivers/net/ethernet/huawei/hinic/hinic_main.c   | 171 ++-
 drivers/net/ethernet/huawei/hinic/hinic_rx.c |  72 ++
 drivers/net/ethernet/huawei/hinic/hinic_rx.h |  46 ++
 drivers/net/ethernet/huawei/hinic/hinic_tx.c |  76 ++
 drivers/net/ethernet/huawei/hinic/hinic_tx.h |  49 +++
 12 files changed, 793 insertions(+), 4 deletions(-)
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_hw_io.c
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_hw_io.h
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_hw_qp.h
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_rx.c
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_rx.h
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_tx.c
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_tx.h

diff --git a/drivers/net/ethernet/huawei/hinic/Makefile 
b/drivers/net/ethernet/huawei/hinic/Makefile
index 08951a6..ce0787c 100644
--- a/drivers/net/ethernet/huawei/hinic/Makefile
+++ b/drivers/net/ethernet/huawei/hinic/Makefile
@@ -1,4 +1,5 @@
 obj-$(CONFIG_HINIC) += hinic.o
 
-hinic-y := hinic_main.o hinic_port.o hinic_hw_dev.o hinic_hw_mgmt.o \
-  hinic_hw_api_cmd.o hinic_hw_eqs.o hinic_hw_if.o
\ No newline at end of file
+hinic-y := hinic_main.o hinic_tx.o hinic_rx.o hinic_port.o hinic_hw_dev.o \
+  hinic_hw_io.o hinic_hw_mgmt.o hinic_hw_api_cmd.o hinic_hw_eqs.o \
+  hinic_hw_if.o
\ No newline at end of file
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_dev.h 
b/drivers/net/ethernet/huawei/hinic/hinic_dev.h
index 7cb9533..026ed65 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_dev.h
+++ b/drivers/net/ethernet/huawei/hinic/hinic_dev.h
@@ -23,6 +23,8 @@
 #include 
 
 #include "hinic_hw_dev.h"
+#include "hinic_tx.h"
+#include "hinic_rx.h"
 
 #define HINIC_DRV_NAME "HiNIC"
 #define HINIC_DRV_VERSION  "1.0"
@@ -50,6 +52,9 @@ struct hinic_dev {
 
struct hinic_rx_mode_work   rx_mode_work;
struct workqueue_struct *workq;
+
+   struct hinic_txq    *txqs;
+   struct hinic_rxq    *rxqs;
 };
 
 #endif
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c 
b/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c
index 31747dd..753ec46 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c
@@ -26,6 +26,8 @@
 #include "hinic_hw_if.h"
 #include "hinic_hw_eqs.h"
 #include "hinic_hw_mgmt.h"
+#include "hinic_hw_qp.h"
+#include "hinic_hw_io.h"
 #include "hinic_hw_dev.h"
 
 #define MAX_IRQS(max_qps, num_aeqs, num_ceqs)  \
@@ -237,6 +239,101 @@ int hinic_port_msg_cmd(struct hinic_hwdev *hwdev, enum 
hinic_port_cmd cmd,
 }
 
 /**
+ * get_base_qpn - get the first qp number
+ * @hwdev: the NIC HW device
+ * @base_qpn: returned qp number
+ *
+ * Return 0 - Success, negative - Failure
+ **/
+static int get_base_qpn(struct hinic_hwdev *hwdev, u16 *base_qpn)
+{
+   struct hinic_hwif *hwif = hwdev->hwif;
+   struct pci_dev *pdev = hwif->pdev;
+   struct hinic_cmd_base_qpn cmd_base_qpn;
+   u16 out_size;
+   int err;
+
+   cmd_base_qpn.func_idx = HINIC_HWIF_GLOB_IDX(hwif);
+
+   err = hinic_port_msg_cmd(hwdev, HINIC_PORT_CMD_GET_GLOBAL_QPN,
+&cmd_base_qpn, sizeof(cmd_base_qpn),
+&cmd_base_qpn, &out_size);
+   if (err || (out_size != sizeof(cmd_base_qpn)) || cmd_base_qpn.status) {
+   dev_err(&pdev->dev, "Failed to get base qpn, status = %d\n",
+   cmd_base_qpn.status);
+   return -EFAULT;
+   }
+
+   *base_qpn = cmd_base_qpn.qpn;
+   return 0;
+}
+
+/**
+ * hinic_hwdev_ifup - Preparing the HW for passing IO
+ * @hwdev: the NIC HW device
+ *
+ * Return 0 - Success, negative - Failure
+ **/
+int hinic_hwdev_ifup(struct hinic_hwdev *hwdev)
+{
+   struct hinic_func_to_io *func_to_io = &hwdev->func_to_io;
+   struct hinic_cap *nic_cap = &hwdev->nic_cap;
+   struct hinic_hwif *hwif = hwdev->hwif;
+   struct pci_dev *pdev = hwif->pdev;
+   int num_qps = nic_cap->num_qps;
+   int max_qps = nic_cap->max_qps;
+   struct msix_entry *sq_msix_entries;
+   struct msix_entry *rq_msix_entries;
+   u16 base_qpn;
+   int err, num_aeqs, num_ceqs;
+
+   err = get_base_qpn(hwdev, &base_qpn);
+   

[PATCH net 08/20] net/hinic: Add port management commands

2017-07-12 Thread Aviad Krawczyk
Add the port management commands that are sent as management messages.
The port management commands are used for netdev operations.

Signed-off-by: Aviad Krawczyk 
Signed-off-by: Zhaochen 
---
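hinic_port_msg_cmd() in this patch recovers the enclosing PF device from the
generic hwdev pointer with container_of() before sending the message. A
user-space sketch of that lookup follows. The container_of macro below is a
simplified version without the kernel's type checking, and the struct
contents and the helper name hwdev_to_mgmt are hypothetical; the member order
is chosen so that the embedded member is not at offset zero.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of: no type checking, unlike the kernel macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for the driver structures. */
struct hwdev {
	int num_qps;
};

struct pf_to_mgmt {
	int sync_msg_id;
};

struct pfhwdev {
	struct pf_to_mgmt pf_to_mgmt;
	struct hwdev hwdev;	/* embedded, not a pointer */
};

/* Map a pointer to the embedded hwdev back to the PF management channel. */
static struct pf_to_mgmt *hwdev_to_mgmt(struct hwdev *hwdev)
{
	struct pfhwdev *pfhwdev = container_of(hwdev, struct pfhwdev, hwdev);

	return &pfhwdev->pf_to_mgmt;
}

/* Example: the lookup returns the sibling member of the same object. */
void container_of_example(void)
{
	struct pfhwdev pf = {
		.pf_to_mgmt = { .sync_msg_id = 5 },
		.hwdev = { .num_qps = 16 },
	};

	assert(hwdev_to_mgmt(&pf.hwdev) == &pf.pf_to_mgmt);
	assert(hwdev_to_mgmt(&pf.hwdev)->sync_msg_id == 5);
}
```

The lookup is only valid when the hwdev really is embedded in a pfhwdev,
which is why the patch checks HINIC_IS_PF() and HINIC_IS_PPF() first.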
 drivers/net/ethernet/huawei/hinic/Makefile   |   4 +-
 drivers/net/ethernet/huawei/hinic/hinic_dev.h|   4 +
 drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c |  29 +++
 drivers/net/ethernet/huawei/hinic/hinic_hw_dev.h |  29 +++
 drivers/net/ethernet/huawei/hinic/hinic_main.c   | 205 ++-
 drivers/net/ethernet/huawei/hinic/hinic_port.c   | 241 +++
 drivers/net/ethernet/huawei/hinic/hinic_port.h   |  68 +++
 7 files changed, 577 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_port.c
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_port.h

diff --git a/drivers/net/ethernet/huawei/hinic/Makefile 
b/drivers/net/ethernet/huawei/hinic/Makefile
index 88223d0..08951a6 100644
--- a/drivers/net/ethernet/huawei/hinic/Makefile
+++ b/drivers/net/ethernet/huawei/hinic/Makefile
@@ -1,4 +1,4 @@
 obj-$(CONFIG_HINIC) += hinic.o
 
-hinic-y := hinic_main.o hinic_hw_dev.o hinic_hw_mgmt.o hinic_hw_api_cmd.o \
-  hinic_hw_eqs.o hinic_hw_if.o
\ No newline at end of file
+hinic-y := hinic_main.o hinic_port.o hinic_hw_dev.o hinic_hw_mgmt.o \
+  hinic_hw_api_cmd.o hinic_hw_eqs.o hinic_hw_if.o
\ No newline at end of file
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_dev.h 
b/drivers/net/ethernet/huawei/hinic/hinic_dev.h
index 425f833..f642186 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_dev.h
+++ b/drivers/net/ethernet/huawei/hinic/hinic_dev.h
@@ -18,6 +18,7 @@
 
 #include 
 #include 
+#include 
 
 #include "hinic_hw_dev.h"
 
@@ -29,6 +30,9 @@ struct hinic_dev {
struct hinic_hwdev  *hwdev;
 
u32 msg_enable;
+
+   struct semaphore    mgmt_lock;
+   unsigned long   *vlan_bitmap;
 };
 
 #endif
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c 
b/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c
index ad253c7..c6138f1 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c
@@ -208,6 +208,35 @@ static void free_msix(struct hinic_hwdev *hwdev)
 }
 
 /**
+ * hinic_port_msg_cmd - send port msg to mgmt
+ * @hwdev: the NIC HW device
+ * @cmd: the port command
+ * @buf_in: input buffer
+ * @in_size: input size
+ * @buf_out: output buffer
+ * @out_size: returned output size
+ *
+ * Return 0 - Success, negative - Failure
+ **/
+int hinic_port_msg_cmd(struct hinic_hwdev *hwdev, enum hinic_port_cmd cmd,
+  void *buf_in, u16 in_size, void *buf_out, u16 *out_size)
+{
+   struct hinic_pfhwdev *pfhwdev;
+   struct hinic_hwif *hwif = hwdev->hwif;
+
+   if (!HINIC_IS_PF(hwif) && !HINIC_IS_PPF(hwif)) {
+   pr_err("unsupported PCI Function type\n");
+   return -EINVAL;
+   }
+
+   pfhwdev = container_of(hwdev, struct hinic_pfhwdev, hwdev);
+
+   return hinic_msg_to_mgmt(&pfhwdev->pf_to_mgmt, HINIC_MOD_L2NIC, cmd,
+buf_in, in_size, buf_out, out_size,
+HINIC_MGMT_MSG_SYNC);
+}
+
+/**
  * init_pfhwdev - Initialize the extended components of PF
  * @pfhwdev: the HW device for PF
  *
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.h 
b/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.h
index a825e76..03795be 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.h
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.h
@@ -30,6 +30,31 @@ struct hinic_cap {
u16 num_qps;
 };
 
+enum hinic_port_cmd {
+   HINIC_PORT_CMD_CHANGE_MTU = 2,
+
+   HINIC_PORT_CMD_ADD_VLAN = 3,
+   HINIC_PORT_CMD_DEL_VLAN = 4,
+
+   HINIC_PORT_CMD_SET_MAC = 9,
+   HINIC_PORT_CMD_GET_MAC = 10,
+   HINIC_PORT_CMD_DEL_MAC = 11,
+
+   HINIC_PORT_CMD_SET_RX_MODE = 12,
+
+   HINIC_PORT_CMD_GET_LINK_STATE = 24,
+
+   HINIC_PORT_CMD_SET_PORT_STATE = 41,
+
+   HINIC_PORT_CMD_FWCTXT_INIT = 69,
+
+   HINIC_PORT_CMD_SET_FUNC_STATE = 93,
+
+   HINIC_PORT_CMD_GET_GLOBAL_QPN = 102,
+
+   HINIC_PORT_CMD_GET_CAP = 170,
+};
+
 struct hinic_hwdev {
struct hinic_hwif   *hwif;
struct msix_entry   *msix_entries;
@@ -45,6 +70,10 @@ struct hinic_pfhwdev {
struct hinic_pf_to_mgmt pf_to_mgmt;
 };
 
+int hinic_port_msg_cmd(struct hinic_hwdev *hwdev, enum hinic_port_cmd cmd,
+  void *buf_in, u16 in_size, void *buf_out,
+  u16 *out_size);
+
 int hinic_init_hwdev(struct hinic_hwdev **hwdev, struct pci_dev *pdev);
 
 void hinic_free_hwdev(struct hinic_hwdev *hwdev);
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_main.c 
b/drivers/net/ethernet/huawei/hinic/hinic_main.c
index ea75ace..2d50650 

[PATCH net 09/20] net/hinic: Add Rx mode and link event handler

2017-07-12 Thread Aviad Krawczyk
Add port management message for setting Rx mode in the card,
used for rx_mode netdev operation.
The link event handler is used for getting a notification about
the link state.

Signed-off-by: Aviad Krawczyk 
Signed-off-by: Zhaochen 
---
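For readers following the series: the register/unregister pair this patch
adds to hinic_hw_dev.c keeps, per management event, a handler, its private
data, and a state word with ENABLED and RUNNING bits. The user-space sketch
below illustrates only that pattern. Every demo_ name, the base event id,
the table size, and the dispatch function are assumptions made for the
example and are not taken from the driver.

```c
#include <assert.h>

/* Hypothetical state bits, modelled on HINIC_CB_ENABLED / HINIC_CB_RUNNING. */
enum demo_cb_state {
	DEMO_CB_ENABLED = 1 << 0,
	DEMO_CB_RUNNING = 1 << 1,
};

#define DEMO_CMD_BASE	0x20	/* assumed id of the first event */
#define DEMO_MAX_CBS	4	/* assumed number of event slots */

typedef void (*demo_handler_t)(void *handle, int event_data);

struct demo_cb {
	demo_handler_t handler;
	void *handle;
	unsigned int cb_state;
};

static struct demo_cb demo_cbs[DEMO_MAX_CBS];

/* Map an event id to its slot, as "cmd - HINIC_MGMT_MSG_CMD_BASE" does. */
struct demo_cb *demo_cb_slot(int cmd)
{
	assert(cmd >= DEMO_CMD_BASE && cmd < DEMO_CMD_BASE + DEMO_MAX_CBS);
	return &demo_cbs[cmd - DEMO_CMD_BASE];
}

void demo_cb_register(int cmd, void *handle, demo_handler_t handler)
{
	struct demo_cb *cb = demo_cb_slot(cmd);

	cb->handler = handler;
	cb->handle = handle;
	cb->cb_state = DEMO_CB_ENABLED;
}

void demo_cb_unregister(int cmd)
{
	struct demo_cb *cb = demo_cb_slot(cmd);

	cb->cb_state &= ~DEMO_CB_ENABLED;
	/* Single-threaded sketch: nothing can be RUNNING here, so no wait. */
}

/* Run the handler if the slot is enabled; return 1 if it ran, else 0. */
int demo_cb_dispatch(int cmd, int event_data)
{
	struct demo_cb *cb = demo_cb_slot(cmd);

	if (!(cb->cb_state & DEMO_CB_ENABLED))
		return 0;

	cb->cb_state |= DEMO_CB_RUNNING;
	cb->handler(cb->handle, event_data);
	cb->cb_state &= ~DEMO_CB_RUNNING;
	return 1;
}

/* Example handler: stores the event data in the int that handle points to. */
void demo_store_event(void *handle, int event_data)
{
	*(int *)handle = event_data;
}
```

Clearing ENABLED before anything else in unregister means a later event is
dropped rather than delivered to a handler that is going away. In a
concurrent setting the RUNNING bit is what lets unregister wait for an
in-flight handler to finish.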
 drivers/net/ethernet/huawei/hinic/hinic_dev.h |  17 ++
 drivers/net/ethernet/huawei/hinic/hinic_hw_csr.h  |   2 +
 drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c  | 118 +
 drivers/net/ethernet/huawei/hinic/hinic_hw_dev.h  |  37 +++
 drivers/net/ethernet/huawei/hinic/hinic_hw_if.c   |  17 ++
 drivers/net/ethernet/huawei/hinic/hinic_hw_if.h   |  17 ++
 drivers/net/ethernet/huawei/hinic/hinic_hw_mgmt.c |  65 -
 drivers/net/ethernet/huawei/hinic/hinic_hw_mgmt.h |  28 +++
 drivers/net/ethernet/huawei/hinic/hinic_main.c| 289 +-
 drivers/net/ethernet/huawei/hinic/hinic_port.c| 101 
 drivers/net/ethernet/huawei/hinic/hinic_port.h|  66 +
 11 files changed, 754 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/huawei/hinic/hinic_dev.h b/drivers/net/ethernet/huawei/hinic/hinic_dev.h
index f642186..7cb9533 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_dev.h
+++ b/drivers/net/ethernet/huawei/hinic/hinic_dev.h
@@ -19,20 +19,37 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "hinic_hw_dev.h"
 
 #define HINIC_DRV_NAME "HiNIC"
 #define HINIC_DRV_VERSION  "1.0"
 
+enum hinic_flags {
+   HINIC_LINK_UP = BIT(0),
+   HINIC_INTF_UP = BIT(1),
+};
+
+struct hinic_rx_mode_work {
+   struct work_struct  work;
+   u32 rx_mode;
+};
+
 struct hinic_dev {
struct net_device   *netdev;
struct hinic_hwdev  *hwdev;
 
u32 msg_enable;
 
+   unsigned int            flags;
+
struct semaphore        mgmt_lock;
unsigned long   *vlan_bitmap;
+
+   struct hinic_rx_mode_work   rx_mode_work;
+   struct workqueue_struct *workq;
 };
 
 #endif
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_csr.h b/drivers/net/ethernet/huawei/hinic/hinic_hw_csr.h
index ae84719..6f9df4d 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_csr.h
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_csr.h
@@ -20,6 +20,8 @@
 #define HINIC_CSR_FUNC_ATTR0_ADDR  0x0
 #define HINIC_CSR_FUNC_ATTR1_ADDR  0x4
 
+#define HINIC_CSR_FUNC_ATTR5_ADDR  0x14
+
 #define HINIC_DMA_ATTR_BASE                0xC80
 #define HINIC_ELECTION_BASE                0x4200
 
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c b/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c
index c6138f1..31747dd 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c
@@ -237,6 +237,112 @@ int hinic_port_msg_cmd(struct hinic_hwdev *hwdev, enum hinic_port_cmd cmd,
 }
 
 /**
+ * hinic_hwdev_cb_register - register callback handler for MGMT events
+ * @hwdev: the NIC HW device
+ * @cmd: the mgmt event
+ * @handle: private data for the handler
+ * @handler: event handler
+ **/
+void hinic_hwdev_cb_register(struct hinic_hwdev *hwdev,
+enum hinic_mgmt_msg_cmd cmd, void *handle,
+void (*handler)(void *handle, void *buf_in,
+u16 in_size, void *buf_out,
+u16 *out_size))
+{
+   struct hinic_hwif *hwif = hwdev->hwif;
+   struct hinic_pfhwdev *pfhwdev;
+   struct hinic_nic_cb *nic_cb;
+   u8 cmd_cb;
+
+   if (!HINIC_IS_PF(hwif) && !HINIC_IS_PPF(hwif)) {
+   pr_err("unsupported PCI Function type\n");
+   return;
+   }
+
+   pfhwdev = container_of(hwdev, struct hinic_pfhwdev, hwdev);
+
+   cmd_cb = cmd - HINIC_MGMT_MSG_CMD_BASE;
+   nic_cb = &pfhwdev->nic_cb[cmd_cb];
+
+   nic_cb->handler = handler;
+   nic_cb->handle = handle;
+   nic_cb->cb_state = HINIC_CB_ENABLED;
+}
+
+/**
+ * hinic_hwdev_cb_unregister - unregister callback handler for MGMT events
+ * @hwdev: the NIC HW device
+ * @cmd: the mgmt event
+ **/
+void hinic_hwdev_cb_unregister(struct hinic_hwdev *hwdev,
+  enum hinic_mgmt_msg_cmd cmd)
+{
+   struct hinic_hwif *hwif = hwdev->hwif;
+   struct hinic_pfhwdev *pfhwdev;
+   struct hinic_nic_cb *nic_cb;
+   u8 cmd_cb;
+
+   if (!HINIC_IS_PF(hwif) && !HINIC_IS_PPF(hwif)) {
+   pr_err("unsupported PCI Function type\n");
+   return;
+   }
+
+   pfhwdev = container_of(hwdev, struct hinic_pfhwdev, hwdev);
+
+   cmd_cb = cmd - HINIC_MGMT_MSG_CMD_BASE;
+   nic_cb = &pfhwdev->nic_cb[cmd_cb];
+
+   nic_cb->cb_state &= ~HINIC_CB_ENABLED;
+
+   while (nic_cb->cb_state & HINIC_CB_RUNNING)
+  

[PATCH net 13/20] net/hinic: Set qp context

2017-07-12 Thread Aviad Krawczyk
Update the nic about the resources of the queue pairs.

Signed-off-by: Aviad Krawczyk 
Signed-off-by: Zhaochen 
---
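For readers following the series: hinic_common.c in this patch converts a
buffer to or from big endian one 32-bit word at a time. A minimal user-space
sketch of the same idea follows. It swaps in htonl()/ntohl() from
<arpa/inet.h> for the kernel's cpu_to_be32()/be32_to_cpu(), and the demo_
names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <arpa/inet.h>	/* htonl(), ntohl() */

/*
 * Convert len bytes at data to big endian, one 32-bit word at a time.
 * Trailing bytes (len % 4) are left untouched, matching the integer
 * division "len / chunk_sz" in the patch.
 */
void demo_cpu_to_be32(void *data, size_t len)
{
	uint32_t *mem = data;
	size_t i, words = len / sizeof(uint32_t);

	assert(data != NULL);

	for (i = 0; i < words; i++)
		mem[i] = htonl(mem[i]);
}

/* Inverse of demo_cpu_to_be32(): big endian words back to host order. */
void demo_be32_to_cpu(void *data, size_t len)
{
	uint32_t *mem = data;
	size_t i, words = len / sizeof(uint32_t);

	assert(data != NULL);

	for (i = 0; i < words; i++)
		mem[i] = ntohl(mem[i]);
}
```

After demo_cpu_to_be32() the most significant byte of each word sits at the
lowest address, whatever the host byte order is, so applying the two
functions in sequence returns the original buffer.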
 drivers/net/ethernet/huawei/hinic/Makefile |   5 +-
 drivers/net/ethernet/huawei/hinic/hinic_common.c   |  55 ++
 drivers/net/ethernet/huawei/hinic/hinic_common.h   |  23 +++
 drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.c  |  87 +
 drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.h  |  88 +
 drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c   |   4 +
 drivers/net/ethernet/huawei/hinic/hinic_hw_io.c| 149 ++
 drivers/net/ethernet/huawei/hinic/hinic_hw_io.h|   5 +
 drivers/net/ethernet/huawei/hinic/hinic_hw_qp.c| 161 
 drivers/net/ethernet/huawei/hinic/hinic_hw_qp.h|  11 ++
 .../net/ethernet/huawei/hinic/hinic_hw_qp_ctxt.h   | 214 +
 drivers/net/ethernet/huawei/hinic/hinic_hw_wq.h|   9 +
 12 files changed, 809 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_common.c
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_common.h
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.c
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.h
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_hw_qp_ctxt.h

diff --git a/drivers/net/ethernet/huawei/hinic/Makefile b/drivers/net/ethernet/huawei/hinic/Makefile
index 24728f0..82c1f68 100644
--- a/drivers/net/ethernet/huawei/hinic/Makefile
+++ b/drivers/net/ethernet/huawei/hinic/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_HINIC) += hinic.o
 
 hinic-y := hinic_main.o hinic_tx.o hinic_rx.o hinic_port.o hinic_hw_dev.o \
-  hinic_hw_io.o hinic_hw_qp.o hinic_hw_wq.o hinic_hw_mgmt.o \
-  hinic_hw_api_cmd.o hinic_hw_eqs.o hinic_hw_if.o
\ No newline at end of file
+  hinic_hw_io.o hinic_hw_qp.o hinic_hw_cmdq.o hinic_hw_wq.o \
+  hinic_hw_mgmt.o hinic_hw_api_cmd.o hinic_hw_eqs.o hinic_hw_if.o \
+  hinic_common.o
\ No newline at end of file
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_common.c b/drivers/net/ethernet/huawei/hinic/hinic_common.c
new file mode 100644
index 0000000..3b439e9
--- /dev/null
+++ b/drivers/net/ethernet/huawei/hinic/hinic_common.c
@@ -0,0 +1,55 @@
+/*
+ * Huawei HiNIC PCI Express Linux driver
+ * Copyright(c) 2017 Huawei Technologies Co., Ltd
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * for more details.
+ *
+ */
+
+#include 
+#include 
+
+#include "hinic_common.h"
+
+/**
+ * hinic_cpu_to_be32 - convert data to big endian 32 bit format
+ * @data: the data to convert
+ * @len: length of data to convert
+ **/
+void hinic_cpu_to_be32(void *data, int len)
+{
+   int i, chunk_sz = sizeof(u32);
+   u32 *mem = data;
+
+   len = len / chunk_sz;
+
+   for (i = 0; i < len; i++) {
+   *mem = cpu_to_be32(*mem);
+   mem++;
+   }
+}
+
+/**
+ * hinic_be32_to_cpu - convert data from big endian 32 bit format
+ * @data: the data to convert
+ * @len: length of data to convert
+ **/
+void hinic_be32_to_cpu(void *data, int len)
+{
+   int i, chunk_sz = sizeof(u32);
+   u32 *mem = data;
+
+   len = len / chunk_sz;
+
+   for (i = 0; i < len; i++) {
+   *mem = be32_to_cpu(*mem);
+   mem++;
+   }
+}
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_common.h b/drivers/net/ethernet/huawei/hinic/hinic_common.h
new file mode 100644
index 0000000..21921ec
--- /dev/null
+++ b/drivers/net/ethernet/huawei/hinic/hinic_common.h
@@ -0,0 +1,23 @@
+/*
+ * Huawei HiNIC PCI Express Linux driver
+ * Copyright(c) 2017 Huawei Technologies Co., Ltd
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * for more details.
+ *
+ */
+
+#ifndef HINIC_COMMON_H
+#define HINIC_COMMON_H
+
+void hinic_cpu_to_be32(void *data, int len);
+
+void hinic_be32_to_cpu(void *data, int len);
+
+#endif
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.c b/drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.c
new file mode 100644
index 0000000..2fd3924
--- /dev/null
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.c
@@ -0,0 +1,87 @@
+/*
+ * Huawei HiNIC PCI 

[PATCH net 15/20] net/hinic: Add ceqs

2017-07-12 Thread Aviad Krawczyk
Initialize the completion event queues and handle ceq events by calling
the registered handlers. Used for the completion event of cmdq commands.

Signed-off-by: Aviad Krawczyk 
Signed-off-by: Zhaochen 
---
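For readers following the series: the CEQ register macros this patch adds to
hinic_hw_csr.h are all "base + index * stride" computations. The sketch
below reproduces that arithmetic in user space with the constants that
appear in the diff (0x400, 0x40, 8, 0x1000, 0x80). The demo_ function names
are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Constants as they appear in the hinic_hw_csr.h hunk of this patch. */
#define DEMO_CEQ_MTT_OFF_BASE_ADDR	0x400
#define DEMO_EQ_MTT_OFF_STRIDE		0x40
#define DEMO_CSR_EQ_PAGE_OFF_STRIDE	8
#define DEMO_CEQ_CTRL_0_ADDR_BASE	0x1000
#define DEMO_EQ_OFF_STRIDE		0x80

/* Offset of control register 0 for completion event queue idx. */
uint32_t demo_ceq_ctrl_0_addr(uint32_t idx)
{
	return DEMO_CEQ_CTRL_0_ADDR_BASE + idx * DEMO_EQ_OFF_STRIDE;
}

/* Offset of the high half of the physical address of page pg_num. */
uint32_t demo_ceq_hi_phys_addr_reg(uint32_t q_id, uint32_t pg_num)
{
	assert(pg_num < DEMO_EQ_MTT_OFF_STRIDE / DEMO_CSR_EQ_PAGE_OFF_STRIDE);

	return DEMO_CEQ_MTT_OFF_BASE_ADDR + q_id * DEMO_EQ_MTT_OFF_STRIDE +
	       pg_num * DEMO_CSR_EQ_PAGE_OFF_STRIDE;
}

/* The low half sits 4 bytes after the high half. */
uint32_t demo_ceq_lo_phys_addr_reg(uint32_t q_id, uint32_t pg_num)
{
	return demo_ceq_hi_phys_addr_reg(q_id, pg_num) + 4;
}
```

For example, queue 1, page 3 gives 0x400 + 0x40 + 24 = 0x458 for the high
register and 0x45C for the low one. The bound checked by the assert (at most
8 pages per queue window) is inferred from the two strides and is an
assumption, not something the patch states.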
 drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.c |  17 ++
 drivers/net/ethernet/huawei/hinic/hinic_hw_csr.h  |  29 +++
 drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c  |   7 +-
 drivers/net/ethernet/huawei/hinic/hinic_hw_eqs.c  | 290 +-
 drivers/net/ethernet/huawei/hinic/hinic_hw_eqs.h  |  75 ++
 drivers/net/ethernet/huawei/hinic/hinic_hw_io.c   |  15 +-
 drivers/net/ethernet/huawei/hinic/hinic_hw_io.h   |   3 +
 7 files changed, 428 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.c b/drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.c
index 450d254..30be1db 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.c
@@ -29,6 +29,7 @@
 #include 
 
 #include "hinic_hw_if.h"
+#include "hinic_hw_eqs.h"
 #include "hinic_hw_mgmt.h"
 #include "hinic_hw_wq.h"
 #include "hinic_hw_cmdq.h"
@@ -112,6 +113,16 @@ int hinic_cmdq_direct_resp(struct hinic_cmdqs *cmdqs,
 }
 
 /**
+ * cmdq_ceq_handler - cmdq completion event handler
+ * @handle: private data for the handler(cmdqs)
+ * @ceqe_data: ceq element data
+ **/
+static void cmdq_ceq_handler(void *handle, u32 ceqe_data)
+{
+   /* should be implemented */
+}
+
+/**
  * cmdq_init_queue_ctxt - init the queue ctxt of a cmdq
  * @cmdq: the cmdq
  * @cmdq_pages: the memory of the queue
@@ -324,6 +335,9 @@ int hinic_init_cmdqs(struct hinic_cmdqs *cmdqs, struct hinic_hwif *hwif,
goto cmdq_ctxt_err;
}
 
+   hinic_ceq_register_cb(&func_to_io->ceqs, HINIC_CEQ_CMDQ, cmdqs,
+ cmdq_ceq_handler);
+
return 0;
 
 cmdq_ctxt_err:
@@ -344,8 +358,11 @@ int hinic_init_cmdqs(struct hinic_cmdqs *cmdqs, struct hinic_hwif *hwif,
  **/
 void hinic_free_cmdqs(struct hinic_cmdqs *cmdqs)
 {
+   struct hinic_func_to_io *func_to_io = cmdqs_to_func_to_io(cmdqs);
enum hinic_cmdq_type cmdq_type;
 
+   hinic_ceq_unregister_cb(&func_to_io->ceqs, HINIC_CEQ_CMDQ);
+
cmdq_type = HINIC_CMDQ_SYNC;
for (; cmdq_type < HINIC_MAX_CMDQ_TYPES; cmdq_type++)
free_cmdq(&cmdqs->cmdq[cmdq_type]);
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_csr.h b/drivers/net/ethernet/huawei/hinic/hinic_hw_csr.h
index 6f9df4d..08d16a0 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_csr.h
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_csr.h
@@ -81,27 +81,44 @@
 
 /* EQ registers */
 #define HINIC_AEQ_MTT_OFF_BASE_ADDR        0x200
+#define HINIC_CEQ_MTT_OFF_BASE_ADDR        0x400
 
 #define HINIC_EQ_MTT_OFF_STRIDE            0x40
 
 #define HINIC_CSR_AEQ_MTT_OFF(id)  \
(HINIC_AEQ_MTT_OFF_BASE_ADDR + (id) * HINIC_EQ_MTT_OFF_STRIDE)
 
+#define HINIC_CSR_CEQ_MTT_OFF(id)  \
+   (HINIC_CEQ_MTT_OFF_BASE_ADDR + (id) * HINIC_EQ_MTT_OFF_STRIDE)
+
 #define HINIC_CSR_EQ_PAGE_OFF_STRIDE   8
 
 #define HINIC_CSR_AEQ_HI_PHYS_ADDR_REG(q_id, pg_num)   \
(HINIC_CSR_AEQ_MTT_OFF(q_id) + \
 (pg_num) * HINIC_CSR_EQ_PAGE_OFF_STRIDE)
 
+#define HINIC_CSR_CEQ_HI_PHYS_ADDR_REG(q_id, pg_num)   \
+   (HINIC_CSR_CEQ_MTT_OFF(q_id) +  \
+(pg_num) * HINIC_CSR_EQ_PAGE_OFF_STRIDE)
+
 #define HINIC_CSR_AEQ_LO_PHYS_ADDR_REG(q_id, pg_num)   \
(HINIC_CSR_AEQ_MTT_OFF(q_id) + \
 (pg_num) * HINIC_CSR_EQ_PAGE_OFF_STRIDE + 4)
 
+#define HINIC_CSR_CEQ_LO_PHYS_ADDR_REG(q_id, pg_num)   \
+   (HINIC_CSR_CEQ_MTT_OFF(q_id) +  \
+(pg_num) * HINIC_CSR_EQ_PAGE_OFF_STRIDE + 4)
+
 #define HINIC_AEQ_CTRL_0_ADDR_BASE 0xE00
 #define HINIC_AEQ_CTRL_1_ADDR_BASE 0xE04
 #define HINIC_AEQ_CONS_IDX_ADDR_BASE   0xE08
 #define HINIC_AEQ_PROD_IDX_ADDR_BASE   0xE0C
 
+#define HINIC_CEQ_CTRL_0_ADDR_BASE 0x1000
+#define HINIC_CEQ_CTRL_1_ADDR_BASE 0x1004
+#define HINIC_CEQ_CONS_IDX_ADDR_BASE   0x1008
+#define HINIC_CEQ_PROD_IDX_ADDR_BASE   0x100C
+
 #define HINIC_EQ_OFF_STRIDE                0x80
 
 #define HINIC_CSR_AEQ_CTRL_0_ADDR(idx) \
@@ -116,4 +133,16 @@
 #define HINIC_CSR_AEQ_PROD_IDX_ADDR(idx)   \
(HINIC_AEQ_PROD_IDX_ADDR_BASE + (idx) * HINIC_EQ_OFF_STRIDE)
 
+#define HINIC_CSR_CEQ_CTRL_0_ADDR(idx) \
+   (HINIC_CEQ_CTRL_0_ADDR_BASE + (idx) * HINIC_EQ_OFF_STRIDE)
+
+#define HINIC_CSR_CEQ_CTRL_1_ADDR(idx) \
+   (HINIC_CEQ_CTRL_1_ADDR_BASE + (idx) * HINIC_EQ_OFF_STRIDE)
+
+#define HINIC_CSR_CEQ_CONS_IDX_ADDR(idx)   \
+   (HINIC_CEQ_CONS_IDX_ADDR_BASE + (idx) * HINIC_EQ_OFF_STRIDE)
+

[PATCH net 14/20] net/hinic: Initialize cmdq

2017-07-12 Thread Aviad Krawczyk
Create the work queues for cmdq and update the nic about the cmdq
contexts. cmdq commands are used for updating the nic about the
qp contexts.

Signed-off-by: Aviad Krawczyk 
Signed-off-by: Zhaochen 
---
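For readers following the series: two macros introduced in hinic_hw_cmdq.c
by this patch are compact enough to deserve a worked example.
cmdq_to_cmdqs() steps back from one array element to element 0 by
subtracting the element's own type index, then applies container_of().
CMDQ_PFN() shifts an address right by log2 of the page size. The user-space
sketch below assumes GCC or Clang (for offsetof() with an array subscript).
The demo_ names, the second enum value, and the id field are illustrative
assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-in for the kernel's container_of(). */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

enum demo_cmdq_type {
	DEMO_CMDQ_SYNC,
	DEMO_CMDQ_ASYNC,	/* assumed second type, for illustration */
	DEMO_MAX_CMDQ_TYPES,
};

struct demo_cmdq {
	enum demo_cmdq_type cmdq_type;
};

struct demo_cmdqs {
	int id;			/* hypothetical field, to identify the parent */
	struct demo_cmdq cmdq[DEMO_MAX_CMDQ_TYPES];
};

/* Step back to cmdq[0] using the element's type, then to the parent. */
struct demo_cmdqs *demo_cmdq_to_cmdqs(struct demo_cmdq *cmdq)
{
	return demo_container_of(cmdq - cmdq->cmdq_type,
				 struct demo_cmdqs, cmdq[0]);
}

/* Page frame number: address shifted right by log2(page_size). */
uint64_t demo_pfn(uint64_t addr, uint64_t page_size)
{
	unsigned int shift = 0;

	/* page_size must be a non-zero power of two */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	while ((page_size >> shift) != 1)
		shift++;

	return addr >> shift;
}
```

The pointer arithmetic only works if every element's cmdq_type equals its
index in the array, so whatever initializes the array has to keep that
invariant.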
 drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.c | 284 +-
 drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.h |  53 
 drivers/net/ethernet/huawei/hinic/hinic_hw_eqs.h  |   2 +
 drivers/net/ethernet/huawei/hinic/hinic_hw_mgmt.h |   5 +
 drivers/net/ethernet/huawei/hinic/hinic_hw_wq.c   | 156 
 drivers/net/ethernet/huawei/hinic/hinic_hw_wq.h   |   8 +
 6 files changed, 502 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.c b/drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.c
index 2fd3924..450d254 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.c
@@ -13,11 +13,51 @@
  *
  */
 
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
 
 #include "hinic_hw_if.h"
+#include "hinic_hw_mgmt.h"
+#include "hinic_hw_wq.h"
 #include "hinic_hw_cmdq.h"
+#include "hinic_hw_io.h"
+#include "hinic_hw_dev.h"
+
+#define CMDQ_DB_OFF                     SZ_2K
+
+#define CMDQ_WQEBB_SIZE                 64
+#define CMDQ_DEPTH                      SZ_4K
+
+#define CMDQ_WQ_PAGE_SIZE  SZ_4K
+
+#define WQE_LCMD_SIZE  64
+#define WQE_SCMD_SIZE  64
+
+#define CMDQ_PFN(addr, page_size)  ((addr) >> (ilog2(page_size)))
+
+#define cmdq_to_cmdqs(cmdq)     container_of((cmdq) - (cmdq)->cmdq_type, \
+struct hinic_cmdqs, cmdq[0])
+
+#define cmdqs_to_func_to_io(cmdqs) container_of(cmdqs, \
+struct hinic_func_to_io, \
+cmdqs)
+
+enum cmdq_wqe_type {
+   WQE_LCMD_TYPE,
+   WQE_SCMD_TYPE,
+};
 
 /**
  * hinic_alloc_cmdq_buf - alloc buffer for sending command
@@ -29,8 +69,17 @@
 int hinic_alloc_cmdq_buf(struct hinic_cmdqs *cmdqs,
 struct hinic_cmdq_buf *cmdq_buf)
 {
-   /* should be implemented */
-   return -ENOMEM;
+   struct hinic_hwif *hwif = cmdqs->hwif;
+   struct pci_dev *pdev = hwif->pdev;
+
+   cmdq_buf->buf = pci_pool_alloc(cmdqs->cmdq_buf_pool, GFP_KERNEL,
+  &cmdq_buf->dma_addr);
+   if (!cmdq_buf->buf) {
+   dev_err(&pdev->dev, "Failed to allocate cmd from the pool\n");
+   return -ENOMEM;
+   }
+
+   return 0;
 }
 
 /**
@@ -41,7 +90,7 @@ int hinic_alloc_cmdq_buf(struct hinic_cmdqs *cmdqs,
 void hinic_free_cmdq_buf(struct hinic_cmdqs *cmdqs,
 struct hinic_cmdq_buf *cmdq_buf)
 {
-   /* should be implemented */
+   pci_pool_free(cmdqs->cmdq_buf_pool, cmdq_buf->buf, cmdq_buf->dma_addr);
 }
 
 /**
@@ -63,6 +112,172 @@ int hinic_cmdq_direct_resp(struct hinic_cmdqs *cmdqs,
 }
 
 /**
+ * cmdq_init_queue_ctxt - init the queue ctxt of a cmdq
+ * @cmdq: the cmdq
+ * @cmdq_pages: the memory of the queue
+ * @cmdq_ctxt: returned cmdq ctxt
+ **/
+static void cmdq_init_queue_ctxt(struct hinic_cmdq *cmdq,
+struct hinic_cmdq_pages *cmdq_pages,
+struct hinic_cmdq_ctxt *cmdq_ctxt)
+{
+   struct hinic_cmdqs *cmdqs = cmdq_to_cmdqs(cmdq);
+   struct hinic_hwif *hwif = cmdqs->hwif;
+   struct hinic_wq *wq = cmdq->wq;
+   struct hinic_cmdq_ctxt_info *ctxt_info = &cmdq_ctxt->ctxt_info;
+   u16 start_ci = atomic_read(&wq->cons_idx);
+   u64 wq_first_page_paddr, cmdq_first_block_paddr, pfn;
+
+   /* The data in the HW is in Big Endian Format */
+   wq_first_page_paddr = be64_to_cpu(*wq->block_vaddr);
+
+   pfn = CMDQ_PFN(wq_first_page_paddr, wq->wq_page_size);
+
+   ctxt_info->curr_wqe_page_pfn =
+   HINIC_CMDQ_CTXT_PAGE_INFO_SET(pfn, CURR_WQE_PAGE_PFN)   |
+   HINIC_CMDQ_CTXT_PAGE_INFO_SET(HINIC_CEQ_ID_CMDQ, EQ_ID) |
+   HINIC_CMDQ_CTXT_PAGE_INFO_SET(1, CEQ_ARM)   |
+   HINIC_CMDQ_CTXT_PAGE_INFO_SET(1, CEQ_EN)|
+   HINIC_CMDQ_CTXT_PAGE_INFO_SET(cmdq->wrapped, WRAPPED);
+
+   /* block PFN - Read Modify Write */
+   cmdq_first_block_paddr = cmdq_pages->page_paddr;
+
+   pfn = CMDQ_PFN(cmdq_first_block_paddr, wq->wq_page_size);
+
+   ctxt_info->wq_block_pfn =
+   HINIC_CMDQ_CTXT_BLOCK_INFO_SET(pfn, WQ_BLOCK_PFN) |
+   HINIC_CMDQ_CTXT_BLOCK_INFO_SET(start_ci, CI);
+
+   cmdq_ctxt->func_idx = HINIC_HWIF_GLOB_IDX(hwif);
+   cmdq_ctxt->cmdq_type  = cmdq->cmdq_type;
+}
+
+/**
+ * init_cmdq - initialize cmdq
+ * @cmdq: the cmdq
+ * @wq: the wq attached to the cmdq
+ * @q_type: the cmdq type of 

[PATCH net 12/20] net/hinic: Add qp resources

2017-07-12 Thread Aviad Krawczyk
Create the resources for queue pair operations:
doorbell area, consumer index address and producer index address.

Signed-off-by: Aviad Krawczyk 
Signed-off-by: Zhaochen 
---
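For readers following the series: get_db_area() and return_db_area() in this
patch manage doorbell areas as a ring of free indices with separate
allocate and return positions. The sketch below shows that bookkeeping in
user space. It drops the semaphore (so it is single-threaded only), returns
the index directly instead of a mapped address, and uses an assumed area
count of 8; the demo_ names are illustrative.

```c
#include <assert.h>

#define DEMO_DB_MAX_AREAS	8	/* assumed; must be a power of two */

struct demo_free_db_area {
	int db_idx[DEMO_DB_MAX_AREAS];
	int alloc_pos;
	int return_pos;
	int num_free;
};

/* Initially every index is free and stored in its own slot. */
void demo_init_db_area_idx(struct demo_free_db_area *area)
{
	int i;

	for (i = 0; i < DEMO_DB_MAX_AREAS; i++)
		area->db_idx[i] = i;

	area->alloc_pos = 0;
	area->return_pos = DEMO_DB_MAX_AREAS;
	area->num_free = DEMO_DB_MAX_AREAS;
}

/* Take the next free index, or return -1 when none is left. */
int demo_get_db_area(struct demo_free_db_area *area)
{
	int pos, idx;

	if (area->num_free == 0)
		return -1;

	area->num_free--;

	pos = area->alloc_pos++ & (DEMO_DB_MAX_AREAS - 1);
	idx = area->db_idx[pos];
	area->db_idx[pos] = -1;		/* mark the slot as handed out */

	return idx;
}

/* Put an index back at the return position of the ring. */
void demo_return_db_area(struct demo_free_db_area *area, int idx)
{
	int pos;

	assert(idx >= 0 && idx < DEMO_DB_MAX_AREAS);
	assert(area->num_free < DEMO_DB_MAX_AREAS);

	pos = area->return_pos++ & (DEMO_DB_MAX_AREAS - 1);
	area->db_idx[pos] = idx;
	area->num_free++;
}
```

Masking with (count - 1) is what requires the count to be a power of two.
Because return_pos starts one full lap ahead of alloc_pos, a returned index
always lands in a slot that has already been handed out.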
 drivers/net/ethernet/huawei/hinic/Makefile  |   4 +-
 drivers/net/ethernet/huawei/hinic/hinic_hw_if.h |   1 +
 drivers/net/ethernet/huawei/hinic/hinic_hw_io.c | 173 ++-
 drivers/net/ethernet/huawei/hinic/hinic_hw_io.h |  27 +++
 drivers/net/ethernet/huawei/hinic/hinic_hw_qp.c | 266 
 drivers/net/ethernet/huawei/hinic/hinic_hw_qp.h |  60 +-
 6 files changed, 526 insertions(+), 5 deletions(-)
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_hw_qp.c

diff --git a/drivers/net/ethernet/huawei/hinic/Makefile b/drivers/net/ethernet/huawei/hinic/Makefile
index 519382b..24728f0 100644
--- a/drivers/net/ethernet/huawei/hinic/Makefile
+++ b/drivers/net/ethernet/huawei/hinic/Makefile
@@ -1,5 +1,5 @@
 obj-$(CONFIG_HINIC) += hinic.o
 
 hinic-y := hinic_main.o hinic_tx.o hinic_rx.o hinic_port.o hinic_hw_dev.o \
-  hinic_hw_io.o hinic_hw_wq.o hinic_hw_mgmt.o hinic_hw_api_cmd.o \
-  hinic_hw_eqs.o hinic_hw_if.o
\ No newline at end of file
+  hinic_hw_io.o hinic_hw_qp.o hinic_hw_wq.o hinic_hw_mgmt.o \
+  hinic_hw_api_cmd.o hinic_hw_eqs.o hinic_hw_if.o
\ No newline at end of file
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_if.h b/drivers/net/ethernet/huawei/hinic/hinic_hw_if.h
index 162c090..6bd1806 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_if.h
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_if.h
@@ -137,6 +137,7 @@
 #define HINIC_IS_PPF(hwif) (HINIC_FUNC_TYPE(hwif) == HINIC_PPF)
 
 #define HINIC_PCI_CFG_REGS_BAR 0
+#define HINIC_PCI_DB_BAR   4
 
 #define HINIC_PCIE_ST_DISABLE  0
 #define HINIC_PCIE_AT_DISABLE  0
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_io.c b/drivers/net/ethernet/huawei/hinic/hinic_hw_io.c
index 1453b08..0d6284c 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_io.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_io.c
@@ -15,17 +15,93 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 
 #include "hinic_hw_if.h"
 #include "hinic_hw_wq.h"
 #include "hinic_hw_qp.h"
 #include "hinic_hw_io.h"
 
+#define CI_Q_ADDR_SIZE sizeof(u32)
+
+#define CI_ADDR(base_addr, q_id)   ((base_addr) + \
+(q_id) * CI_Q_ADDR_SIZE)
+
+#define CI_TABLE_SIZE(num_qps) ((num_qps) * CI_Q_ADDR_SIZE)
+
+#define DB_IDX(db, db_base)\
+   (((unsigned long)(db) - (unsigned long)(db_base)) / HINIC_DB_PAGE_SIZE)
+
+static void init_db_area_idx(struct hinic_free_db_area *free_db_area)
+{
+   int i;
+
+   for (i = 0; i < HINIC_DB_MAX_AREAS; i++)
+   free_db_area->db_idx[i] = i;
+
+   free_db_area->alloc_pos = 0;
+   free_db_area->return_pos = HINIC_DB_MAX_AREAS;
+
+   free_db_area->num_free = HINIC_DB_MAX_AREAS;
+
+   sema_init(&free_db_area->idx_lock, 1);
+}
+
+static int get_db_area(struct hinic_func_to_io *func_to_io,
+  void __iomem **db_base)
+{
+   struct hinic_free_db_area *free_db_area = &func_to_io->free_db_area;
+   int pos, idx;
+
+   down(&free_db_area->idx_lock);
+
+   free_db_area->num_free--;
+
+   if (free_db_area->num_free < 0) {
+   free_db_area->num_free++;
+   up(&free_db_area->idx_lock);
+   return -ENOMEM;
+   }
+
+   pos = free_db_area->alloc_pos++;
+   pos &= HINIC_DB_MAX_AREAS - 1;
+
+   idx = free_db_area->db_idx[pos];
+
+   free_db_area->db_idx[pos] = -1;
+
+   up(&free_db_area->idx_lock);
+
+   *db_base = func_to_io->db_base + idx * HINIC_DB_PAGE_SIZE;
+   return 0;
+}
+
+static void return_db_area(struct hinic_func_to_io *func_to_io,
+  void __iomem *db_base)
+{
+   struct hinic_free_db_area *free_db_area = &func_to_io->free_db_area;
+   int pos, idx = DB_IDX(db_base, func_to_io->db_base);
+
+   down(&free_db_area->idx_lock);
+
+   pos = free_db_area->return_pos++;
+   pos &= HINIC_DB_MAX_AREAS - 1;
+
+   free_db_area->db_idx[pos] = idx;
+
+   free_db_area->num_free++;
+
+   up(&free_db_area->idx_lock);
+}
+
 /**
  * init_qp - Initialize a Queue Pair
  * @func_to_io: func to io channel that holds the IO components
@@ -41,6 +117,10 @@ static int init_qp(struct hinic_func_to_io *func_to_io,
   struct msix_entry *sq_msix_entry,
   struct msix_entry *rq_msix_entry)
 {
+   struct hinic_hwif *hwif = func_to_io->hwif;
+   void *ci_addr_base = func_to_io->ci_addr_base;
+   dma_addr_t ci_dma_base = func_to_io->ci_dma_base;
+   void __iomem *db_base;
int err;
 
qp->q_id = q_id;
@@ -61,8 +141,40 @@ static int init_qp(struct 

[PATCH net 17/20] net/hinic: Add cmdq completion handler

2017-07-12 Thread Aviad Krawczyk
Add cmdq completion handler for getting a notification about the
completion of cmdq commands.

Signed-off-by: Aviad Krawczyk 
Signed-off-by: Zhaochen 
---
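For readers following the series: the CMDQ_CEQE_GET() and
CMDQ_WQE_ERRCODE_GET() macros added here extract a bit field by shifting
right and masking, with the field name pasted into the SHIFT and MASK
constant names. A minimal user-space sketch of the same technique follows,
using the shift and mask values shown in the diff (0 and 0x7, 20 and 0xF).
The DEMO_/demo_ names are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout as shown in the hinic_hw_cmdq.c hunk of this patch. */
#define DEMO_CEQE_TYPE_SHIFT		0
#define DEMO_CEQE_TYPE_MASK		0x7

#define DEMO_WQE_ERRCODE_VAL_SHIFT	20
#define DEMO_WQE_ERRCODE_VAL_MASK	0xF

/* Token pasting selects the SHIFT/MASK pair that belongs to "member". */
#define DEMO_CEQE_GET(val, member) \
	(((val) >> DEMO_CEQE_##member##_SHIFT) & DEMO_CEQE_##member##_MASK)

#define DEMO_WQE_ERRCODE_GET(val, member) \
	(((val) >> DEMO_WQE_ERRCODE_##member##_SHIFT) & \
	 DEMO_WQE_ERRCODE_##member##_MASK)

/* Command queue type carried in the low 3 bits of the CEQ element. */
uint32_t demo_ceqe_type(uint32_t ceqe_data)
{
	return DEMO_CEQE_GET(ceqe_data, TYPE);
}

/* 4-bit error code stored at bits 23:20 of the WQE status word. */
uint32_t demo_wqe_errcode(uint32_t status)
{
	return DEMO_WQE_ERRCODE_GET(status, VAL);
}
```

For example, a status word of 0x00500000 yields error code 5, and bits
outside 23:20 do not affect the result.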
 drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.c | 284 +-
 drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.h |  12 +
 2 files changed, 295 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.c b/drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.c
index 4427de6..c2f9b5f 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.c
@@ -40,12 +40,31 @@
 #include "hinic_hw_io.h"
 #include "hinic_hw_dev.h"
 
+#define CMDQ_CEQE_TYPE_SHIFT   0
+
+#define CMDQ_CEQE_TYPE_MASK    0x7
+
+#define CMDQ_CEQE_GET(val, member) \
+   (((val) >> CMDQ_CEQE_##member##_SHIFT) \
+& CMDQ_CEQE_##member##_MASK)
+
+#define CMDQ_WQE_ERRCODE_VAL_SHIFT 20
+
+#define CMDQ_WQE_ERRCODE_VAL_MASK  0xF
+
+#define CMDQ_WQE_ERRCODE_GET(val, member)  \
+   (((val) >> CMDQ_WQE_ERRCODE_##member##_SHIFT) \
+& CMDQ_WQE_ERRCODE_##member##_MASK)
+
 #define CMDQ_DB_PI_OFF(pi) (((u16)LOWER_8_BITS(pi)) << 3)
 
 #define CMDQ_DB_ADDR(db_base, pi)  ((db_base) + CMDQ_DB_PI_OFF(pi))
 
 #define CMDQ_WQE_HEADER(wqe)   ((struct hinic_cmdq_header *)(wqe))
 
+#define CMDQ_WQE_COMPLETED(ctrl_info)  \
+   HINIC_CMDQ_CTRL_GET(ctrl_info, HW_BUSY_BIT)
+
 #define FIRST_DATA_TO_WRITE_LAST   sizeof(u64)
 
 #define CMDQ_DB_OFF                     SZ_2K
@@ -115,6 +134,9 @@ enum completion_request {
CEQ_SET,
 };
 
+static void clear_wqe_complete_bit(struct hinic_cmdq *cmdq,
+  struct hinic_cmdq_wqe *wqe);
+
 /**
  * hinic_alloc_cmdq_buf - alloc buffer for sending command
  * @cmdqs: the cmdqs
@@ -149,6 +171,22 @@ void hinic_free_cmdq_buf(struct hinic_cmdqs *cmdqs,
pci_pool_free(cmdqs->cmdq_buf_pool, cmdq_buf->buf, cmdq_buf->dma_addr);
 }
 
+static int cmdq_wqe_size_from_bdlen(enum bufdesc_len len)
+{
+   int wqe_size = 0;
+
+   switch (len) {
+   case BUFDESC_LCMD_LEN:
+   wqe_size = WQE_LCMD_SIZE;
+   break;
+   case BUFDESC_SCMD_LEN:
+   wqe_size = WQE_SCMD_SIZE;
+   break;
+   }
+
+   return wqe_size;
+}
+
 static void cmdq_set_sge_completion(struct hinic_cmdq_completion *completion,
struct hinic_cmdq_buf *buf_out)
 {
@@ -215,6 +253,15 @@ static void cmdq_set_lcmd_bufdesc(struct hinic_cmdq_wqe_lcmd *wqe_lcmd,
hinic_set_sge(&wqe_lcmd->buf_desc.sge, buf_in->dma_addr, buf_in->size);
 }
 
+static void cmdq_set_direct_wqe_data(struct hinic_cmdq_direct_wqe *wqe,
+void *buf_in, u32 in_size)
+{
+   struct hinic_cmdq_wqe_scmd *wqe_scmd = &wqe->wqe_scmd;
+
+   wqe_scmd->buf_desc.buf_len = in_size;
+   memcpy(wqe_scmd->buf_desc.data, buf_in, in_size);
+}
+
 static void cmdq_set_lcmd_wqe(struct hinic_cmdq_wqe *wqe,
  enum cmdq_cmd_type cmd_type,
  struct hinic_cmdq_buf *buf_in,
@@ -243,6 +290,34 @@ static void cmdq_set_lcmd_wqe(struct hinic_cmdq_wqe *wqe,
cmdq_set_lcmd_bufdesc(wqe_lcmd, buf_in);
 }
 
+static void cmdq_set_direct_wqe(struct hinic_cmdq_wqe *wqe,
+   enum cmdq_cmd_type cmd_type,
+   void *buf_in, u16 in_size,
+   struct hinic_cmdq_buf *buf_out, int wrapped,
+   enum hinic_cmd_ack_type ack_type,
+   enum hinic_mod_type mod, u8 cmd, u16 prod_idx)
+{
+   struct hinic_cmdq_direct_wqe *direct_wqe = &wqe->direct_wqe;
+   struct hinic_cmdq_wqe_scmd *wqe_scmd = &direct_wqe->wqe_scmd;
+   enum completion_format complete_format;
+
+   switch (cmd_type) {
+   case CMDQ_CMD_SYNC_SGE_RESP:
+   complete_format = COMPLETE_SGE;
+   cmdq_set_sge_completion(&wqe_scmd->completion, buf_out);
+   break;
+   case CMDQ_CMD_SYNC_DIRECT_RESP:
+   complete_format = COMPLETE_DIRECT;
+   wqe_scmd->completion.direct_resp = 0;
+   break;
+   }
+
+   cmdq_prepare_wqe_ctrl(wqe, wrapped, ack_type, mod, cmd, prod_idx,
+ complete_format, DATA_DIRECT, BUFDESC_SCMD_LEN);
+
+   cmdq_set_direct_wqe_data(direct_wqe, buf_in, in_size);
+}
+
 static void cmdq_wqe_fill(void *dst, void *src)
 {
memcpy(dst + FIRST_DATA_TO_WRITE_LAST, src + FIRST_DATA_TO_WRITE_LAST,
@@ -361,6 +436,50 @@ static int cmdq_sync_cmd_direct_resp(struct hinic_cmdq *cmdq,
return 0;
 }
 
+static int cmdq_set_arm_bit(struct hinic_cmdq *cmdq, void *buf_in,
+   u16 

[PATCH net 20/20] net/hinic: Add ethtool and stats

2017-07-12 Thread Aviad Krawczyk
Add ethtool operations and statistics operations.

Signed-off-by: Aviad Krawczyk 
Signed-off-by: Zhaochen 
---
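For readers following the series: hinic_get_link_ksettings() in this patch
translates the speed code reported by the card into the ethtool SPEED_*
value, which is the link rate in Mb/s, and falls back to SPEED_UNKNOWN (-1)
for anything unrecognized. The user-space sketch below shows that mapping.
The numeric values of the demo_port_speed enum are assumptions, since the
real HINIC_SPEED_* values are not visible in this excerpt.

```c
#include <assert.h>

/* Hypothetical speed codes; the driver's actual values are not shown. */
enum demo_port_speed {
	DEMO_SPEED_10MB_LINK,
	DEMO_SPEED_100MB_LINK,
	DEMO_SPEED_1000MB_LINK,
	DEMO_SPEED_10GB_LINK,
	DEMO_SPEED_25GB_LINK,
	DEMO_SPEED_40GB_LINK,
	DEMO_SPEED_100GB_LINK,
};

#define DEMO_SPEED_UNKNOWN	(-1)	/* same value as ethtool SPEED_UNKNOWN */

/* Translate a card speed code to Mb/s, as the ethtool SPEED_* values do. */
int demo_speed_to_mbps(int speed)
{
	assert(speed >= 0);

	switch (speed) {
	case DEMO_SPEED_10MB_LINK:
		return 10;
	case DEMO_SPEED_100MB_LINK:
		return 100;
	case DEMO_SPEED_1000MB_LINK:
		return 1000;
	case DEMO_SPEED_10GB_LINK:
		return 10000;
	case DEMO_SPEED_25GB_LINK:
		return 25000;
	case DEMO_SPEED_40GB_LINK:
		return 40000;
	case DEMO_SPEED_100GB_LINK:
		return 100000;
	default:
		return DEMO_SPEED_UNKNOWN;
	}
}
```

Keeping the default branch means a newer firmware speed code degrades to
"unknown" instead of being reported as a wrong rate.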
 MAINTAINERS|   7 +
 drivers/net/ethernet/huawei/hinic/hinic_dev.h  |   3 +
 drivers/net/ethernet/huawei/hinic/hinic_main.c | 262 -
 drivers/net/ethernet/huawei/hinic/hinic_port.c |  29 +++
 drivers/net/ethernet/huawei/hinic/hinic_port.h |  45 +
 drivers/net/ethernet/huawei/hinic/hinic_rx.c   |  19 ++
 drivers/net/ethernet/huawei/hinic/hinic_rx.h   |   2 +
 drivers/net/ethernet/huawei/hinic/hinic_tx.c   |  22 +++
 drivers/net/ethernet/huawei/hinic/hinic_tx.h   |   2 +
 9 files changed, 390 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 4f4057c..5c27965 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6238,6 +6238,13 @@ L:   linux-in...@vger.kernel.org
 S: Maintained
 F: drivers/input/touchscreen/htcpen.c
 
+HUAWEI ETHERNET DRIVER
+M:  Aviad Krawczyk 
+L:  netdev@vger.kernel.org
+S:  Supported
+F:  Documentation/networking/hinic.txt
+F:  drivers/net/ethernet/huawei/*
+
 HUGETLB FILESYSTEM
 M: Nadia Yvette Chambers 
 S: Maintained
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_dev.h b/drivers/net/ethernet/huawei/hinic/hinic_dev.h
index f59c90d..08918a8 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_dev.h
+++ b/drivers/net/ethernet/huawei/hinic/hinic_dev.h
@@ -57,6 +57,9 @@ struct hinic_dev {
 
struct hinic_txq*txqs;
struct hinic_rxq*rxqs;
+
+   struct hinic_txq_stats  tx_stats;
+   struct hinic_rxq_stats  rx_stats;
 };
 
 #endif
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_main.c b/drivers/net/ethernet/huawei/hinic/hinic_main.c
index fac0249..21f53fc 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_main.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_main.c
@@ -70,6 +70,179 @@
 
 static int change_mac_addr(struct net_device *netdev, const u8 *addr);
 
+static int hinic_get_link_ksettings(struct net_device *netdev,
+   struct ethtool_link_ksettings
+   *link_ksettings)
+{
+   struct hinic_dev *nic_dev = netdev_priv(netdev);
+   struct hinic_port_cap port_cap;
+   enum hinic_autoneg_cap autoneg_cap;
+   enum hinic_autoneg_state autoneg_state;
+   enum hinic_port_link_state link_state;
+   int err;
+
+   ethtool_link_ksettings_zero_link_mode(link_ksettings, advertising);
+   ethtool_link_ksettings_add_link_mode(link_ksettings, supported,
+Autoneg);
+
+   link_ksettings->base.speed = SPEED_UNKNOWN;
+   link_ksettings->base.autoneg = AUTONEG_DISABLE;
+   link_ksettings->base.duplex = DUPLEX_UNKNOWN;
+
+   err = hinic_port_get_cap(nic_dev, &port_cap);
+   if (err) {
+   netif_err(nic_dev, drv, netdev, "Failed to get port capabilities\n");
+   return err;
+   }
+
+   err = hinic_port_link_state(nic_dev, &link_state);
+   if (err) {
+   netif_err(nic_dev, drv, netdev, "Failed to get port link state\n");
+   return err;
+   }
+
+   if (link_state != HINIC_LINK_STATE_UP) {
+   netif_info(nic_dev, drv, netdev, "No link\n");
+   return err;
+   }
+
+   switch (port_cap.speed) {
+   case HINIC_SPEED_10MB_LINK:
+   link_ksettings->base.speed = SPEED_10;
+   break;
+
+   case HINIC_SPEED_100MB_LINK:
+   link_ksettings->base.speed = SPEED_100;
+   break;
+
+   case HINIC_SPEED_1000MB_LINK:
+   link_ksettings->base.speed = SPEED_1000;
+   break;
+
+   case HINIC_SPEED_10GB_LINK:
+   link_ksettings->base.speed = SPEED_10000;
+   break;
+
+   case HINIC_SPEED_25GB_LINK:
+   link_ksettings->base.speed = SPEED_25000;
+   break;
+
+   case HINIC_SPEED_40GB_LINK:
+   link_ksettings->base.speed = SPEED_40000;
+   break;
+
+   case HINIC_SPEED_100GB_LINK:
+   link_ksettings->base.speed = SPEED_100000;
+   break;
+
+   default:
+   link_ksettings->base.speed = SPEED_UNKNOWN;
+   break;
+   }
+
+   autoneg_cap = port_cap.autoneg_cap;
+   autoneg_state = port_cap.autoneg_state;
+
+   if (!!(autoneg_cap & HINIC_AUTONEG_SUPPORTED))
+   ethtool_link_ksettings_add_link_mode(link_ksettings,
+advertising, Autoneg);
+
+   link_ksettings->base.autoneg = (autoneg_state == HINIC_AUTONEG_ACTIVE) ?
+  AUTONEG_ENABLE : AUTONEG_DISABLE;
+   link_ksettings->base.duplex = (port_cap.duplex == HINIC_DUPLEX_FULL) ?
+  

[PATCH net 18/20] net/hinic: Add Rx handler

2017-07-12 Thread Aviad Krawczyk
Set the io resources in the nic and handle rx events by qp operations.

Signed-off-by: Aviad Krawczyk 
Signed-off-by: Zhaochen 
---
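For readers following the series: this patch adds rx_buf_sz_table, which
pairs each supported Rx buffer size with the index the hardware expects.
The code that consumes the table is not visible in this excerpt, so the
lookup below is only one plausible use: an exact-match search that returns
the index for a given size. The demo_ names and the {-1, 0} sentinel (the
patch uses {-1, -1}) are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct demo_rx_buf_sz {
	int idx;
	size_t sz;
};

/* Same index/size pairs as rx_buf_sz_table in the patch. */
static const struct demo_rx_buf_sz demo_rx_buf_sz_table[] = {
	{0, 32},
	{1, 64},
	{2, 96},
	{3, 128},
	{4, 192},
	{5, 256},
	{6, 384},
	{7, 512},
	{8, 768},
	{9, 1024},
	{10, 1536},
	{11, 2048},
	{12, 3072},
	{13, 4096},
	{14, 8192},
	{15, 16384},
	{-1, 0},	/* sentinel that terminates the search */
};

/* Return the hardware index for an exact buffer size, or -1 if none. */
int demo_rx_buf_sz_idx(size_t sz)
{
	const struct demo_rx_buf_sz *entry;

	assert(sz != 0);

	for (entry = demo_rx_buf_sz_table; entry->idx != -1; entry++) {
		if (entry->sz == sz)
			return entry->idx;
	}

	return -1;
}
```

With this table a 2048-byte buffer maps to index 11, and a size that is not
listed, such as 1000, is rejected rather than rounded.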
 drivers/net/ethernet/huawei/hinic/hinic_dev.h |   1 +
 drivers/net/ethernet/huawei/hinic/hinic_hw_csr.h  |   1 +
 drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c  | 359 ++
 drivers/net/ethernet/huawei/hinic/hinic_hw_dev.h  |  77 
 drivers/net/ethernet/huawei/hinic/hinic_hw_if.c   |  36 ++
 drivers/net/ethernet/huawei/hinic/hinic_hw_if.h   |  35 ++
 drivers/net/ethernet/huawei/hinic/hinic_hw_mgmt.h |  11 +
 drivers/net/ethernet/huawei/hinic/hinic_hw_qp.c   | 195 ++
 drivers/net/ethernet/huawei/hinic/hinic_hw_qp.h   |  81 +
 drivers/net/ethernet/huawei/hinic/hinic_hw_wq.c   |  12 +
 drivers/net/ethernet/huawei/hinic/hinic_hw_wq.h   |   2 +
 drivers/net/ethernet/huawei/hinic/hinic_main.c|  24 ++
 drivers/net/ethernet/huawei/hinic/hinic_port.c|  32 ++
 drivers/net/ethernet/huawei/hinic/hinic_port.h|  19 +
 drivers/net/ethernet/huawei/hinic/hinic_rx.c  | 422 ++
 drivers/net/ethernet/huawei/hinic/hinic_rx.h  |   7 +
 16 files changed, 1314 insertions(+)

diff --git a/drivers/net/ethernet/huawei/hinic/hinic_dev.h b/drivers/net/ethernet/huawei/hinic/hinic_dev.h
index 026ed65..e9273db 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_dev.h
+++ b/drivers/net/ethernet/huawei/hinic/hinic_dev.h
@@ -44,6 +44,7 @@ struct hinic_dev {
struct hinic_hwdev  *hwdev;
 
u32 msg_enable;
+   unsigned int rx_weight;
 
unsigned int flags;
 
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_csr.h 
b/drivers/net/ethernet/huawei/hinic/hinic_hw_csr.h
index 08d16a0..520f7c4 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_csr.h
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_csr.h
@@ -20,6 +20,7 @@
 #define HINIC_CSR_FUNC_ATTR0_ADDR  0x0
 #define HINIC_CSR_FUNC_ATTR1_ADDR  0x4
 
+#define HINIC_CSR_FUNC_ATTR4_ADDR  0x10
 #define HINIC_CSR_FUNC_ATTR5_ADDR  0x14
 
 #define HINIC_DMA_ATTR_BASE            0xC80
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c 
b/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c
index 12387c7..9b667b0 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c
@@ -22,6 +22,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 
 #include "hinic_hw_if.h"
 #include "hinic_hw_eqs.h"
@@ -31,6 +34,10 @@
 #include "hinic_hw_io.h"
 #include "hinic_hw_dev.h"
 
+#define IO_STATUS_TIMEOUT  100
+#define OUTBOUND_STATE_TIMEOUT 100
+#define DB_STATE_TIMEOUT   100
+
 #define MAX_IRQS(max_qps, num_aeqs, num_ceqs)  \
 (2 * (max_qps) + (num_aeqs) + (num_ceqs))
 
@@ -38,6 +45,15 @@ enum intr_type {
INTR_MSIX_TYPE,
 };
 
+enum io_status {
+   IO_STOPPED = 0x0,
+   IO_RUNNING = 0x1,
+};
+
+enum hw_ioctxt_set_cmdq_depth {
+   HW_IOCTXT_SET_CMDQ_DEPTH_DEFAULT,
+};
+
 /* HW struct */
 struct hinic_dev_cap {
u8  status;
@@ -52,6 +68,31 @@ struct hinic_dev_cap {
u8  rsvd3[208];
 };
 
+struct rx_buf_sz {
+   int idx;
+   size_t  sz;
+};
+
+static struct rx_buf_sz rx_buf_sz_table[] = {
+   {0, 32},
+   {1, 64},
+   {2, 96},
+   {3, 128},
+   {4, 192},
+   {5, 256},
+   {6, 384},
+   {7, 512},
+   {8, 768},
+   {9, 1024},
+   {10, 1536},
+   {11, 2048},
+   {12, 3072},
+   {13, 4096},
+   {14, 8192},
+   {15, 16384},
+   {-1, -1},
+};
+
 /**
  * get_capability - convert device capabilities to NIC capabilities
  * @hwdev: the HW device to set and convert device capabilities for
@@ -243,6 +284,250 @@ int hinic_port_msg_cmd(struct hinic_hwdev *hwdev, enum 
hinic_port_cmd cmd,
 }
 
 /**
+ * init_fw_ctxt - Init Firmware tables before network mgmt and io operations
+ * @hwdev: the NIC HW device
+ *
+ * Return 0 - Success, negative - Failure
+ **/
+static int init_fw_ctxt(struct hinic_hwdev *hwdev)
+{
+   struct hinic_hwif *hwif = hwdev->hwif;
+   struct pci_dev *pdev = hwif->pdev;
+   struct hinic_pfhwdev *pfhwdev;
+   struct hinic_cmd_fw_ctxt fw_ctxt;
+   u16 out_size;
+   int err;
+
+   if (!HINIC_IS_PF(hwif) && !HINIC_IS_PPF(hwif)) {
+   pr_err("Unsupported PCI Function type\n");
+   return -EINVAL;
+   }
+
+   fw_ctxt.func_idx = HINIC_HWIF_GLOB_IDX(hwif);
+   fw_ctxt.rx_buf_sz = HINIC_RX_BUF_SZ;
+
+   pfhwdev = container_of(hwdev, struct hinic_pfhwdev, hwdev);
+
+   err = hinic_port_msg_cmd(hwdev, HINIC_PORT_CMD_FWCTXT_INIT,
+                                &fw_ctxt, sizeof(fw_ctxt),
+                                &fw_ctxt, &out_size);
+   if (err || 

[PATCH net 16/20] net/hinic: Add cmdq commands

2017-07-12 Thread Aviad Krawczyk
Add cmdq commands for setting queue pair contexts in the nic.

Signed-off-by: Aviad Krawczyk 
Signed-off-by: Zhaochen 
---
 drivers/net/ethernet/huawei/hinic/hinic_common.c  |  25 ++
 drivers/net/ethernet/huawei/hinic/hinic_common.h  |  15 ++
 drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.c | 288 +-
 drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.h | 149 +++
 drivers/net/ethernet/huawei/hinic/hinic_hw_io.h   |   5 +
 drivers/net/ethernet/huawei/hinic/hinic_hw_wq.c   | 193 +++
 drivers/net/ethernet/huawei/hinic/hinic_hw_wq.h   |   8 +
 7 files changed, 681 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/huawei/hinic/hinic_common.c 
b/drivers/net/ethernet/huawei/hinic/hinic_common.c
index 3b439e9..07d264c 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_common.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_common.c
@@ -13,6 +13,7 @@
  *
  */
 
+#include 
 #include 
 #include 
 
@@ -53,3 +54,27 @@ void hinic_be32_to_cpu(void *data, int len)
mem++;
}
 }
+
+/**
+ * hinic_set_sge - set dma area in scatter gather entry
+ * @sge: scatter gather entry
+ * @addr: dma address
+ * @len: length of relevant data in the dma address
+ **/
+void hinic_set_sge(struct hinic_sge *sge, dma_addr_t addr, int len)
+{
+   sge->hi_addr = upper_32_bits(addr);
+   sge->lo_addr = lower_32_bits(addr);
+   sge->len  = len;
+}
+
+/**
+ * hinic_sge_to_dma - get dma address from scatter gather entry
+ * @sge: scatter gather entry
+ *
+ * Return dma address of sg entry
+ **/
+dma_addr_t hinic_sge_to_dma(struct hinic_sge *sge)
+{
+   return (dma_addr_t)((((u64)sge->hi_addr) << 32) | sge->lo_addr);
+}
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_common.h 
b/drivers/net/ethernet/huawei/hinic/hinic_common.h
index 21921ec..1362fd0 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_common.h
+++ b/drivers/net/ethernet/huawei/hinic/hinic_common.h
@@ -16,8 +16,23 @@
 #ifndef HINIC_COMMON_H
 #define HINIC_COMMON_H
 
+#include 
+
+#define UPPER_8_BITS(data) (((data) >> 8) & 0xFF)
+#define LOWER_8_BITS(data) ((data) & 0xFF)
+
+struct hinic_sge {
+   u32 hi_addr;
+   u32 lo_addr;
+   u32 len;
+};
+
 void hinic_cpu_to_be32(void *data, int len);
 
 void hinic_be32_to_cpu(void *data, int len);
 
+void hinic_set_sge(struct hinic_sge *sge, dma_addr_t addr, int len);
+
+dma_addr_t hinic_sge_to_dma(struct hinic_sge *sge);
+
 #endif
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.c 
b/drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.c
index 30be1db..4427de6 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.c
@@ -26,8 +26,12 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
+#include 
 
+#include "hinic_common.h"
 #include "hinic_hw_if.h"
 #include "hinic_hw_eqs.h"
 #include "hinic_hw_mgmt.h"
@@ -36,9 +40,18 @@
 #include "hinic_hw_io.h"
 #include "hinic_hw_dev.h"
 
+#define CMDQ_DB_PI_OFF(pi) (((u16)LOWER_8_BITS(pi)) << 3)
+
+#define CMDQ_DB_ADDR(db_base, pi)  ((db_base) + CMDQ_DB_PI_OFF(pi))
+
+#define CMDQ_WQE_HEADER(wqe)   ((struct hinic_cmdq_header *)(wqe))
+
+#define FIRST_DATA_TO_WRITE_LAST   sizeof(u64)
+
 #define CMDQ_DB_OFF                     SZ_2K
 
 #define CMDQ_WQEBB_SIZE                 64
+#define CMDQ_WQE_SIZE                   64
 #define CMDQ_DEPTH                      SZ_4K
 
 #define CMDQ_WQ_PAGE_SIZE  SZ_4K
@@ -46,6 +59,10 @@
 #define WQE_LCMD_SIZE  64
 #define WQE_SCMD_SIZE  64
 
+#define COMPLETE_LEN   3
+
+#define CMDQ_TIMEOUT   1000
+
 #define CMDQ_PFN(addr, page_size)  ((addr) >> (ilog2(page_size)))
 
 #define cmdq_to_cmdqs(cmdq)    container_of((cmdq) - (cmdq)->cmdq_type, \
@@ -60,6 +77,44 @@ enum cmdq_wqe_type {
WQE_SCMD_TYPE,
 };
 
+enum cmdq_path {
+   CTRL_PATH = 1,
+};
+
+enum completion_format {
+   COMPLETE_DIRECT,
+   COMPLETE_SGE,
+};
+
+enum data_format {
+   DATA_SGE,
+   DATA_DIRECT,
+};
+
+enum bufdesc_len {
+   BUFDESC_LCMD_LEN = 2,   /* 16 bytes - 2(8 byte unit) */
+   BUFDESC_SCMD_LEN = 3,   /* 24 bytes - 3(8 byte unit) */
+};
+
+enum ctrl_sect_len {
+   CTRL_SECT_LEN = 1, /* 4 bytes (ctrl) - 1(8 byte unit) */
+   CTRL_DIRECT_SECT_LEN = 2, /* 12 bytes (ctrl + rsvd) - 2(8 byte unit) */
+};
+
+enum cmdq_scmd_type {
+   CMDQ_SET_ARM_CMD = 2,
+};
+
+enum cmdq_cmd_type {
+   CMDQ_CMD_SYNC_DIRECT_RESP,
+   CMDQ_CMD_SYNC_SGE_RESP,
+};
+
+enum completion_request {
+   NO_CEQ,
+   CEQ_SET,
+};
+
 /**
  * hinic_alloc_cmdq_buf - alloc buffer for sending command
  * @cmdqs: the cmdqs
@@ -94,6 +149,228 @@ void hinic_free_cmdq_buf(struct hinic_cmdqs *cmdqs,
pci_pool_free(cmdqs->cmdq_buf_pool, cmdq_buf->buf, cmdq_buf->dma_addr);
 }
 

[PATCH net 19/20] net/hinic: Add Tx operation

2017-07-12 Thread Aviad Krawczyk
Add transmit operation for sending data by qp operations.

Signed-off-by: Aviad Krawczyk 
Signed-off-by: Zhaochen 
---
 drivers/net/ethernet/huawei/hinic/hinic_dev.h |   1 +
 drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c  |  46 +++
 drivers/net/ethernet/huawei/hinic/hinic_hw_dev.h  |  22 ++
 drivers/net/ethernet/huawei/hinic/hinic_hw_mgmt.h |   2 +
 drivers/net/ethernet/huawei/hinic/hinic_hw_qp.c   | 249 +
 drivers/net/ethernet/huawei/hinic/hinic_hw_qp.h   | 197 ++
 drivers/net/ethernet/huawei/hinic/hinic_main.c|  10 +-
 drivers/net/ethernet/huawei/hinic/hinic_tx.c  | 415 ++
 drivers/net/ethernet/huawei/hinic/hinic_tx.h  |  11 +
 9 files changed, 948 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/huawei/hinic/hinic_dev.h 
b/drivers/net/ethernet/huawei/hinic/hinic_dev.h
index e9273db..f59c90d 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_dev.h
+++ b/drivers/net/ethernet/huawei/hinic/hinic_dev.h
@@ -44,6 +44,7 @@ struct hinic_dev {
struct hinic_hwdev  *hwdev;
 
u32 msg_enable;
+   unsigned int tx_weight;
unsigned int rx_weight;
 
unsigned int flags;
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c 
b/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c
index 9b667b0..9bc9e77 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c
@@ -41,6 +41,8 @@
 #define MAX_IRQS(max_qps, num_aeqs, num_ceqs)  \
 (2 * (max_qps) + (num_aeqs) + (num_ceqs))
 
+#define ADDR_IN_4BYTES(addr)   ((addr) >> 2)
+
 enum intr_type {
INTR_MSIX_TYPE,
 };
@@ -1017,3 +1019,47 @@ int hinic_hwdev_msix_set(struct hinic_hwdev *hwdev, u16 
msix_index,
   lli_timer_cfg, lli_credit_limit,
   resend_timer);
 }
+
+/**
+ * hinic_hwdev_hw_ci_addr_set - set cons idx addr and attributes in HW for sq
+ * @hwdev: the NIC HW device
+ * @sq: send queue
+ * @pending_limit: the maximum pending update ci events (unit 8)
+ * @coalesc_timer: coalesc period for update ci (unit 8 us)
+ *
+ * Return 0 - Success, negative - Failure
+ **/
+int hinic_hwdev_hw_ci_addr_set(struct hinic_hwdev *hwdev, struct hinic_sq *sq,
+  u8 pending_limit, u8 coalesc_timer)
+{
+   struct hinic_hwif *hwif = hwdev->hwif;
+   struct hinic_qp *qp = container_of(sq, struct hinic_qp, sq);
+   struct hinic_pfhwdev *pfhwdev;
+   struct hinic_cmd_hw_ci hw_ci;
+
+   if (!HINIC_IS_PF(hwif) && !HINIC_IS_PPF(hwif)) {
+   pr_err("Unsupported PCI Function type\n");
+   return -EINVAL;
+   }
+
+   hw_ci.dma_attr_off  = 0;
+   hw_ci.pending_limit = pending_limit;
+   hw_ci.coalesc_timer  = coalesc_timer;
+
+   hw_ci.msix_en = 1;
+   hw_ci.msix_entry_idx = sq->msix_entry;
+
+   hw_ci.func_idx = HINIC_HWIF_GLOB_IDX(hwif);
+
+   hw_ci.sq_id = qp->q_id;
+
+   hw_ci.ci_addr = ADDR_IN_4BYTES(sq->hw_ci_dma_addr);
+
+   pfhwdev = container_of(hwdev, struct hinic_pfhwdev, hwdev);
+
+   return hinic_msg_to_mgmt(&pfhwdev->pf_to_mgmt,
+                            HINIC_MOD_COMM,
+                            HINIC_COMM_CMD_SQ_HI_CI_SET,
+                            &hw_ci, sizeof(hw_ci), NULL,
+                            NULL, HINIC_MGMT_MSG_SYNC);
+}
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.h 
b/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.h
index de5e9eb..2970e5e 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.h
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.h
@@ -153,6 +153,25 @@ struct hinic_cmd_base_qpn {
u16 qpn;
 };
 
+struct hinic_cmd_hw_ci {
+   u8  status;
+   u8  version;
+   u8  rsvd0[6];
+
+   u16 func_idx;
+
+   u8  dma_attr_off;
+   u8  pending_limit;
+   u8  coalesc_timer;
+
+   u8  msix_en;
+   u16 msix_entry_idx;
+
+   u32 sq_id;
+   u32 rsvd1;
+   u64 ci_addr;
+};
+
 struct hinic_hwdev {
struct hinic_hwif   *hwif;
struct msix_entry   *msix_entries;
@@ -214,4 +233,7 @@ int hinic_hwdev_msix_set(struct hinic_hwdev *hwdev, u16 
msix_index,
 u8 lli_timer_cfg, u8 lli_credit_limit,
 u8 resend_timer);
 
+int hinic_hwdev_hw_ci_addr_set(struct hinic_hwdev *hwdev, struct hinic_sq *sq,
+  u8 pending_limit, u8 coalesc_timer);
+
 #endif
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_mgmt.h 
b/drivers/net/ethernet/huawei/hinic/hinic_hw_mgmt.h
index 61f3b6f..cef56f1 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_mgmt.h
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_mgmt.h
@@ -77,6 +77,8 @@ 

[PATCH net 03/20] net/hinic: Initialize api cmd resources

2017-07-12 Thread Aviad Krawczyk
Initialize api cmd resources as part of the management initialization.

Signed-off-by: Aviad Krawczyk 
Signed-off-by: Zhaochen 
---
 drivers/net/ethernet/huawei/hinic/Makefile |   4 +-
 .../net/ethernet/huawei/hinic/hinic_hw_api_cmd.c   | 458 +
 .../net/ethernet/huawei/hinic/hinic_hw_api_cmd.h   | 102 +
 drivers/net/ethernet/huawei/hinic/hinic_hw_mgmt.c  |  11 +
 drivers/net/ethernet/huawei/hinic/hinic_hw_mgmt.h  |   3 +
 5 files changed, 576 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.c
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.h

diff --git a/drivers/net/ethernet/huawei/hinic/Makefile 
b/drivers/net/ethernet/huawei/hinic/Makefile
index d080dfb..88223d0 100644
--- a/drivers/net/ethernet/huawei/hinic/Makefile
+++ b/drivers/net/ethernet/huawei/hinic/Makefile
@@ -1,4 +1,4 @@
 obj-$(CONFIG_HINIC) += hinic.o
 
-hinic-y := hinic_main.o hinic_hw_dev.o hinic_hw_mgmt.o hinic_hw_eqs.o \
-  hinic_hw_if.o
\ No newline at end of file
+hinic-y := hinic_main.o hinic_hw_dev.o hinic_hw_mgmt.o hinic_hw_api_cmd.o \
+  hinic_hw_eqs.o hinic_hw_if.o
\ No newline at end of file
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.c 
b/drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.c
new file mode 100644
index 000..dc80fa7
--- /dev/null
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.c
@@ -0,0 +1,458 @@
+/*
+ * Huawei HiNIC PCI Express Linux driver
+ * Copyright(c) 2017 Huawei Technologies Co., Ltd
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * for more details.
+ *
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "hinic_hw_if.h"
+#include "hinic_hw_api_cmd.h"
+
+#define API_CHAIN_NUM_CELLS            32
+
+#define API_CMD_CELL_SIZE_SHIFT        6
+#define API_CMD_CELL_SIZE_MIN  (BIT(API_CMD_CELL_SIZE_SHIFT))
+
+#define API_CMD_CELL_SIZE(cell_size)   \
+   (((cell_size) >= API_CMD_CELL_SIZE_MIN) ? \
+(1 << (fls(cell_size - 1))) : API_CMD_CELL_SIZE_MIN)
+
+#define API_CMD_BUF_SIZE   2048
+
+/**
+ * api_cmd_chain_hw_init - initialize the chain in the HW
+ * @chain: the API CMD specific chain to initialize in HW
+ *
+ * Return 0 - Success, negative - Failure
+ **/
+static int api_cmd_chain_hw_init(struct hinic_api_cmd_chain *chain)
+{
+   /* should be implemented */
+   return 0;
+}
+
+/**
+ * free_cmd_buf - free the dma buffer of API CMD command
+ * @chain: the API CMD specific chain of the cmd
+ * @cell_idx: the cell index of the cmd
+ **/
+static void free_cmd_buf(struct hinic_api_cmd_chain *chain, int cell_idx)
+{
+   struct hinic_api_cmd_cell_ctxt *cell_ctxt;
+   struct hinic_hwif *hwif = chain->hwif;
+   struct pci_dev *pdev = hwif->pdev;
+
+   cell_ctxt = &chain->cell_ctxt[cell_idx];
+
+   dma_free_coherent(&pdev->dev, API_CMD_BUF_SIZE,
+ cell_ctxt->api_cmd_vaddr,
+ cell_ctxt->api_cmd_paddr);
+}
+
+/**
+ * alloc_cmd_buf - allocate a dma buffer for API CMD command
+ * @chain: the API CMD specific chain for the cmd
+ * @cell: the cell in the HW for the cmd
+ * @cell_idx: the index of the cell
+ *
+ * Return 0 - Success, negative - Failure
+ **/
+static int alloc_cmd_buf(struct hinic_api_cmd_chain *chain,
+struct hinic_api_cmd_cell *cell, int cell_idx)
+{
+   struct hinic_hwif *hwif = chain->hwif;
+   struct pci_dev *pdev = hwif->pdev;
+   struct hinic_api_cmd_cell_ctxt *cell_ctxt;
+   dma_addr_t cmd_paddr;
+   void *cmd_vaddr;
+   int err = 0;
+
+   cmd_vaddr = dma_zalloc_coherent(&pdev->dev, API_CMD_BUF_SIZE,
+                                   &cmd_paddr, GFP_KERNEL);
+   if (!cmd_vaddr) {
+   dev_err(&pdev->dev, "Failed to allocate API CMD DMA memory\n");
+   return -ENOMEM;
+   }
+
+   cell_ctxt = &chain->cell_ctxt[cell_idx];
+
+   cell_ctxt->api_cmd_vaddr = cmd_vaddr;
+   cell_ctxt->api_cmd_paddr = cmd_paddr;
+
+   /* set the cmd DMA address in the cell */
+   switch (chain->chain_type) {
+   case HINIC_API_CMD_WRITE_TO_MGMT_CPU:
+   /* The data in the HW should be in Big Endian Format */
+   cell->write.hw_cmd_paddr = cpu_to_be64(cmd_paddr);
+   break;
+
+   default:
+   pr_err("Unsupported API CMD chain type\n");
+  
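
The API_CMD_CELL_SIZE() macro in this patch rounds a requested cell size up to the next power of two, with a floor of 64 bytes (1 << API_CMD_CELL_SIZE_SHIFT). A minimal userspace sketch of the same arithmetic, assuming a hand-written fls_u32() in place of the kernel's fls() (fls_u32 and cell_size are hypothetical names used only for illustration):

```c
#include <assert.h>
#include <stdint.h>

#define CELL_SIZE_MIN 64u	/* 1 << 6, as in the patch */

/* Position of the most significant set bit, counted from 1; 0 for x == 0.
 * This is the contract of the kernel's fls(). */
static int fls_u32(uint32_t x)
{
	int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/* Round requested up to a power of two, never below CELL_SIZE_MIN. */
static uint32_t cell_size(uint32_t requested)
{
	/* Larger inputs would need a shift by 32, which is undefined. */
	assert(requested <= 0x80000000u);

	if (requested < CELL_SIZE_MIN)
		return CELL_SIZE_MIN;
	return 1u << fls_u32(requested - 1);
}
```

Subtracting one before fls() is what keeps exact powers of two unchanged: 64 stays 64, while 65 becomes 128.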

[PATCH net 06/20] net/hinic: Add api cmd commands

2017-07-12 Thread Aviad Krawczyk
Add the api cmd commands for sending management messages to the nic.

Signed-off-by: Aviad Krawczyk 
Signed-off-by: Zhaochen 
---
 .../net/ethernet/huawei/hinic/hinic_hw_api_cmd.c   | 329 -
 .../net/ethernet/huawei/hinic/hinic_hw_api_cmd.h   |  65 
 drivers/net/ethernet/huawei/hinic/hinic_hw_csr.h   |   7 +
 3 files changed, 399 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.c 
b/drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.c
index 5d32c91..807c576 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.c
@@ -26,7 +26,9 @@
 #include 
 #include 
 #include 
+#include 
 #include 
+#include 
 
 #include "hinic_hw_csr.h"
 #include "hinic_hw_if.h"
@@ -46,14 +48,312 @@
 
 #define API_CMD_BUF_SIZE   2048
 
+/* Sizes of the members in hinic_api_cmd_cell */
+#define API_CMD_CELL_DESC_SIZE 8
+#define API_CMD_CELL_DATA_ADDR_SIZE    8
+
+#define API_CMD_CELL_ALIGNMENT 8
+
 #define API_CMD_TIMEOUT                1000
 
+#define MASKED_IDX(chain, idx) ((idx) & ((chain)->num_cells - 1))
+
+#define SIZE_8BYTES(size)  (ALIGN((size), 8) >> 3)
+#define SIZE_4BYTES(size)  (ALIGN((size), 4) >> 2)
+
+#define RD_DMA_ATTR_DEFAULT            0
+#define WR_DMA_ATTR_DEFAULT            0
+
+enum api_cmd_data_format {
+   SGE_DATA = 1,   /* cell data is passed by hw address */
+};
+
+enum api_cmd_type {
+   API_CMD_WRITE = 0,
+};
+
+enum api_cmd_bypass {
+   NO_BYPASS = 0,
+   BYPASS = 1,
+};
+
 enum api_cmd_xor_chk_level {
XOR_CHK_DIS = 0,
 
XOR_CHK_ALL = 3,
 };
 
+static u8 xor_chksum_set(void *data)
+{
+   int idx;
+   u8 *val, checksum = 0;
+
+   val = data;
+
+   for (idx = 0; idx < 7; idx++)
+   checksum ^= val[idx];
+
+   return checksum;
+}
+
+static void set_prod_idx(struct hinic_api_cmd_chain *chain)
+{
+   struct hinic_hwif *hwif = chain->hwif;
+   enum hinic_api_cmd_chain_type chain_type = chain->chain_type;
+   u32 hw_prod_idx_addr = HINIC_CSR_API_CMD_CHAIN_PI_ADDR(chain_type);
+   u32 prod_idx;
+
+   prod_idx = hinic_hwif_read_reg(hwif, hw_prod_idx_addr);
+
+   prod_idx = HINIC_API_CMD_PI_CLEAR(prod_idx, IDX);
+
+   prod_idx |= HINIC_API_CMD_PI_SET(chain->prod_idx, IDX);
+
+   hinic_hwif_write_reg(hwif, hw_prod_idx_addr, prod_idx);
+}
+
+static u32 get_hw_cons_idx(struct hinic_api_cmd_chain *chain)
+{
+   u32 addr, val;
+
+   addr = HINIC_CSR_API_CMD_STATUS_ADDR(chain->chain_type);
+   val  = hinic_hwif_read_reg(chain->hwif, addr);
+
+   return HINIC_API_CMD_STATUS_GET(val, CONS_IDX);
+}
+
+/**
+ * chain_busy - check if the chain is still processing last requests
+ * @chain: chain to check
+ *
+ * Return 0 - Success, negative - Failure
+ **/
+static int chain_busy(struct hinic_api_cmd_chain *chain)
+{
+   struct hinic_hwif *hwif = chain->hwif;
+   struct pci_dev *pdev = hwif->pdev;
+   u32 prod_idx;
+
+   switch (chain->chain_type) {
+   case HINIC_API_CMD_WRITE_TO_MGMT_CPU:
+   chain->cons_idx = get_hw_cons_idx(chain);
+   prod_idx = chain->prod_idx;
+
+   /* check for a space for a new command */
+   if (chain->cons_idx == MASKED_IDX(chain, prod_idx + 1)) {
+   dev_err(&pdev->dev, "API CMD chain %d is busy\n",
+   chain->chain_type);
+   return -EBUSY;
+   }
+   break;
+
+   default:
+   pr_err("Unknown API CMD Chain type\n");
+   break;
+   }
+
+   return 0;
+}
+
+/**
+ * get_cell_data_size - get the data size of a specific cell type
+ * @type: chain type
+ *
+ * Return the data(Desc + Address) size in the cell
+ **/
+static u8 get_cell_data_size(enum hinic_api_cmd_chain_type type)
+{
+   u8 cell_data_size = 0;
+
+   switch (type) {
+   case HINIC_API_CMD_WRITE_TO_MGMT_CPU:
+   cell_data_size = ALIGN(API_CMD_CELL_DESC_SIZE +
+  API_CMD_CELL_DATA_ADDR_SIZE,
+  API_CMD_CELL_ALIGNMENT);
+   break;
+   default:
+   break;
+   }
+
+   return cell_data_size;
+}
+
+/**
+ * prepare_cell_ctrl - prepare the ctrl of the cell for the command
+ * @cell_ctrl: the control of the cell to set the control value into it
+ * @data_size: the size of the data in the cell
+ **/
+static void prepare_cell_ctrl(u64 *cell_ctrl, u16 data_size)
+{
+   u64 ctrl;
+   u8 chksum;
+
+   ctrl =  HINIC_API_CMD_CELL_CTRL_SET(SIZE_8BYTES(data_size), DATA_SZ) |
+   HINIC_API_CMD_CELL_CTRL_SET(RD_DMA_ATTR_DEFAULT, RD_DMA_ATTR) |
+   HINIC_API_CMD_CELL_CTRL_SET(WR_DMA_ATTR_DEFAULT, WR_DMA_ATTR);
+
+   chksum = 
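
Two small pieces of this patch can be tried outside the kernel: the XOR checksum over the first 7 bytes of a cell control word (xor_chksum_set) and the "ring is full" test in chain_busy(), which relies on MASKED_IDX() and a power-of-two num_cells. A minimal userspace sketch, assuming stdint types; xor_chksum7 and chain_full are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* XOR of the first 7 bytes; the 8th byte is left out, presumably because
 * it is where the checksum itself ends up (the patch is cut off before
 * that point, so this is an assumption). */
static uint8_t xor_chksum7(const void *data)
{
	const uint8_t *val = data;
	uint8_t checksum = 0;
	int idx;

	for (idx = 0; idx < 7; idx++)
		checksum ^= val[idx];
	return checksum;
}

/* The chain is full when advancing the producer by one cell would land on
 * the consumer. One cell always stays unused so that "full" and "empty"
 * can be told apart. */
static int chain_full(uint32_t prod_idx, uint32_t cons_idx, uint32_t num_cells)
{
	/* The mask trick only works for power-of-two ring sizes. */
	assert(num_cells && (num_cells & (num_cells - 1)) == 0);

	return cons_idx == ((prod_idx + 1) & (num_cells - 1));
}
```

With 32 cells, a producer at index 31 and a consumer at index 0 is reported as full, because the producer index wraps to 0.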

[PATCH net 02/20] nic/hinic: Initialize hw device components

2017-07-12 Thread Aviad Krawczyk
Initialize hw device by calling the initialization functions of aeqs and
management channel.

Signed-off-by: Aviad Krawczyk 
Signed-off-by: Zhaochen 
---
 drivers/net/ethernet/huawei/hinic/Makefile|   3 +-
 drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c  | 177 --
 drivers/net/ethernet/huawei/hinic/hinic_hw_dev.h  |  14 +-
 drivers/net/ethernet/huawei/hinic/hinic_hw_eqs.c  | 149 ++
 drivers/net/ethernet/huawei/hinic/hinic_hw_eqs.h  | 107 +
 drivers/net/ethernet/huawei/hinic/hinic_hw_if.h   |   8 +
 drivers/net/ethernet/huawei/hinic/hinic_hw_mgmt.c |  93 
 drivers/net/ethernet/huawei/hinic/hinic_hw_mgmt.h |  45 ++
 8 files changed, 581 insertions(+), 15 deletions(-)
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_hw_eqs.c
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_hw_eqs.h
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_hw_mgmt.c
 create mode 100644 drivers/net/ethernet/huawei/hinic/hinic_hw_mgmt.h

diff --git a/drivers/net/ethernet/huawei/hinic/Makefile 
b/drivers/net/ethernet/huawei/hinic/Makefile
index 353cee0..d080dfb 100644
--- a/drivers/net/ethernet/huawei/hinic/Makefile
+++ b/drivers/net/ethernet/huawei/hinic/Makefile
@@ -1,3 +1,4 @@
 obj-$(CONFIG_HINIC) += hinic.o
 
-hinic-y := hinic_main.o hinic_hw_dev.o hinic_hw_if.o
+hinic-y := hinic_main.o hinic_hw_dev.o hinic_hw_mgmt.o hinic_hw_eqs.o \
+  hinic_hw_if.o
\ No newline at end of file
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c 
b/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c
index 8df02ec..ad253c7 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c
@@ -13,6 +13,8 @@
  *
  */
 
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include 
 #include 
 #include 
@@ -22,11 +24,135 @@
 #include 
 
 #include "hinic_hw_if.h"
+#include "hinic_hw_eqs.h"
+#include "hinic_hw_mgmt.h"
 #include "hinic_hw_dev.h"
 
 #define MAX_IRQS(max_qps, num_aeqs, num_ceqs)  \
 (2 * (max_qps) + (num_aeqs) + (num_ceqs))
 
+enum intr_type {
+   INTR_MSIX_TYPE,
+};
+
+/* HW struct */
+struct hinic_dev_cap {
+   u8  status;
+   u8  version;
+   u8  rsvd0[6];
+
+   u8  rsvd1[5];
+   u8  intr_type;
+   u8  rsvd2[66];
+   u16 max_sqs;
+   u16 max_rqs;
+   u8  rsvd3[208];
+};
+
+/**
+ * get_capability - convert device capabilities to NIC capabilities
+ * @hwdev: the HW device to set and convert device capabilities for
+ * @dev_cap: device capabilities from FW
+ *
+ * Return 0 - Success, negative - Failure
+ **/
+static int get_capability(struct hinic_hwdev *hwdev,
+ struct hinic_dev_cap *dev_cap)
+{
+   struct hinic_hwif *hwif = hwdev->hwif;
+   struct hinic_cap *nic_cap = &hwdev->nic_cap;
+   int num_aeqs, num_ceqs, num_irqs, num_qps;
+
+   if (!HINIC_IS_PF(hwif) && !HINIC_IS_PPF(hwif))
+   return -EINVAL;
+
+   if (dev_cap->intr_type != INTR_MSIX_TYPE)
+   return -EFAULT;
+
+   num_aeqs = HINIC_HWIF_NUM_AEQS(hwif);
+   num_ceqs = HINIC_HWIF_NUM_CEQS(hwif);
+   num_irqs = HINIC_HWIF_NUM_IRQS(hwif);
+
+   /* Each QP has its own (SQ + RQ) interrupts */
+   num_qps = (num_irqs - (num_aeqs + num_ceqs)) / 2;
+
+   /* num_qps must be power of 2 */
+   num_qps = BIT(fls(num_qps) - 1);
+
+   nic_cap->max_qps = dev_cap->max_sqs + 1;
+   if (nic_cap->max_qps != (dev_cap->max_rqs + 1))
+   return -EFAULT;
+
+   if (num_qps < nic_cap->max_qps)
+   nic_cap->num_qps = num_qps;
+   else
+   nic_cap->num_qps = nic_cap->max_qps;
+
+   return 0;
+}
+
+/**
+ * get_cap_from_fw - get device capabilities from FW
+ * @pfhwdev: the PF HW device to get capabilities for
+ *
+ * Return 0 - Success, negative - Failure
+ **/
+static int get_cap_from_fw(struct hinic_pfhwdev *pfhwdev)
+{
+   struct hinic_hwdev *hwdev = &pfhwdev->hwdev;
+   struct hinic_hwif *hwif = hwdev->hwif;
+   struct pci_dev *pdev = hwif->pdev;
+   struct hinic_dev_cap dev_cap;
+   u16 in_len, out_len;
+   int err;
+
+   in_len = 0;
+   out_len = sizeof(dev_cap);
+
+   err = hinic_msg_to_mgmt(&pfhwdev->pf_to_mgmt, HINIC_MOD_CFGM,
+                           HINIC_CFG_NIC_CAP, &dev_cap, in_len, &dev_cap,
+                           &out_len, HINIC_MGMT_MSG_SYNC);
+   if (err) {
+   dev_err(&pdev->dev, "Failed to get capability from FW\n");
+   return err;
+   }
+
+   return get_capability(hwdev, &dev_cap);
+}
+
+/**
+ * get_dev_cap - get device capabilities
+ * @hwdev: the NIC HW device to get capabilities for
+ *
+ * Return 0 - Success, negative - Failure
+ **/
+static int get_dev_cap(struct hinic_hwdev *hwdev)
+{
+   struct hinic_pfhwdev *pfhwdev;
+   struct hinic_hwif *hwif = hwdev->hwif;
+   struct pci_dev 
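
get_capability() above derives the number of usable queue pairs from the interrupt budget: two interrupts per QP after the AEQs and CEQs are accounted for, rounded down to a power of two, then capped at the firmware maximum. A minimal userspace sketch of that calculation, assuming a hand-written fls_u32() in place of the kernel's fls(); usable_qps is a hypothetical name, and the early return for a too-small interrupt budget is an added guard that the patch does not have:

```c
#include <assert.h>
#include <stdint.h>

/* Position of the most significant set bit, counted from 1; 0 for x == 0. */
static int fls_u32(uint32_t x)
{
	int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

static uint32_t usable_qps(uint32_t num_irqs, uint32_t num_aeqs,
			   uint32_t num_ceqs, uint32_t max_qps)
{
	uint32_t num_qps;

	/* Hypothetical guard: not enough interrupts for even one QP. */
	if (num_irqs < num_aeqs + num_ceqs + 2)
		return 0;

	/* Each QP has its own SQ and RQ interrupt. */
	num_qps = (num_irqs - (num_aeqs + num_ceqs)) / 2;

	/* Round down to a power of two, like BIT(fls(num_qps) - 1). */
	num_qps = 1u << (fls_u32(num_qps) - 1);

	return num_qps < max_qps ? num_qps : max_qps;
}
```

The guard matters because fls(0) is 0, and BIT(-1) would be an invalid shift if the interrupt budget ever left no room for queue pairs.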

Re: [Intel-wired-lan] [PATCH 6/6] [next-queue]net: i40e: Add support to set max bandwidth rates for TCs offloaded via tc/mqprio

2017-07-12 Thread kbuild test robot
Hi Amritha,

[auto build test ERROR on jkirsher-next-queue/dev-queue]
[cannot apply to v4.12]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Amritha-Nambiar/Configuring-traffic-classes-via-new-hardware-offload-mechanism-in-tc-mqprio/20170711-215943
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue.git 
dev-queue
config: i386-allyesconfig (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/built-in.o: In function `i40e_set_bw_limit':
   (.text+0x11aabd9): undefined reference to `__udivdi3'
   drivers/built-in.o: In function `i40e_rebuild':
>> i40e_main.c:(.text+0x11af884): undefined reference to `__udivdi3'
   drivers/built-in.o: In function `__i40e_setup_tc':
   i40e_main.c:(.text+0x11b0cb5): undefined reference to `__udivdi3'
   i40e_main.c:(.text+0x11b0e99): undefined reference to `__udivdi3'
   i40e_main.c:(.text+0x11b0f25): undefined reference to `__udivdi3'
   drivers/built-in.o:(.text+0x1297fb8): more undefined references to 
`__udivdi3' follow
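
For context: on 32-bit x86, gcc turns a plain `/` on 64-bit operands into a call to libgcc's __udivdi3, which the kernel does not link against; kernel code normally goes through div_u64() or do_div() from linux/math64.h instead. The report does not show the offending expression, so the following is only a userspace sketch of what such a helper computes, written as shift-and-subtract long division so that it contains no 64-bit `/` itself (div_u64_sketch is a hypothetical name, not the kernel implementation):

```c
#include <assert.h>
#include <stdint.h>

/* Divide a 64-bit value by a 32-bit value without using the 64-bit
 * division operator. Returns the quotient and stores the remainder. */
static uint64_t div_u64_sketch(uint64_t dividend, uint32_t divisor,
			       uint32_t *remainder)
{
	uint64_t quotient = 0;
	uint64_t rem = 0;
	int bit;

	assert(divisor != 0);

	for (bit = 63; bit >= 0; bit--) {
		/* Bring down the next bit of the dividend. rem stays below
		 * divisor, so the shifted value fits in 33 bits. */
		rem = (rem << 1) | ((dividend >> bit) & 1);
		if (rem >= divisor) {
			rem -= divisor;
			quotient |= 1ULL << bit;
		}
	}

	*remainder = (uint32_t)rem;
	return quotient;
}
```

In the driver itself the likely shape of a fix is to replace the open-coded division with div_u64(), but that is an assumption about a patch this thread does not include.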

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


.config.gz
Description: application/gzip


Re: Lots of new warnings with gcc-7.1.1

2017-07-12 Thread Arnd Bergmann
On Wed, Jul 12, 2017 at 3:10 PM, Greg Kroah-Hartman
 wrote:
> On Tue, Jul 11, 2017 at 03:35:15PM -0700, Linus Torvalds wrote:
>> [ Very random list of maintainers and mailing lists, at least
>> partially by number of warnings generated by gcc-7.1.1 that is then
>> correlated with the get_maintainers script ]
>>
>> So I upgraded one of my boxes to F26, which upgraded the compiler to 
>> gcc-7.1.1
>>
>> Which in turn means that my nice clean allmodconfig compile is now an
>> unholy mess of annoying new warnings.
>
> I asked Arnd about this the other day on IRC as I've hit this as well on
> the stable releases, and it's really annoying.  He mentioned that he had
> lots of these warnings fixed, but didn't push most of the changes out
> yet.

To clarify: most of the patches I wrote ended up getting merged, but
there were a couple that I did not submit a second time after they
got dropped, but I gave up on trying to fix the new -Wformat warnings
and simply disabled them, hoping someone else would do it before me,
or that the gcc developers would find a way to reduce the false-positive
ones before the release.

>  Arnd, any repo with them in it that we could look at?

I have a private tree on my workstation that has lots of random
crap, and I rebase it all the time but normally don't publish it.

I have uploaded today's snapshot to

git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git randconfig-4.13-next

The way I work with this is helpful to catch build regressions as soon
as they happen, but not so good in finding things that I have either
submitted a patch for before and don't remember if it should be
resubmitted, or stuff that I decided I didn't want to deal with at some
point.

I was already planning to start over from scratch one of these days,
and cherry-pick+resubmit the patches that are actually required
for randconfig builds.

>> Anyway, it would be lovely if some of the more affected developers
>> would take a look at gcc-7.1.1 warnings. Right now I get about three
>> *thousand* lines of warnings from a "make allmodconfig" build, which
>> makes them a bit overwhelming.
>
> I only have 310 when building the 4.12.0 release with 7.1.1, I wonder if
> Fedora turned more warnings on in their compiler release, I'm running
> Arch here:
> $ gcc --version
> gcc (GCC) 7.1.1 20170621

This is what I get in today's linux-next:

$ grep error: 4.13-next-allmod-warning | sed -e 's:^.*\[-W:-W:' | sort
| uniq -c | cut -f 1 -d\] | sort -n
  1 -Werror=parentheses
  2 -Werror=tautological-compare
  2 -Werror=unused-result
 34 -Werror=format-overflow=
 41 -Werror=int-in-bool-context
233 -Werror=format-truncation=

I'll resubmit the patches for -Wparentheses, -Wtautological-compare,
-Wunused-result and -Wint-in-bool-context that I had sent earlier,
plus a new patch to move -Wformat-truncation into W=1.

  Arnd


RE: [PATCH] net: broadcom: bnx2x: make a couple of const arrays static

2017-07-12 Thread Mintz, Yuval
> Don't populate various tables on the stack but make them static const.
> Makes the object code smaller by nearly 200 bytes:
> 
> Before:
>    text    data     bss     dec     hex filename
>  113468   11200       0  124668   1e6fc bnx2x_ethtool.o
> 
> After:
>    text    data     bss     dec     hex filename
>  113129   11344       0  124473   1e639 bnx2x_ethtool.o
> 
> Signed-off-by: Colin Ian King 

Thanks Colin.
Acked-by: Yuval Mintz 
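
A small illustration of the change being acked, for readers who have not seen this pattern: a const array declared inside a function without `static` has automatic storage, so the compiler may have to build it on the stack each time the function runs, while a `static const` array is emitted once into read-only data. The sketch below is generic C with made-up table contents, not the bnx2x code:

```c
#include <assert.h>

/* Automatic storage: the table is a local object, and the compiler may
 * emit code to fill it in on every call. */
static unsigned int lookup_auto(unsigned int i)
{
	const unsigned int tbl[] = { 10, 100, 1000, 10000 };

	assert(i < sizeof(tbl) / sizeof(tbl[0]));
	return tbl[i];
}

/* Static storage: one copy in read-only data, nothing to set up per call. */
static unsigned int lookup_static(unsigned int i)
{
	static const unsigned int tbl[] = { 10, 100, 1000, 10000 };

	assert(i < sizeof(tbl) / sizeof(tbl[0]));
	return tbl[i];
}
```

Both functions return the same values; the difference only shows up in the object file, which matches the size output quoted above (text shrinks, data grows a little).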


Re: [PATCH 2/2] net: ethernet: fsl: add phy reset after clk enable option

2017-07-12 Thread Andrew Lunn
> So would it be possible to add a "quick" bugfix patch (maybe this patch
> or another one removing the clk disable) so this fix can be backported
> to stable? Otherwise our board is only working with another
> "out-of-tree" patch (which I want to avoid)...

Hi Richard

It is a clear regression, so a minimal fix would be accepted to
stable.

Andrew


Re: [PATCH v1 net-next 1/5] drop_monitor: import netnamespace framework

2017-07-12 Thread Neil Horman
On Wed, Jul 12, 2017 at 06:40:49PM +0800, martinbj2...@gmail.com wrote:
> From: martin Zhang 
> 
> This is a patch series for drop monitor that adds net namespace support.
> 
> Introduce two structs to support net ns:
> 
> 1. struct per_ns_dm_cb:
>   As its name suggests, it holds the per net ns state.
> 
>   In this patch it is empty; the following patches add these fields:
>   a. trace_state: every net ns has a switch to indicate the trace state.
>   b. ns_dm_mutex: the mutex serializes operations within a single net ns.
>   c. hw_stats_list: monitor for NAPI of net device.
> 
> 2. struct ns_pcpu_dm_data:
>    It replaces per_cpu_dm_data on a per net ns basis.
> 
>    per_cpu_dm_data will keep only dm_alert_work; the other fields
>    will be moved to ns_pcpu_dm_data. They do the same thing as in the
>    current code, the only difference being that they are per net ns.
> 
>    One work item stays per CPU to send the alert netlink message.
> 
> Signed-off-by: martin Zhang 
> ---
> The dropwatch is a very useful tool to diagnose network problem,
> which give us greate help.
> Dropwatch could not work under container(net namespace).
> It is a pitty, so let it support net ns.
> 
Sorry, I'm having a hard time wrapping my head around this.  Why exactly is it
that dropwatch won't work in a namespaced environment?  IIRC, the kfree
tracepoints are namespace agnostic, and so running dropwatch anywhere should
result in seeing drops in all namespaces.  I grant that perhaps it would be nice
to filter on a namespace, but it should all 'just work' for some definition of
the term, no?

Neil

>  net/core/drop_monitor.c | 41 +
>  1 file changed, 41 insertions(+)
> 
> diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
> index 70ccda2..6a75e04 100644
> --- a/net/core/drop_monitor.c
> +++ b/net/core/drop_monitor.c
> @@ -32,6 +32,10 @@
>  #include 
>  
>  #include 
> +#include 
> +#include 
> +#include 
> +#include 
>  
>  #define TRACE_ON 1
>  #define TRACE_OFF 0
> @@ -41,6 +45,13 @@
>   * and the work handle that will send up
>   * netlink alerts
>   */
> +
> +struct ns_pcpu_dm_data {
> +};
> +
> +struct per_ns_dm_cb {
> +};
> +
>  static int trace_state = TRACE_OFF;
>  static DEFINE_MUTEX(trace_state_mutex);
>  
> @@ -59,6 +70,7 @@ struct dm_hw_stat_delta {
>   unsigned long last_drop_val;
>  };
>  
> +static int dm_net_id __read_mostly;
>  static struct genl_family net_drop_monitor_family;
>  
>  static DEFINE_PER_CPU(struct per_cpu_dm_data, dm_cpu_data);
> @@ -382,6 +394,33 @@ static int dropmon_net_event(struct notifier_block 
> *ev_block,
>   .notifier_call = dropmon_net_event
>  };
>  
> +static int __net_init dm_net_init(struct net *net)
> +{
> + struct per_ns_dm_cb *ns_dm_cb;
> +
> + ns_dm_cb = net_generic(net, dm_net_id);
> + if (!ns_dm_cb)
> + return -ENOMEM;
> +
> + return 0;
> +}
> +
> +static void __net_exit dm_net_exit(struct net *net)
> +{
> + struct per_ns_dm_cb *ns_dm_cb;
> +
> + ns_dm_cb = net_generic(net, dm_net_id);
> + if (!ns_dm_cb)
> + return;
> +}
> +
> +static struct pernet_operations dm_net_ops = {
> + .init = dm_net_init,
> + .exit = dm_net_exit,
> + .id   = &dm_net_id,
> + .size = sizeof(struct per_ns_dm_cb),
> +};
> +
>  static int __init init_net_drop_monitor(void)
>  {
>   struct per_cpu_dm_data *data;
> @@ -393,6 +432,7 @@ static int __init init_net_drop_monitor(void)
>   pr_err("Unable to store program counters on this arch, Drop 
> monitor failed\n");
>   return -ENOSPC;
>   }
> + rc = register_pernet_subsys(&dm_net_ops);
>  
>   rc = genl_register_family(&net_drop_monitor_family);
>   if (rc) {
> @@ -441,6 +481,7 @@ static void exit_net_drop_monitor(void)
>* or pending schedule calls
>*/
>  
> + unregister_pernet_subsys(&dm_net_ops);
>   for_each_possible_cpu(cpu) {
>   data = &per_cpu(dm_cpu_data, cpu);
>   del_timer_sync(&data->send_timer);
> -- 
> 1.8.3.1
> 
> 


Re: Lots of new warnings with gcc-7.1.1

2017-07-12 Thread Arnd Bergmann
On Wed, Jul 12, 2017 at 5:41 AM, Linus Torvalds
 wrote:

>
> We also have about a bazillion
>
> warning: ‘*’ in boolean context, suggest ‘&&’ instead
>
> warnings in drivers/ata/libata-core.c, all due to a single macro that
> uses a pattern that gcc-7.1.1 doesn't like. The warning looks a bit
> debatable, but I suspect the macro could easily be changed too.
>
> Tejun, would you hate just moving the "multiply by 1000" part _into_
> that EZ() macro? Something like the attached (UNTESTED!) patch?

Tejun applied an almost identical patch of mine a while ago, but it seems to
have gotten lost in the meantime in some rebase:

https://patchwork.kernel.org/patch/9721397/
https://patchwork.kernel.org/patch/9721399/

I guess I should have resubmitted the second patch with the suggested
improvement.

 Arnd
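
For readers who have not seen the warning: gcc-7 complains when a
multiplication is used directly as a truth value, which is what happens
when a caller passes "x * 1000" to a macro that tests its argument with
?:. The suggestion quoted above is to test the raw value and do the
multiplication inside the macro. A minimal sketch of that shape follows.
The EZ name comes from the thread; the ENOUGH() round-up helper and the
quantize() wrapper are assumptions made for illustration and may not match
what libata actually contains.

```c
#include <assert.h>

/* Assumed helper: round v up to a whole number of units. */
#define ENOUGH(v, unit)	(((v) - 1) / (unit) + 1)

/*
 * Old shape (shown only as a comment so this file builds cleanly):
 *     #define EZ(v, unit)  ((v) ? ENOUGH(v, unit) : 0)
 *     ... EZ(x * 1000, unit) ...
 * Here "x * 1000" becomes the condition of ?:, which triggers
 * "'*' in boolean context".
 *
 * Suggested shape: the condition is the raw value, and the
 * multiplication by 1000 happens inside the macro.
 */
#define EZ(v, unit)	((v) ? ENOUGH((v) * 1000, unit) : 0)

/* Hypothetical wrapper so the macro can be exercised. */
int quantize(int v, int unit)
{
	return EZ(v, unit);
}
```

With unit = 30000, quantize(30, 30000) rounds 30000 up to one unit and
quantize(31, 30000) rounds 31000 up to two, while a zero input stays zero.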


Re: Lots of new warnings with gcc-7.1.1

2017-07-12 Thread Greg Kroah-Hartman
On Tue, Jul 11, 2017 at 03:35:15PM -0700, Linus Torvalds wrote:
> [ Very random list of maintainers and mailing lists, at least
> partially by number of warnings generated by gcc-7.1.1 that is then
> correlated with the get_maintainers script ]
> 
> So I upgraded one of my boxes to F26, which upgraded the compiler to gcc-7.1.1
> 
> Which in turn means that my nice clean allmodconfig compile is now an
> unholy mess of annoying new warnings.

I asked Arnd about this the other day on IRC as I've hit this as well on
the stable releases, and it's really annoying.  He mentioned that he had
lots of these warnings fixed, but didn't push most of the changes out
yet.  Arnd, any repo with them in it that we could look at?

> Normally I hate the stupid new warnings, but this time around they are
> actually exactly the kinds of warnings you'd want to see and that are
> hard for humans to pick out errors: lots of format errors wrt limited
> buffer sizes.
> 
> At the same time, many of them *are* annoying. We have various limited
> buffers that are limited for a good reason, and some of the format
> truncation warnings are about numbers in the range [0-MAX_INT], where
> we definitely know that we don't need to worry about the really big
> ones.
> 
> After all, we're using "snprintf()" for a reason - we *want* to
> truncate if the buffer is too small.

Yeah, those are the warnings in the USB core code. We "know" this will not
happen, and we are using snprintf() for that reason as well. I don't
know how to convince gcc that it's all OK here.

> Anyway, it would be lovely if some of the more affected developers
> would take a look at gcc-7.1.1 warnings. Right now I get about three
> *thousand* lines of warnings from a "make allmodconfig" build, which
> makes them a bit overwhelming.

I only have 310 when building the 4.12.0 release with 7.1.1. I wonder if
Fedora turned more warnings on in their compiler release; I'm running
Arch here:
$ gcc --version
gcc (GCC) 7.1.1 20170621

thanks,

greg k-h
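
As background for the truncation discussion: snprintf() returns the
length the full output would have had, so a caller can detect truncation
by comparing that value with the buffer size. GCC's documentation says
the default level of -Wformat-truncation only warns about bounded calls
whose return value is unused, so checking the result is one possible way
to express "truncation is handled here". The sketch below is userspace C
with a made-up helper name, format_name(), and is not taken from the USB
core.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format "<prefix>-<n>" into buf.
 * Returns 1 if the whole string fit, 0 if snprintf() truncated it,
 * and -1 if snprintf() reported an error.
 */
int format_name(char *buf, size_t size, const char *prefix, int n)
{
	int len = snprintf(buf, size, "%s-%d", prefix, n);

	if (len < 0)
		return -1;
	/* A result >= size means the output was cut short; the buffer is
	 * still NUL-terminated as long as size is non-zero. */
	return (size_t)len < size;
}
```

Whether this is appropriate for a given kernel call site is a separate
question; in some places truncation is intended and harmless.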


Re: [iproute PATCH] ip netns: Make sure netns name is sane

2017-07-12 Thread Phil Sutter
On Mon, Jul 10, 2017 at 08:17:02AM -0700, Stephen Hemminger wrote:
> On Mon, 10 Jul 2017 13:19:12 +0200
> Phil Sutter  wrote:
> 
> > +static bool is_basename(const char *name)
> > +{
> > +   char *name_dup = strdup(name);
> > +   bool rc = true;
> > +
> > +   if (!name_dup)
> > +   return false;
> > +
> > +   if (strcmp(basename(name_dup), name))
> > +   rc = false;
> > +
> > +   free(name_dup);
> > +   return rc;
> > +}
> 
> Why not just:
> 
> static bool is_basename(const char *name)
> {
>   return strchr(name, '/') == NULL;
> }

This is not sufficient since it doesn't cover netns names of '..' and
'.', as Matteo correctly pointed out.

Cheers, Phil
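
To make the point concrete, a check that only looks for '/' accepts "."
and "..", which are not usable as new entry names. A minimal sketch of a
check covering all three cases is shown below. It is not the code from
the patch (which compares against basename()); the function name
name_is_sane() is hypothetical, and rejecting the empty string is an
extra assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true only for a plain, single path component. */
bool name_is_sane(const char *name)
{
	if (!name || !*name)
		return false;	/* empty name (assumption: not allowed) */
	if (strchr(name, '/'))
		return false;	/* contains a path separator */
	if (!strcmp(name, ".") || !strcmp(name, ".."))
		return false;	/* refers to a directory, not a new entry */
	return true;
}
```

Names that merely start with dots, such as "..a", are still accepted by
this sketch.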


Re: Lots of new warnings with gcc-7.1.1

2017-07-12 Thread Mauro Carvalho Chehab
Em Tue, 11 Jul 2017 15:35:15 -0700
Linus Torvalds  escreveu:

> [ Very random list of maintainers and mailing lists, at least
> partially by number of warnings generated by gcc-7.1.1 that is then
> correlated with the get_maintainers script ]

Under drivers/media, I fixed a bunch of gcc 7.1 warnings before the
merge window. While most were just noise, some actually pointed to
human errors.

Now, gcc-7.1.1 produces only 6 warnings with W=1 on x86_64 (allyesconfig), 
either due to unused-but-set-variable or unused-const-variable. I guess
both warning options are disabled by default. Anyway, I have patches
to fix them already. I'll send them to you later.

The atomisp staging driver is a completely different beast, which by itself
would produce a huge number of warnings. I ended up adding some
logic to the drivers/staging/media/atomisp/ Makefiles to disable them:

ccflags-y += $(call cc-disable-warning, missing-declarations)
ccflags-y += $(call cc-disable-warning, missing-prototypes)
ccflags-y += $(call cc-disable-warning, unused-but-set-variable)
ccflags-y += $(call cc-disable-warning, unused-const-variable)
ccflags-y += $(call cc-disable-warning, suggest-attribute=format)
ccflags-y += $(call cc-disable-warning, implicit-fallthrough)

(there's actually one patch pending related to atomisp, that I'll also
be sending you soon - meant to avoid warnings if compiled with an older
gcc version)

Thanks,
Mauro


Re: [PATCH V2] brcmfmac: fix possible buffer overflow in brcmf_cfg80211_mgmt_tx()

2017-07-12 Thread Arend van Spriel

On 7/7/2017 10:09 PM, Arend van Spriel wrote:

The lower level nl80211 code in cfg80211 ensures that "len" is between
25 and NL80211_ATTR_FRAME (2304).  We subtract DOT11_MGMT_HDR_LEN (24) from
"len" so that's a max of 2280.  However, the action_frame->data[] buffer is
only BRCMF_FIL_ACTION_FRAME_SIZE (1800) bytes long so this memcpy() can
overflow.

memcpy(action_frame->data, &buf[DOT11_MGMT_HDR_LEN],
   le16_to_cpu(action_frame->len));

Cc: sta...@vger.kernel.org # 3.9.x
Fixes: 18e2f61db3b70 ("brcmfmac: P2P action frame tx.")
Reported-by: "freenerguo(郭大兴)" 
Signed-off-by: Arend van Spriel 
---
  V2:
   - added Fixes: tag and Cc: for stable kernels.
   - Cc: patch to netdev list.
---
Hi David,

Here is the patch as Linus sent it to us and secur...@kernel.org. I
removed the lower bound check as that is already done in cfg80211.
Now I signed off on the patch although formally I suppose Linus should
sign it off. Putting it out there so people can respond as deemed
necessary.

The reason for submitting it to your tree is the fact that Kalle is
on vacation for the next 10 days or so, which was indicated to me by Johannes.
The patch applies to the master branch of your net repository. For
reference V1 of this patch can be found here [1].


Hi Dave,

Not sure if you missed this one. It is addressing a reported security 
issue and intended for the net repository, not net-next which is 
obviously closed [2].


Regards,
Arend

[2] http://vger.kernel.org/~davem/net-next.html


Regards,
Arend

[1] https://patchwork.kernel.org/patch/9829977/
---
  drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c | 5 +
  1 file changed, 5 insertions(+)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c 
b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
index cd1d673..d182a00 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
@@ -4851,6 +4851,11 @@ static int brcmf_cfg80211_stop_ap(struct wiphy *wiphy, 
struct net_device *ndev)
cfg80211_mgmt_tx_status(wdev, *cookie, buf, len, true,
GFP_KERNEL);
} else if (ieee80211_is_action(mgmt->frame_control)) {
+   if (len > BRCMF_FIL_ACTION_FRAME_SIZE + DOT11_MGMT_HDR_LEN) {
+   brcmf_err("invalid action frame length\n");
+   err = -EINVAL;
+   goto exit;
+   }
af_params = kzalloc(sizeof(*af_params), GFP_KERNEL);
if (af_params == NULL) {
brcmf_err("unable to allocate frame\n");
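
The arithmetic behind the check can be sketched in userspace C. The two
constants use the values quoted in the commit message above (24 and 1800);
the helper name action_frame_payload_len() is made up, and the lower-bound
test is an addition so the sketch is safe on its own (the patch relies on
cfg80211 for that).

```c
#include <assert.h>
#include <stddef.h>

/* Values as quoted in the commit message. */
#define DOT11_MGMT_HDR_LEN		24
#define BRCMF_FIL_ACTION_FRAME_SIZE	1800

/*
 * Returns the number of payload bytes that would be copied into the
 * BRCMF_FIL_ACTION_FRAME_SIZE-byte data[] buffer, or -1 if the frame
 * is too short or the payload would overflow that buffer.
 */
long action_frame_payload_len(size_t len)
{
	if (len < DOT11_MGMT_HDR_LEN)
		return -1;	/* extra guard, not part of the patch */
	if (len > BRCMF_FIL_ACTION_FRAME_SIZE + DOT11_MGMT_HDR_LEN)
		return -1;	/* same condition as the added check */
	return (long)(len - DOT11_MGMT_HDR_LEN);
}
```

A frame of 1824 bytes is the largest accepted (payload 1800), while the
maximum of 2304 allowed by the upper layer is rejected.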





Re: [PATCH net-next RFC 05/12] net: dsa: Add support for learning FDB through notification

2017-07-12 Thread Nikolay Aleksandrov
On 12/07/17 14:23, Arkadi Sharshevsky wrote:
> 
> 
> On 07/11/2017 06:05 PM, Nikolay Aleksandrov wrote:
>> On 11/07/17 13:26, Arkadi Sharshevsky wrote:
>>>
>>>
>>> On 07/10/2017 11:59 PM, Vivien Didelot wrote:
 Hi Arkadi,

 Arkadi Sharshevsky  writes:

>>> +   err = dsa_port_fdb_add(p->dp, fdb_info->addr, 
>>> fdb_info->vid);
>>> +   if (err) {
>>> +   netdev_dbg(dev, "fdb add failed err=%d\n", err);
>>> +   break;
>>> +   }
>>> +   call_switchdev_notifiers(SWITCHDEV_FDB_OFFLOADED, dev,
>>> +&fdb_info->info);
>>> +   break;
>>> +
>>> +   case SWITCHDEV_FDB_DEL_TO_DEVICE:
>>> +   fdb_info = _work->fdb_info;
>>> +   err = dsa_port_fdb_del(p->dp, fdb_info->addr, 
>>> fdb_info->vid);
>>> +   if (err)
>>> +   netdev_dbg(dev, "fdb del failed err=%d\n", err);
>>
>> OK I must have missed from the off-list discussion why we are not
>> calling the switchdev notifier here?
>
> We do not agree on it actually, that is why it was moved to the list.
> I think that delete should succeed, you should retry until it succeeds.
>
> The deletion is done under spinlock in the bridge so you cannot block,
> thus delete cannot fail due to hardware failure. Calling it here doesn't
> make sense because the bridge probably already deleted this FDB.

 So as we discussed, the problem here is that if dsa_port_fdb_del fails
 for some probable reasons (MDIO timeout, weak GPIO lines, etc.), Linux
 bridge will delete the entry in software, dumping bridge fdb will show
 nothing, but the entry would still be programmed in hardware and the
 network can thus be inconsistent, unsupposedly switching frames.

 IMHO the correct way for bridge to use the notification chain is to make
 SWITCHDEV_FDB_DEL_TO_DEVICE symmetrical to SWITCHDEV_FDB_ADD_TO_DEVICE:
 if an entry has been marked as offloaded, bridge must mark the entry as
 to-be-deleted and do not delete the software entry until the driver
 notifies back the successful deletion.

 If that is hardly feasible due to some bridge limitations, we must
 explain this in a comment and use something more explosive than a simple
 netdev_dbg to warn the user about the broken network setup...


 Thanks,

 Vivien

>>>
>>> Hi Nikolay,
>>>
>>> Vivien raised inconsistency issue with the current switchdev
>>> notification chain in case of FDB del. In case of static FDB delete,
>>> notification will be sent to the driver, followed by deletion of the
>>> software entry without waiting for the hardware delete. In case of
>>> hardware deletion failure the consecutive FDB dump will not show the
>>> deleted entry, yet, the entry will stay in hardware.
>>>
>>> The deletion is done under lock thus the hardware deletion is deferred,
>>> and cannot fail due to hardware removal failure. Thus the above proposed
>>> solution by Vivien can lead to confusing situation:
>>>
>>> 1. User deletes the entry
>>> 2. Deletion succeeds
>>> 3. User dumps FDB and still sees this entry due to hardware failure.
>>>    What should he do? Retry the delete until the FDB dump no longer
>>>    shows the entry?
>>>
>>> Would like to hear your opinion about this solution.
>>>
>>> IMHO in this case the driver should retry the delete, and if
>>> several retries fail the driver should maybe:
>>> 1. Trap the traffic to CPU (I don't think it is possible in the case of DSA).
>>> 2. Disable the port (it's more explosive than netdev_dbg).
>>>
>>> Thanks,
>>> Arkadi
>>>
>>>
>>
>> Hi,
>> Looking at the code - it would seem that retrying is the only current option
>> with the way these switchdev notifications are handled. They cannot fail, 
>> meaning
>> from the bridge POV these ops must always succeed and errors are ignored, so 
>> the
>> driver should do everything possible to stay in sync, and in case all fails
>> then disabling the port seems like the best option to me, to show that 
>> something is
>> clearly wrong and avoid further issues, but DSA maintainers can comment more
>> on how to handle failure.
>>
>> That being said:
>> This sounds a lot like the switchdev notifications vs callbacks discussions 
>> that we've
>> had in the past. Also what happened with the prepare+commit and all that ? 
>> If the hash_lock
>> is the main problem let's work towards improving that and making the fdb 
>> code handle
>> switchdev similar to the vlan code.
>>
>> Cheers,
>>  Nik
>>
> 
> Vlans can only be added by the user under rtnl, so it's possible to sleep. On
> the other hand, FDBs can be learned in atomic context, thus the
> notification chain is atomic. One possible way I thought about doing it

Right, that's why I mentioned the hash_lock. :-)

> is to 

Re: [PATCH net-next RFC 05/12] net: dsa: Add support for learning FDB through notification

2017-07-12 Thread Arkadi Sharshevsky


On 07/11/2017 06:05 PM, Nikolay Aleksandrov wrote:
> On 11/07/17 13:26, Arkadi Sharshevsky wrote:
>>
>>
>> On 07/10/2017 11:59 PM, Vivien Didelot wrote:
>>> Hi Arkadi,
>>>
>>> Arkadi Sharshevsky  writes:
>>>
>> +err = dsa_port_fdb_add(p->dp, fdb_info->addr, 
>> fdb_info->vid);
>> +if (err) {
>> +netdev_dbg(dev, "fdb add failed err=%d\n", err);
>> +break;
>> +}
>> +call_switchdev_notifiers(SWITCHDEV_FDB_OFFLOADED, dev,
>> + &fdb_info->info);
>> +break;
>> +
>> +case SWITCHDEV_FDB_DEL_TO_DEVICE:
>> +fdb_info = _work->fdb_info;
>> +err = dsa_port_fdb_del(p->dp, fdb_info->addr, 
>> fdb_info->vid);
>> +if (err)
>> +netdev_dbg(dev, "fdb del failed err=%d\n", err);
>
> OK I must have missed from the off-list discussion why we are not
> calling the switchdev notifier here?

 We do not agree on it actually, that is why it was moved to the list.
 I think that delete should succeed, you should retry until it succeeds.

 The deletion is done under spinlock in the bridge so you cannot block,
 thus delete cannot fail due to hardware failure. Calling it here doesn't
 make sense because the bridge probably already deleted this FDB.
>>>
>>> So as we discussed, the problem here is that if dsa_port_fdb_del fails
>>> for some probable reasons (MDIO timeout, weak GPIO lines, etc.), Linux
>>> bridge will delete the entry in software, dumping bridge fdb will show
>>> nothing, but the entry would still be programmed in hardware and the
>>> network can thus be inconsistent, unsupposedly switching frames.
>>>
>>> IMHO the correct way for bridge to use the notification chain is to make
>>> SWITCHDEV_FDB_DEL_TO_DEVICE symmetrical to SWITCHDEV_FDB_ADD_TO_DEVICE:
>>> if an entry has been marked as offloaded, bridge must mark the entry as
>>> to-be-deleted and do not delete the software entry until the driver
>>> notifies back the successful deletion.
>>>
>>> If that is hardly feasible due to some bridge limitations, we must
>>> explain this in a comment and use something more explosive than a simple
>>> netdev_dbg to warn the user about the broken network setup...
>>>
>>>
>>> Thanks,
>>>
>>> Vivien
>>>
>>
>> Hi Nikolay,
>>
>> Vivien raised inconsistency issue with the current switchdev
>> notification chain in case of FDB del. In case of static FDB delete,
>> notification will be sent to the driver, followed by deletion of the
>> software entry without waiting for the hardware delete. In case of
>> hardware deletion failure the consecutive FDB dump will not show the
>> deleted entry, yet, the entry will stay in hardware.
>>
>> The deletion is done under lock thus the hardware deletion is deferred,
>> and cannot fail due to hardware removal failure. Thus the above proposed
>> solution by Vivien can lead to confusing situation:
>>
>> 1. User deletes the entry
>> 2. Deletion succeeds
>> 3. User dumps FDB and still sees this entry due to hardware failure.
>>    What should he do? Retry the delete until the FDB dump no longer
>>    shows the entry?
>>
>> Would like to hear your opinion about this solution.
>>
>> IMHO in this case the driver should retry the delete, and if
>> several retries fail the driver should maybe:
>> 1. Trap the traffic to CPU (I don't think it is possible in the case of DSA).
>> 2. Disable the port (it's more explosive than netdev_dbg).
>>
>> Thanks,
>> Arkadi
>>
>>
> 
> Hi,
> Looking at the code - it would seem that retrying is the only current option
> with the way these switchdev notifications are handled. They cannot fail, 
> meaning
> from the bridge POV these ops must always succeed and errors are ignored, so 
> the
> driver should do everything possible to stay in sync, and in case all fails
> then disabling the port seems like the best option to me, to show that 
> something is
> clearly wrong and avoid further issues, but DSA maintainers can comment more
> on how to handle failure.
> 
> That being said:
> This sounds a lot like the switchdev notifications vs callbacks discussions 
> that we've
> had in the past. Also what happened with the prepare+commit and all that ? If 
> the hash_lock
> is the main problem let's work towards improving that and making the fdb code 
> handle
> switchdev similar to the vlan code.
> 
> Cheers,
>  Nik
> 

Vlans can only be added by the user under rtnl, so it's possible to sleep. On
the other hand, FDBs can be learned in atomic context, thus the
notification chain is atomic. One possible way I thought about doing it
is to maintain a bridge-internal ordered workqueue for the FDBs learned in
atomic context; furthermore, the FDB table will be protected by a mutex.
The STATIC entries are added in user context so the user will add

Re: [RFC PATCH 03/12] xdp: add bpf_redirect helper function

2017-07-12 Thread Saeed Mahameed



On 7/11/2017 10:38 PM, Jesper Dangaard Brouer wrote:

On Tue, 11 Jul 2017 11:38:33 -0700
John Fastabend  wrote:


On 07/11/2017 07:09 AM, Andy Gospodarek wrote:

On Mon, Jul 10, 2017 at 1:23 PM, John Fastabend
 wrote:

On 07/09/2017 06:37 AM, Saeed Mahameed wrote:



On 7/7/2017 8:35 PM, John Fastabend wrote:

This adds support for a bpf_redirect helper function to the XDP
infrastructure. For now this only supports redirecting to the egress
path of a port.

In order to support drivers handling an xdp_buff natively, this patch
uses a new ndo operation, ndo_xdp_xmit(), that pushes an xdp_buff
to the specified device.

If the program specifies either (a) an unknown device or (b) a device
that does not support the operation a BPF warning is thrown and the
XDP_ABORTED error code is returned.

Signed-off-by: John Fastabend 
Acked-by: Daniel Borkmann 
---


[...]



+static int __bpf_tx_xdp(struct net_device *dev, struct xdp_buff *xdp)
+{
+if (dev->netdev_ops->ndo_xdp_xmit) {
+dev->netdev_ops->ndo_xdp_xmit(dev, xdp);


Hi John,

I have some concern here regarding synchronizing between the
redirecting device and the target device:

If the target device's NAPI is also doing XDP_TX on the same XDP TX
ring that this NDO is redirecting XDP packets into, there would be a
race accessing the ring's resources (buffers and descriptors). Maybe
you addressed this issue in the device driver implementation of this
ndo or with some NAPI tricks/assumptions. I guess we have the same
issue if you run the same program to redirect traffic from multiple
netdevices into one netdevice: how do you synchronize access to this
TX ring?


The implementation uses a per cpu TX ring to resolve these races. And
the pair of driver interface API calls, xdp_do_redirect() and xdp_do_flush_map()
must be completed in a single poll() handler.

This comment was included in the header file to document this,

/* The pair of xdp_do_redirect and xdp_do_flush_map MUST be called in the
 * same cpu context. Further for best results no more than a single map
 * for the do_redirect/do_flush pair should be used. This limitation is
 * because we only track one map and force a flush when the map changes.
 * This does not appear to be a real limitation for existing software.
 */

In general some documentation about implementing XDP would probably be
useful to add in Documentation/networking but this IMO goes beyond just
this patch series.



Maybe we need some clear guidelines in this ndo documentation stating
how to implement this ndo and what are the assumptions on those XDP
TX redirect rings or from which context this ndo can run.

can you please elaborate.


I think the best implementation is to use a per cpu TX ring as I did in
this series. If your device is limited by the number of queues for some
reason some other scheme would need to be devised. Unfortunately, the only
thing I've come up for this case (using only this series) would both impact
performance and make the code complex.


mlx5 and mlx4 have no limitation regarding the number of queues. My only 
concern is that this looks like a very heavy assumption with some 
unwanted side effects.


Is this per-CPU TX ring used only for the XDP_REDIRECT action, or is it 
shared with the XDP_TX action coming from the same device?


If it is shared, wouldn't there be a race on a preemptible system, or 
when a HW RX interrupt occurs on the target device while XDP_REDIRECT is 
in progress, so that an XDP_TX action executes at the same time on 
the same ring?




A nice solution might be to constrain networking "tasks" to only a subset
of cores. For 64+ core systems this might be a good idea. It would allow
avoiding locking using per_cpu logic but also avoid networking consuming
slices of every core in the system. As core count goes up I think we will
eventually need to address this. I believe Eric was thinking along these
lines with his netconf talk iirc. Obviously this work is way outside the
scope of this series though.


I agree that it is outside the scope of this series, but I think it is
important to consider the impact of the output queue selection in both
a heterogenous and homogenous driver setup and how tx could be
optimized or even considered to be more reliable and I think that was
part of Saeed's point.

I got base redirect support for bnxt_en working yesterday, but for it
and other drivers that do not necessarily create a ring/queue per core
like ixgbe there is probably a bit more to work in each driver to
properly track output tx rings/queues than what you have done with
ixgbe.



The problem, in my mind at least, is if you do not have a ring per core
how does the locking work? I don't see any good way to do this outside
of locking which I was trying to avoid.


I also agree this is outside the scope of the series, but I also tend to 
agree with Andy and Jesper, we 

[PATCH v1 net-next 4/5] drop_monitor: let drop stat support net ns

2017-07-12 Thread martinbj2008
From: martin Zhang 

Move the detailed drop stats to per net ns.
Each net ns has its own per-CPU stats.

Keep the work per CPU to send the netlink alert message.

All net namespaces share one work item per CPU; the work can be scheduled
by any ns, and will send messages in all of them.

Signed-off-by: martin Zhang 
---
 net/core/drop_monitor.c | 123 +++-
 1 file changed, 91 insertions(+), 32 deletions(-)

diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
index 875e8b4..5828bf2 100644
--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -47,6 +47,10 @@
  */
 
 struct ns_pcpu_dm_data {
+   spinlock_t  lock;
+   struct sk_buff  *skb;
+   struct net  *net;
+   struct timer_list   send_timer;
 };
 
 /**
@@ -59,15 +63,13 @@ struct per_ns_dm_cb {
int trace_state;
struct mutex ns_dm_mutex;
struct list_head hw_stats_list;
+   struct ns_pcpu_dm_data __percpu *pcpu_data;
 };
 
 static DEFINE_MUTEX(trace_state_mutex);
 
 struct per_cpu_dm_data {
-   spinlock_t  lock;
-   struct sk_buff  *skb;
struct work_struct  dm_alert_work;
-   struct timer_list   send_timer;
 };
 
 struct dm_hw_stat_delta {
@@ -88,7 +90,7 @@ struct dm_hw_stat_delta {
 static int dm_delay = 1;
 static unsigned long dm_hw_check_delta = 2*HZ;
 
-static struct sk_buff *reset_per_cpu_data(struct per_cpu_dm_data *data)
+static struct sk_buff *reset_per_cpu_data(struct ns_pcpu_dm_data *ns_dm_data)
 {
size_t al;
struct net_dm_alert_msg *msg;
@@ -125,11 +127,11 @@ static struct sk_buff *reset_per_cpu_data(struct 
per_cpu_dm_data *data)
goto out;
 
 err:
-   mod_timer(&data->send_timer, jiffies + HZ / 10);
+   mod_timer(&ns_dm_data->send_timer, jiffies + HZ / 10);
 out:
-   spin_lock_irqsave(&data->lock, flags);
-   swap(data->skb, skb);
-   spin_unlock_irqrestore(&data->lock, flags);
+   spin_lock_irqsave(&ns_dm_data->lock, flags);
+   swap(ns_dm_data->skb, skb);
+   spin_unlock_irqrestore(&ns_dm_data->lock, flags);
 
if (skb) {
struct nlmsghdr *nlh = (struct nlmsghdr *)skb->data;
@@ -147,16 +149,30 @@ static struct sk_buff *reset_per_cpu_data(struct 
per_cpu_dm_data *data)
 
 static void send_dm_alert(struct work_struct *work)
 {
+   struct net *net;
struct sk_buff *skb;
-   struct per_cpu_dm_data *data;
-
-   data = container_of(work, struct per_cpu_dm_data, dm_alert_work);
-
-   skb = reset_per_cpu_data(data);
-
-   if (skb)
-   genlmsg_multicast(&net_drop_monitor_family, skb, 0,
- 0, GFP_KERNEL);
+   struct ns_pcpu_dm_data *pcpu_data;
+   struct per_ns_dm_cb *ns_dm_net;
+   struct ns_pcpu_dm_data *data;
+
+   for_each_net_rcu(net) {
+   ns_dm_net = net_generic(net, dm_net_id);
+   if (!ns_dm_net)
+   continue;
+   if (ns_dm_net->trace_state == TRACE_OFF)
+   continue;
+
+   pcpu_data = ns_dm_net->pcpu_data;
+   if (!pcpu_data)
+   continue;
+
+   data = (struct ns_pcpu_dm_data *)this_cpu_ptr(pcpu_data);
+   WARN_ON(data->net != net);
+   skb = reset_per_cpu_data(data);
+   if (skb)
+   genlmsg_multicast_netns(&net_drop_monitor_family, net,
+   skb, 0, 0, GFP_KERNEL);
+   }
 }
 
 /*
@@ -166,9 +182,15 @@ static void send_dm_alert(struct work_struct *work)
  */
 static void sched_send_work(unsigned long _data)
 {
-   struct per_cpu_dm_data *data = (struct per_cpu_dm_data *)_data;
+   int cpu;
+   struct per_cpu_dm_data *dm_data;
 
-   schedule_work(&data->dm_alert_work);
+   cpu = (int)_data;
+   if (unlikely(cpu < 0))
+   return;
+
+   dm_data = &per_cpu(dm_cpu_data, cpu);
+   schedule_work(&dm_data->dm_alert_work);
 }
 
 static void trace_drop_common(struct sk_buff *skb, void *location)
@@ -178,14 +200,30 @@ static void trace_drop_common(struct sk_buff *skb, void 
*location)
struct nlattr *nla;
int i;
struct sk_buff *dskb;
-   struct per_cpu_dm_data *data;
+   struct ns_pcpu_dm_data *data;
unsigned long flags;
+   struct net *net;
+   struct per_ns_dm_cb *ns_dm_net;
+
+   if (skb->dev)
+   net = dev_net(skb->dev);
+   else if (skb->sk)
+   net = sock_net(skb->sk);
+   else
+   return;
+
+   ns_dm_net = net_generic(net, dm_net_id);
+   if (unlikely(!ns_dm_net))
+   return;
+
+   data = this_cpu_ptr(ns_dm_net->pcpu_data);
+   if (unlikely(!data))
+   return;
 
local_irq_save(flags);
-   data = this_cpu_ptr(&dm_cpu_data);
    spin_lock(&data->lock);
-   dskb = data->skb;
 
+   dskb = data->skb;
 

[PATCH v1 net-next 3/5] drop_monitor: let hw_stats_list support net ns

2017-07-12 Thread martinbj2008
From: martin Zhang 

hw_stats_list is used to record NAPI state for net devices.
Every net device belongs to exactly one net ns,
so every net ns gets its own list head to record them.

Signed-off-by: martin Zhang 
---
 net/core/drop_monitor.c | 54 ++---
 1 file changed, 33 insertions(+), 21 deletions(-)

diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
index 0cf25c3..875e8b4 100644
--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -53,10 +53,12 @@ struct ns_pcpu_dm_data {
  * struct per_ns_dm_cb  - drop monitor control block in per net ns.
  * @trace_state:the trace state.
  * @ns_dm_mutex:protect whole per_ns_dm_cb.
+ * @hw_stats_list:  monitor for NAPI of net device.
  */
 struct per_ns_dm_cb {
int trace_state;
struct mutex ns_dm_mutex;
+   struct list_head hw_stats_list;
 };
 
 static DEFINE_MUTEX(trace_state_mutex);
@@ -85,7 +87,6 @@ struct dm_hw_stat_delta {
 static int dm_hit_limit = 64;
 static int dm_delay = 1;
 static unsigned long dm_hw_check_delta = 2*HZ;
-static LIST_HEAD(hw_stats_list);
 
 static struct sk_buff *reset_per_cpu_data(struct per_cpu_dm_data *data)
 {
@@ -225,16 +226,22 @@ static void trace_kfree_skb_hit(void *ignore, struct 
sk_buff *skb, void *locatio
 static void trace_napi_poll_hit(void *ignore, struct napi_struct *napi,
int work, int budget)
 {
+   struct net *net;
struct dm_hw_stat_delta *new_stat;
+   struct per_ns_dm_cb *ns_dm_net;
 
/*
 * Don't check napi structures with no associated device
 */
if (!napi->dev)
return;
+   net = dev_net(napi->dev);
+   ns_dm_net = net_generic(net, dm_net_id);
+   if (!ns_dm_net)
+   return;
 
rcu_read_lock();
-   list_for_each_entry_rcu(new_stat, &hw_stats_list, list) {
+   list_for_each_entry_rcu(new_stat, &ns_dm_net->hw_stats_list, list) {
/*
 * only add a note to our monitor buffer if:
 * 1) this is the dev we received on
@@ -256,8 +263,6 @@ static void trace_napi_poll_hit(void *ignore, struct 
napi_struct *napi,
 static int set_all_monitor_traces(int state)
 {
int rc = 0;
-   struct dm_hw_stat_delta *new_stat = NULL;
-   struct dm_hw_stat_delta *temp;
 
    mutex_lock(&trace_state_mutex);
 
@@ -290,16 +295,6 @@ static int set_all_monitor_traces(int state)
 
tracepoint_synchronize_unregister();
 
-   /*
-* Clean the device list
-*/
-   list_for_each_entry_safe(new_stat, temp, &hw_stats_list, list) {
-   if (new_stat->dev == NULL) {
-   list_del_rcu(&new_stat->list);
-   kfree_rcu(new_stat, rcu);
-   }
-   }
-
module_put(THIS_MODULE);
 
break;
@@ -368,6 +363,19 @@ static int net_dm_cmd_trace(struct sk_buff *skb,
return -ENOTSUPP;
}
 
+   if (state == TRACE_OFF) {
+   /* Clean the device list */
+   struct dm_hw_stat_delta *new_stat = NULL;
+   struct dm_hw_stat_delta *temp;
+   struct list_head *head = &ns_dm_cb->hw_stats_list;
+
+   list_for_each_entry_safe(new_stat, temp, head, list) {
+   if (!new_stat->dev) {
+   list_del_rcu(&new_stat->list);
+   kfree_rcu(new_stat, rcu);
+   }
+   }
+   }
ns_dm_cb->trace_state = state;
mutex_unlock(&ns_dm_cb->ns_dm_mutex);
 
@@ -382,6 +390,7 @@ static int dropmon_net_event(struct notifier_block 
*ev_block,
struct dm_hw_stat_delta *new_stat = NULL;
struct dm_hw_stat_delta *tmp;
struct per_ns_dm_cb *ns_dm_cb;
+   struct list_head *head;
 
dev = netdev_notifier_info_to_dev(ptr);
if (!dev)
@@ -391,23 +400,23 @@ static int dropmon_net_event(struct notifier_block 
*ev_block,
ns_dm_cb = net_generic(net, dm_net_id);
if (!ns_dm_cb)
goto out;
+   head = &ns_dm_cb->hw_stats_list;
 
switch (event) {
case NETDEV_REGISTER:
new_stat = kzalloc(sizeof(struct dm_hw_stat_delta), GFP_KERNEL);
-
if (!new_stat)
goto out;
 
new_stat->dev = dev;
new_stat->last_rx = jiffies;
-   mutex_lock(&trace_state_mutex);
-   list_add_rcu(&new_stat->list, &hw_stats_list);
-   mutex_unlock(&trace_state_mutex);
+   mutex_lock(&ns_dm_cb->ns_dm_mutex);
+   list_add_rcu(&new_stat->list, head);
+   mutex_unlock(&ns_dm_cb->ns_dm_mutex);
break;
case NETDEV_UNREGISTER:
-   mutex_lock(&trace_state_mutex);
-   

[PATCH v1 net-next 1/5] drop_monitor: import netnamespace framework

2017-07-12 Thread martinbj2008
From: martin Zhang 

This is a patch series for drop monitor that adds net namespace support.

This patch introduces two structs for net ns support:

1. struct per_ns_dm_cb:
  As its name suggests, it holds the per net ns state.

  It is empty in this patch; the following fields will be added by later
  patches in the series:
  a. trace_state: every net ns has a switch to indicate the trace state.
  b. ns_dm_mutex: the mutex serializes operations within a single net ns.
  c. hw_stats_list: monitor for NAPI of net device.

2. struct ns_pcpu_dm_data:
   It replaces per_cpu_dm_data on a per net ns basis.

   per_cpu_dm_data will only keep dm_alert_work; the other fields
will be moved to ns_pcpu_dm_data. They do the same thing as in the current
code; the only difference is that they are per net ns.

  One work item is kept per CPU to send the alert netlink message.

Signed-off-by: martin Zhang 
---
dropwatch is a very useful tool for diagnosing network problems and
has been a great help to us.
However, dropwatch does not work inside a container (net namespace).
That is a pity, so add net ns support.

 net/core/drop_monitor.c | 41 +
 1 file changed, 41 insertions(+)

diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
index 70ccda2..6a75e04 100644
--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -32,6 +32,10 @@
 #include 
 
 #include 
+#include 
+#include 
+#include 
+#include 
 
 #define TRACE_ON 1
 #define TRACE_OFF 0
@@ -41,6 +45,13 @@
  * and the work handle that will send up
  * netlink alerts
  */
+
+struct ns_pcpu_dm_data {
+};
+
+struct per_ns_dm_cb {
+};
+
 static int trace_state = TRACE_OFF;
 static DEFINE_MUTEX(trace_state_mutex);
 
@@ -59,6 +70,7 @@ struct dm_hw_stat_delta {
unsigned long last_drop_val;
 };
 
+static int dm_net_id __read_mostly;
 static struct genl_family net_drop_monitor_family;
 
 static DEFINE_PER_CPU(struct per_cpu_dm_data, dm_cpu_data);
@@ -382,6 +394,33 @@ static int dropmon_net_event(struct notifier_block 
*ev_block,
.notifier_call = dropmon_net_event
 };
 
+static int __net_init dm_net_init(struct net *net)
+{
+   struct per_ns_dm_cb *ns_dm_cb;
+
+   ns_dm_cb = net_generic(net, dm_net_id);
+   if (!ns_dm_cb)
+   return -ENOMEM;
+
+   return 0;
+}
+
+static void __net_exit dm_net_exit(struct net *net)
+{
+   struct per_ns_dm_cb *ns_dm_cb;
+
+   ns_dm_cb = net_generic(net, dm_net_id);
+   if (!ns_dm_cb)
+   return;
+}
+
+static struct pernet_operations dm_net_ops = {
+   .init = dm_net_init,
+   .exit = dm_net_exit,
+   .id   = &dm_net_id,
+   .size = sizeof(struct per_ns_dm_cb),
+};
+
 static int __init init_net_drop_monitor(void)
 {
struct per_cpu_dm_data *data;
@@ -393,6 +432,7 @@ static int __init init_net_drop_monitor(void)
pr_err("Unable to store program counters on this arch, Drop 
monitor failed\n");
return -ENOSPC;
}
+   rc = register_pernet_subsys(&dm_net_ops);
 
rc = genl_register_family(&net_drop_monitor_family);
if (rc) {
@@ -441,6 +481,7 @@ static void exit_net_drop_monitor(void)
 * or pending schedule calls
 */
 
+   unregister_pernet_subsys(&dm_net_ops);
for_each_possible_cpu(cpu) {
data = &per_cpu(dm_cpu_data, cpu);
del_timer_sync(&data->send_timer);
-- 
1.8.3.1



[PATCH v1 net-next 2/5] drop_monitor: let dm trace state support ns

2017-07-12 Thread martinbj2008
From: martin Zhang 

Every net ns has its own trace_state, and a reference count controls
the trace state of the whole kernel.

trace_state in struct per_ns_dm_cb:
Like the previous global trace state, it records the trace state of
each net ns. Possible values are ON/OFF.

dm_trace_ref: records how many net namespaces are set to
TRACE_ON. It is incremented when a net ns changes to ON
and decremented when it changes to OFF.

Signed-off-by: martin Zhang 
---
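For readers following the reference counting described above, here is a
minimal userspace sketch of the intended behaviour. It is only a model:
struct dm_model, dm_model_set_trace() and probes_registered are made-up
names, and the flag merely stands in for registering and unregistering the
real tracepoints. It assumes the probes should be registered on the first
ON and unregistered on the last OFF.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum { TRACE_OFF = 0, TRACE_ON = 1 };

/* Hypothetical userspace model; none of this is kernel code. */
struct dm_model {
	int trace_ref;          /* namespaces currently tracing (dm_trace_ref) */
	bool probes_registered; /* stands in for the registered tracepoints */
};

/* Models the reference counting around set_all_monitor_traces(). */
static int dm_model_set_trace(struct dm_model *m, int state)
{
	if (state == TRACE_ON) {
		if (m->trace_ref == 0)
			m->probes_registered = true;  /* first user registers */
		m->trace_ref++;
		return 0;
	}

	/* Anything else is treated as TRACE_OFF in this model. */
	if (m->trace_ref <= 0)
		return -EINPROGRESS;  /* unbalanced OFF is rejected */

	if (--m->trace_ref == 0)
		m->probes_registered = false;  /* last user unregisters */
	return 0;
}
```

In this model a second namespace switching to ON only bumps the counter,
and the probes stay registered until the last namespace switches to OFF.
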
 net/core/drop_monitor.c | 88 +
 1 file changed, 75 insertions(+), 13 deletions(-)

diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
index 6a75e04..0cf25c3 100644
--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -49,10 +49,16 @@
 struct ns_pcpu_dm_data {
 };
 
+/**
+ * struct per_ns_dm_cb  - drop monitor control block in per net ns.
+ * @trace_state:the trace state.
+ * @ns_dm_mutex:protect whole per_ns_dm_cb.
+ */
 struct per_ns_dm_cb {
+   int trace_state;
+   struct mutex ns_dm_mutex;
 };
 
-static int trace_state = TRACE_OFF;
 static DEFINE_MUTEX(trace_state_mutex);
 
 struct per_cpu_dm_data {
@@ -70,6 +76,7 @@ struct dm_hw_stat_delta {
unsigned long last_drop_val;
 };
 
+int dm_trace_ref;
 static int dm_net_id __read_mostly;
 static struct genl_family net_drop_monitor_family;
 
@@ -254,9 +261,16 @@ static int set_all_monitor_traces(int state)
 
mutex_lock(&trace_state_mutex);
 
-   if (state == trace_state) {
-   rc = -EAGAIN;
-   goto out_unlock;
+   //Cases: Only inc/dec reference value.
+   if (state == TRACE_ON && dm_trace_ref > 0)
+   goto skip_register_trace;
+   else if (state == TRACE_OFF && dm_trace_ref > 1)
+   goto skip_register_trace;
+
+   //Bad cases.
+   if (dm_trace_ref < 0 || (dm_trace_ref == 0 && state == TRACE_OFF)) {
+   rc = -EINPROGRESS;
+   goto skip_register_trace;
}
 
switch (state) {
@@ -294,12 +308,15 @@ static int set_all_monitor_traces(int state)
break;
}
 
-   if (!rc)
-   trace_state = state;
-   else
+skip_register_trace:
+   if (!rc) {
+   if (state == TRACE_ON)
+   dm_trace_ref++;
+   else if (state == TRACE_OFF)
+   dm_trace_ref--;
+   } else
rc = -EINPROGRESS;
 
-out_unlock:
mutex_unlock(&trace_state_mutex);
 
return rc;
@@ -315,22 +332,65 @@ static int net_dm_cmd_config(struct sk_buff *skb,
 static int net_dm_cmd_trace(struct sk_buff *skb,
struct genl_info *info)
 {
+   int state;
+   struct net *net;
+   struct per_ns_dm_cb *ns_dm_cb;
+
+   if (!skb->sk)
+   return -ENOTSUPP;
+   net = sock_net(skb->sk);
+   ns_dm_cb = net_generic(net, dm_net_id);
+
+   if (!ns_dm_cb)
+   return -ENOMEM;
+
switch (info->genlhdr->cmd) {
case NET_DM_CMD_START:
-   return set_all_monitor_traces(TRACE_ON);
+   state = TRACE_ON;
+   break;
+
case NET_DM_CMD_STOP:
-   return set_all_monitor_traces(TRACE_OFF);
+   state = TRACE_OFF;
+   break;
+
+   default:
+   return -ENOTSUPP;
}
 
-   return -ENOTSUPP;
+   mutex_lock(&ns_dm_cb->ns_dm_mutex);
+   if (state == ns_dm_cb->trace_state) {
+   mutex_unlock(&ns_dm_cb->ns_dm_mutex);
+   return -EAGAIN;
+   }
+
+   if (set_all_monitor_traces(state) != 0) {
+   mutex_unlock(&ns_dm_cb->ns_dm_mutex);
+   return -ENOTSUPP;
+   }
+
+   ns_dm_cb->trace_state = state;
+   mutex_unlock(&ns_dm_cb->ns_dm_mutex);
+
+   return 0;
 }
 
 static int dropmon_net_event(struct notifier_block *ev_block,
 unsigned long event, void *ptr)
 {
-   struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+   struct net *net;
+   struct net_device *dev;
struct dm_hw_stat_delta *new_stat = NULL;
struct dm_hw_stat_delta *tmp;
+   struct per_ns_dm_cb *ns_dm_cb;
+
+   dev = netdev_notifier_info_to_dev(ptr);
+   if (!dev)
+   goto out;
+
+   net = dev_net(dev);
+   ns_dm_cb = net_generic(net, dm_net_id);
+   if (!ns_dm_cb)
+   goto out;
 
switch (event) {
case NETDEV_REGISTER:
@@ -350,7 +410,7 @@ static int dropmon_net_event(struct notifier_block 
*ev_block,
list_for_each_entry_safe(new_stat, tmp, &hw_stats_list, list) {
if (new_stat->dev == dev) {
new_stat->dev = NULL;
-   if (trace_state == TRACE_OFF) {
+   if (ns_dm_cb->trace_state == TRACE_OFF) {
list_del_rcu(&new_stat->list);
   

[PATCH v1 net-next 5/5] drop_monitor: increase version when ns support is ready

2017-07-12 Thread martinbj2008
From: martin Zhang 

1. Increase the DM netlink version from 2 to 3, as it now supports net ns.
2. Set netnsok to true.

Signed-off-by: martin Zhang 
---
 net/core/drop_monitor.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
index 5828bf2..064128b 100644
--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -489,7 +489,8 @@ static int dropmon_net_event(struct notifier_block 
*ev_block,
 static struct genl_family net_drop_monitor_family __ro_after_init = {
.hdrsize= 0,
.name   = "NET_DM",
-   .version= 2,
+   .version= 3,
+   .netnsok= 1,
.module = THIS_MODULE,
.ops= dropmon_ops,
.n_ops  = ARRAY_SIZE(dropmon_ops),
-- 
1.8.3.1



[PATCH net] net: hns: Bugfix for Tx timeout handling in hns driver

2017-07-12 Thread Lin Yun Sheng
When the hns port type is not debug mode, netif_tx_disable is called
on a tx timeout, and a system reboot is then required to return
to the normal state. This patch fixes the problem by resetting the net
dev.

Fixes: b5996f11ea54 ("net: add Hisilicon Network Subsystem basic ethernet 
support")
Signed-off-by: Lin Yun Sheng 
---
 drivers/net/ethernet/hisilicon/hns/hns_enet.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c 
b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
index fe166e0..3987699 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
@@ -1378,13 +1378,20 @@ void hns_nic_net_reset(struct net_device *ndev)
 void hns_nic_net_reinit(struct net_device *netdev)
 {
struct hns_nic_priv *priv = netdev_priv(netdev);
+   enum hnae_port_type type = priv->ae_handle->port_type;
 
netif_trans_update(priv->netdev);
while (test_and_set_bit(NIC_STATE_REINITING, &priv->state))
usleep_range(1000, 2000);
 
hns_nic_net_down(netdev);
-   hns_nic_net_reset(netdev);
+
+   /* Only do hns_nic_net_reset in debug mode
+* because of hardware limitation.
+*/
+   if (type == HNAE_PORT_DEBUG)
+   hns_nic_net_reset(netdev);
+
(void)hns_nic_net_up(netdev);
clear_bit(NIC_STATE_REINITING, &priv->state);
 }
@@ -1997,13 +2004,8 @@ static void hns_nic_reset_subtask(struct hns_nic_priv 
*priv)
rtnl_lock();
/* put off any impending NetWatchDogTimeout */
netif_trans_update(priv->netdev);
+   hns_nic_net_reinit(priv->netdev);
 
-   if (type == HNAE_PORT_DEBUG) {
-   hns_nic_net_reinit(priv->netdev);
-   } else {
-   netif_carrier_off(priv->netdev);
-   netif_tx_disable(priv->netdev);
-   }
rtnl_unlock();
 }
 
-- 
1.9.1



Re: [PATCH v6 0/3] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-07-12 Thread Ding Tianhong


On 2017/7/11 8:01, Casey Leedom wrote:
> 
> Hey Alexander,
> 
>   Okay, I understand your point regarding the "most likely scenario" being
> TLPs directed upstream to the Root Complex.  But I'd still like to make sure
> that we have an agreed upon API/methodology for doing Peer-to-Peer with
> Relaxed Ordering and no Relaxed Ordering to the Root Complex.  I don't see
> how the proposed APIs can be used in that fashion.
>  
>   Right now the proposed change for cxgb4 is for it to test its own PCIe
> Capability Device Control[Relaxed Ordering Enable] in order to use that
> information to program the Chelsio Hardware to emit/not emit upstream TLPs
> with the Relaxed Ordering Attribute set.  But if we're going to have the
> mixed mode situation I describe, the PCIe Capability Device Control[Relaxed
> Ordering Enable] will have to be set which means that we'll be programming
> the Chelsio Hardware to send upstream TLPs with Relaxed Ordering Enable to
> the Root Complex which is what we were trying to avoid in the first place ...
> 
>   [[ And, as I noted on Friday evening, the current cxgb4 Driver hardwires
>  the Relaxed Ordering Enable on early during device probe, so that
>  would minimally need to be addressed even if we decide that we don't
>  ever want to support mixed mode Relaxed Ordering. ]]
> 
>   We need some method of telling the Chelsio Driver that it should/shouldn't
> use Relaxed Ordering with TLPs directed at the Root Complex.  And the same
> is true for a Peer PCIe Device.
> 
>   It may be that we should approach this from the completely opposite
> direction and instead of having quirks which identify problematic devices,
> have quirks which identify devices which would benefit from the use of
> Relaxed Ordering (if the sending device supports that).  That is, assume that
> using Relaxed Ordering shouldn't be done unless the target device says "I
> love Relaxed Ordering TLPs" ...  In such a world, an NVMe or a Graphics
> device might declare love of Relaxed Ordering and the same for a SPARC Root
> Complex (I think that was your example).
> 
>   By the way, the sole example of Data Corruption with Relaxed Ordering is
> the AMD A1100 ARM SoC and AMD appears to have given up on that almost as
> soon as it was released.  So what we're left with currently is a performance
> problem on modern Intel CPUs ...  (And hopefully we'll get a Technical
> Publication on that issue fairly soon.)
> 
> Casey
> 

Hi Casey:

After the long discussion, I think that if clearing the PCIe Capability
Device Control[Relaxed Ordering Enable] bit when the platform's RC has
problems with RO doesn't break anything in your driver, you could check
(!pci_dev_should_disable_relaxed_ordering(root)) in the code to enable
ROOT_NO_RELAXED_ORDERING for your adapter, and set the PCIe Capability
Device Control[Relaxed Ordering Enable] bit when you need it. I don't think
there is much of a gap between us here, and we could leave the peer-to-peer
situation to be fixed later.

Thanks
Ding

> .
> 



Re: [PATCH 1/1] mlx4_en: remove unnecessary returned value check

2017-07-12 Thread Leon Romanovsky
On Wed, Jul 12, 2017 at 11:36:48AM +0300, Yuval Shaia wrote:
> On Wed, Jul 12, 2017 at 02:44:33AM -0400, Zhu Yanjun wrote:
> > The function __mlx4_zone_remove_one_entry always returns zero. So
> > it is not necessary to check it.
> >
> > Cc: Joe Jin 
> > Cc: Junxiao Bi 
> > Signed-off-by: Zhu Yanjun 
> > ---
> >  drivers/net/ethernet/mellanox/mlx4/alloc.c | 7 +++
> >  1 file changed, 3 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx4/alloc.c 
> > b/drivers/net/ethernet/mellanox/mlx4/alloc.c
> > index 249a458..bfb185d 100644
> > --- a/drivers/net/ethernet/mellanox/mlx4/alloc.c
> > +++ b/drivers/net/ethernet/mellanox/mlx4/alloc.c
> > @@ -283,7 +283,7 @@ int mlx4_zone_add_one(struct mlx4_zone_allocator 
> > *zone_alloc,
> >  }
> >
> >  /* Should be called under a lock */
> > -static int __mlx4_zone_remove_one_entry(struct mlx4_zone_entry *entry)
> > +static void __mlx4_zone_remove_one_entry(struct mlx4_zone_entry *entry)
> >  {
> > struct mlx4_zone_allocator *zone_alloc = entry->allocator;
> >
> > @@ -315,8 +315,6 @@ static int __mlx4_zone_remove_one_entry(struct 
> > mlx4_zone_entry *entry)
> > }
> > zone_alloc->mask = mask;
> > }
> > -
> > -   return 0;
> >  }
> >
> >  void mlx4_zone_allocator_destroy(struct mlx4_zone_allocator *zone_alloc)
> > @@ -468,7 +466,8 @@ int mlx4_zone_remove_one(struct mlx4_zone_allocator 
> > *zones, u32 uid)
> > goto out;
> > }
> >
> > -   res = __mlx4_zone_remove_one_entry(zone);
> > +   __mlx4_zone_remove_one_entry(zone);
> > +   res = 0;
>
> Will look better if initialization will move to variable declaration.
>

Yes, please.

Thanks




Re: [PATCH 2/2] net: ethernet: fsl: add phy reset after clk enable option

2017-07-12 Thread Richard Leitner

On 07/07/2017 04:00 PM, Andrew Lunn wrote:
>> Ok. I'm fine with moving the phy-reset-gpios binding into the PHY.
>> But one question still remains: Who should then trigger the "hard
>> reset" of the PHY?
> 
> Hi Richard
> 
> I think i see a few ways to do this, but first i need to check
> something. Is the clock which is causing a problem this one:
> 
> /* clk_ref is optional, depends on board */
> fep->clk_ref = devm_clk_get(&pdev->dev, "enet_clk_ref");
> if (IS_ERR(fep->clk_ref))
> fep->clk_ref = NULL;

Yes. It's this one.

> 
> Possible solutions:
> 
> 1) clocks are referenced counted. If it is turned on twice, it needs
>to be turned off twice before it is actually turned off. So, make
>the PHY driver also clk_prepare_enable() this clock. When the FEC
>tries to turn it off, it will stay ticking. Problem avoided, at the
>expense of some power.

Somehow this approach triggers a "workaround-feeling" for me...
Furthermore as you say it "wastes" (at least some) power. For exactly
this reason the disabling of the clock was implemented in commit
e8fcfcd5684a ("net: fec: optimize the clock management to save power").
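
To make the counting in option 1 concrete, here is a small userspace model
of an enable-counted clock. It is only a sketch: struct model_clk and the
model_clk_*() helpers are made-up names, not the common clock framework
API, and the model assumes the clock keeps ticking as long as the enable
count is above zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an enable-counted clock; not struct clk. */
struct model_clk {
	unsigned int enable_count;
};

/* Each consumer (MAC or PHY driver) takes one reference. */
static void model_clk_enable(struct model_clk *c)
{
	c->enable_count++;
}

/* Dropping a reference only gates the clock when the count hits zero. */
static void model_clk_disable(struct model_clk *c)
{
	if (c->enable_count > 0)
		c->enable_count--;
}

static bool model_clk_running(const struct model_clk *c)
{
	return c->enable_count > 0;
}
```

With two references taken (one by the MAC, one by the PHY driver), a single
disable from the MAC leaves the clock running, which is why the PHY would
not need a reset in that scheme, and also why the power saving is lost.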

> 
> 2) More complex, but make the PHY driver also a clock driver. Have the
>PHY driver export a clock which the FEC use, as "enet_clk_ref". The
>implementation of this clock would both turn the real clock on
>and then perform the reset.

This seems like a good solution to me. Furthermore IMHO it would be good
to move all PHY related dt bindings (reset-gpio, clk, etc.) from the MAC
into the PHY node.

Or are there any reasons/arguments against this approach?

> 
> Both require no changes to the FEC, or any other MAC driver using this
> PHY, so long as the MAC driver uses the common clock infrastructure to
> control the clock to the PHY.
As (IMHO) the new approach likely won't be backported to stable releases
I want to stress again the point that commit e8fcfcd5684a
("net: fec: optimize the clock management to save power") introduced
this problem and therefore "broke the PHY" for our board.

So would it be possible to add a "quick" bugfix patch (maybe this patch
or another one removing the clk disable) so this fix can be backported
to stable? Otherwise our board is only working with another
"out-of-tree" patch (which I want to avoid)...

kind regards,
Richard.L


Re: [PATCH 1/1] mlx4_en: remove unnecessary returned value check

2017-07-12 Thread Yuval Shaia
On Wed, Jul 12, 2017 at 02:44:33AM -0400, Zhu Yanjun wrote:
> The function __mlx4_zone_remove_one_entry always returns zero. So
> it is not necessary to check it.
> 
> Cc: Joe Jin 
> Cc: Junxiao Bi 
> Signed-off-by: Zhu Yanjun 
> ---
>  drivers/net/ethernet/mellanox/mlx4/alloc.c | 7 +++
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/alloc.c 
> b/drivers/net/ethernet/mellanox/mlx4/alloc.c
> index 249a458..bfb185d 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/alloc.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/alloc.c
> @@ -283,7 +283,7 @@ int mlx4_zone_add_one(struct mlx4_zone_allocator 
> *zone_alloc,
>  }
>  
>  /* Should be called under a lock */
> -static int __mlx4_zone_remove_one_entry(struct mlx4_zone_entry *entry)
> +static void __mlx4_zone_remove_one_entry(struct mlx4_zone_entry *entry)
>  {
>   struct mlx4_zone_allocator *zone_alloc = entry->allocator;
>  
> @@ -315,8 +315,6 @@ static int __mlx4_zone_remove_one_entry(struct 
> mlx4_zone_entry *entry)
>   }
>   zone_alloc->mask = mask;
>   }
> -
> - return 0;
>  }
>  
>  void mlx4_zone_allocator_destroy(struct mlx4_zone_allocator *zone_alloc)
> @@ -468,7 +466,8 @@ int mlx4_zone_remove_one(struct mlx4_zone_allocator 
> *zones, u32 uid)
>   goto out;
>   }
>  
> - res = __mlx4_zone_remove_one_entry(zone);
> + __mlx4_zone_remove_one_entry(zone);
> + res = 0;

Will look better if initialization will move to variable declaration.

Besides this minor thing - lgtm.
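
Roughly what I mean, as a self-contained userspace sketch. The model_*
names are placeholders and not the mlx4 code, and the locking is only
indicated by comments:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a zone entry. */
struct model_zone {
	unsigned int uid;
	bool present;
};

static struct model_zone *model_find_zone(struct model_zone *zones,
					  size_t n, unsigned int uid)
{
	for (size_t i = 0; i < n; i++)
		if (zones[i].present && zones[i].uid == uid)
			return &zones[i];
	return NULL;
}

/* Returns void, like the helper after this patch. */
static void model_remove_entry(struct model_zone *zone)
{
	zone->present = false;
}

static int model_zone_remove_one(struct model_zone *zones, size_t n,
				 unsigned int uid)
{
	struct model_zone *zone;
	int res = 0;	/* initialized at declaration: success by default */

	/* the real code would take the allocator lock here */
	zone = model_find_zone(zones, n, uid);
	if (!zone) {
		res = -1;
		goto out;
	}

	model_remove_entry(zone);
out:
	/* ...and release the lock here */
	return res;
}
```

That way only the failure path assigns res.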

Reviewed-by: Yuval Shaia 

>  
>  out:
>   spin_unlock(&zones->lock);
> -- 
> 2.7.4
> 


[PATCH v2 03/22] net: broadcom: stop using rtc deprecated functions

2017-07-12 Thread Benjamin Gaignard
rtc_time_to_tm() and rtc_tm_to_time() are deprecated because they
rely on 32-bit variables, which will make rtc break in y2038/2106.
Replace those two functions with the safer 64-bit ones.

Signed-off-by: Benjamin Gaignard 
CC: Michael Chan 
CC: netdev@vger.kernel.org
CC: linux-ker...@vger.kernel.org
---
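For reference, the two years mentioned above can be checked with a small
userspace sketch. It assumes a platform with a 64-bit time_t; utc_year() is
a made-up helper, not a kernel function. A signed 32-bit seconds counter
runs out in 2038, an unsigned one in 2106, after which it wraps to 1970.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/*
 * Return the UTC calendar year for a seconds-since-epoch value,
 * or -1 if the conversion fails. Assumes time_t is 64 bits wide.
 */
static int utc_year(int64_t secs)
{
	time_t t = (time_t)secs;
	struct tm *tm = gmtime(&t);

	if (!tm)
		return -1;
	return 1900 + tm->tm_year;
}

/* What an unsigned 32-bit counter holds one second after its maximum. */
static uint32_t wrap_u32_after_max(void)
{
	return (uint32_t)((uint64_t)UINT32_MAX + 1);
}
```

utc_year(INT32_MAX) gives the last year a signed 32-bit counter can
represent and utc_year(UINT32_MAX) the last year for an unsigned one.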
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index b56c54d..9fef202 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -4703,7 +4703,7 @@ int bnxt_hwrm_fw_set_time(struct bnxt *bp)
return -EOPNOTSUPP;
 
do_gettimeofday(&tv);
-   rtc_time_to_tm(tv.tv_sec, &tm);
+   rtc_time64_to_tm(tv.tv_sec, &tm);
bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_FW_SET_TIME, -1, -1);
req.year = cpu_to_le16(1900 + tm.tm_year);
req.month = 1 + tm.tm_mon;
-- 
1.9.1



Re: [PATCH net] net: ipmr: ipmr_get_table() returns NULL

2017-07-12 Thread Nikolay Aleksandrov
On 12/07/17 10:56, Dan Carpenter wrote:
> The ipmr_get_table() function doesn't return error pointers; it returns
> NULL on error.
> 
> Fixes: 4f75ba6982bc ("net: ipmr: Add ipmr_rtm_getroute")
> Signed-off-by: Dan Carpenter 
> 
> diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
> index bb909f1d7537..06863ea3fc5b 100644
> --- a/net/ipv4/ipmr.c
> +++ b/net/ipv4/ipmr.c
> @@ -2431,8 +2431,8 @@ static int ipmr_rtm_getroute(struct sk_buff *in_skb, 
> struct nlmsghdr *nlh,
>   tableid = tb[RTA_TABLE] ? nla_get_u32(tb[RTA_TABLE]) : 0;
>  
>   mrt = ipmr_get_table(net, tableid ? tableid : RT_TABLE_DEFAULT);
> - if (IS_ERR(mrt)) {
> - err = PTR_ERR(mrt);
> + if (!mrt) {
> + err = -ENOENT;
>   goto errout_free;
>   }
>  
> 

Good catch, ipmr_new_table() is the one that can return err ptr.

Acked-by: Nikolay Aleksandrov 
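
For anyone wondering why the old check could never fire: NULL is not in
the error-pointer range. Below is a userspace sketch of the idea. The
model_* helpers are made-up names that imitate how ERR_PTR()/IS_ERR()
behave, assuming error codes occupy the top 4095 values of the address
space; they are not the kernel macros themselves.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *model_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an encoded pointer. */
static inline long model_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True only for pointers in the last MODEL_MAX_ERRNO addresses. */
static inline bool model_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}
```

Since model_is_err(NULL) is false, a function that reports failure by
returning NULL has to be checked with a plain NULL test, as the patch does.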


