date:20171212

Re: [RFC][PATCH] new byteorder primitives - ..._{replace,get}_bits()

2017-12-12 Thread Al Viro

On Tue, Dec 12, 2017 at 06:20:02AM +, Al Viro wrote:

> Umm...  What's wrong with
> 
> #define FIELD_FOO 0,4
> #define FIELD_BAR 6,12
> #define FIELD_BAZ 18,14
> 
> A macro can bloody well expand to any sequence of tokens - le32_get_bits(v, FIELD_BAZ)
> will become le32_get_bits(v, 18, 14) just fine.  What's the problem with that?

FWIW, if you want to use the mask, __builtin_ffsll() is not the only way to do
it - you don't need the shift.  Multiplier would do just as well, and that can
be had easier.  If mask = (2*a + 1)<<n [...]

Re: [RFC][PATCH] new byteorder primitives - ..._{replace,get}_bits()

2017-12-12 Thread Jakub Kicinski

On Tue, 12 Dec 2017 19:45:32 +, Al Viro wrote:
> On Tue, Dec 12, 2017 at 06:20:02AM +, Al Viro wrote:
> 
> > Umm...  What's wrong with
> > 
> > #define FIELD_FOO 0,4
> > #define FIELD_BAR 6,12
> > #define FIELD_BAZ 18,14
> > 
> > A macro can bloody well expand to any sequence of tokens - le32_get_bits(v, FIELD_BAZ)
> > will become le32_get_bits(v, 18, 14) just fine.  What's the problem with that?
> 
> FWIW, if you want to use the mask, __builtin_ffsll() is not the only way to do
> it - you don't need the shift.  Multiplier would do just as well, and that can
> be had easier.  If mask = (2*a + 1)<<n = ((2*a)<<n) ^ (1<<n), then
>   mask - 1 = ((2*a) << n) + ((1<<n) - 1) = ((2*a) << n) ^ ((1<<n) - 1)
>   mask ^ (mask - 1) = (1<<n) + ((1<<n) - 1)
> and
>   mask & (mask ^ (mask - 1)) = 1<<n.
> 
> IOW, with
> 
> static __always_inline u64 mask_to_multiplier(u64 mask)
> {
>   return mask & (mask ^ (mask - 1));
> }
> 
> we could do
> 
> static __always_inline __le64 le64_replace_bits(__le64 old, u64 v, u64 mask)
> {
>   __le64 m = cpu_to_le64(mask);
>   return (old & ~m) | (cpu_to_le64(v * mask_to_multiplier(mask)) & m);
> }
> 
> static __always_inline u64 le64_get_bits(__le64 v, u64 mask)
> {
>   return (le64_to_cpu(v) & mask) / mask_to_multiplier(mask);
> }
> 
> etc.  Compiler will turn those into shifts...  I can live with either calling
> conventions.
> 
> Comments?

Very nice!  The compilation-time check if the value can fit in a field
covered by the mask (if they're both known) did help me catch bugs
early a few times over the years, so if it could be preserved we can
maybe even drop the FIELD_* macros and just use this approach?
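
For readers following along, here is a minimal userspace sketch of the multiplier trick quoted above. It assumes plain uint64_t values and leaves out the cpu_to_le64()/le64_to_cpu() conversions; get_bits() and replace_bits() are placeholder names for illustration, not proposed kernel API.

```c
#include <stdint.h>

/* Lowest set bit of mask, i.e. 1 << n for mask = (2*a + 1) << n.
 * The mask must be non-zero. */
static inline uint64_t mask_to_multiplier(uint64_t mask)
{
	return mask & (mask ^ (mask - 1));
}

/* Extract the field selected by mask; dividing by 1 << n acts as the
 * right shift. */
static inline uint64_t get_bits(uint64_t v, uint64_t mask)
{
	return (v & mask) / mask_to_multiplier(mask);
}

/* Store val into the field selected by mask; multiplying by 1 << n acts
 * as the left shift, and bits outside the mask are left untouched. */
static inline uint64_t replace_bits(uint64_t old, uint64_t val, uint64_t mask)
{
	return (old & ~mask) | ((val * mask_to_multiplier(mask)) & mask);
}
```

With mask = 0xFFFC0000 (a 14-bit field starting at bit 18, as in the FIELD_BAZ example), mask_to_multiplier() yields 0x40000 and get_bits(0xABCD0000, mask) yields 0x2AF3.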

[PATCH iproute2] Show 'external' link mode in output

2017-12-12 Thread Phil Dibowitz

Recently `external` support was added to the tunnel drivers, but there is no way
to introspect this from userspace. This adds support for that.

Now `ip -details link` shows it:

```
7: tunl60@NONE:  mtu 1452 qdisc noop state DOWN mode DEFAULT group
default qlen 1
link/tunnel6 :: brd :: promiscuity 0
ip6tnl external any remote :: local :: encaplimit 0 hoplimit 0 tclass 0x00 
flowlabel 0x0 (flowinfo 0x) addrgenmode eui64 numtxqueues 1 
numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
```

Signed-off-by: Phil Dibowitz 
---
 ip/link_ip6tnl.c | 3 +++
 ip/link_iptnl.c  | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/ip/link_ip6tnl.c b/ip/link_ip6tnl.c
index 43287ab..5d0efc8 100644
--- a/ip/link_ip6tnl.c
+++ b/ip/link_ip6tnl.c
@@ -345,6 +345,9 @@ static void ip6tunnel_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb
if (!tb)
return;
 
+if (tb[IFLA_IPTUN_COLLECT_METADATA])
+print_bool(PRINT_ANY, "external", "external ", true);
+
if (tb[IFLA_IPTUN_FLAGS])
flags = rta_getattr_u32(tb[IFLA_IPTUN_FLAGS]);
 
diff --git a/ip/link_iptnl.c b/ip/link_iptnl.c
index 4940b8b..e345b5c 100644
--- a/ip/link_iptnl.c
+++ b/ip/link_iptnl.c
@@ -393,6 +393,9 @@ static void iptunnel_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[
if (!tb)
return;
 
+if (tb[IFLA_IPTUN_COLLECT_METADATA])
+print_bool(PRINT_ANY, "external", "external ", true);
+
if (tb[IFLA_IPTUN_REMOTE]) {
unsigned int addr = rta_getattr_u32(tb[IFLA_IPTUN_REMOTE]);
 
-- 
2.9.5

Re: [PATCH iproute2] tc: bash-completion: add missing 'classid' keyword

2017-12-12 Thread Stephen Hemminger

On Tue, 12 Dec 2017 16:45:15 +0100
Davide Caratti  wrote:

> users of 'matchall' filter can specify a value for the class id: update
> bash-completion accordingly.
> 
> Fixes: b32c0b64fa2b ("tc: bash-completion: Add support for matchall")
> Signed-off-by: Davide Caratti 

Looks good, applied. Thanks, Davide.

Re: [PATCH iproute2 net-next v2 0/4] Abstract columns, properly space and wrap fields

2017-12-12 Thread Stephen Hemminger

On Tue, 12 Dec 2017 01:46:29 +0100
Stefano Brivio  wrote:

> Currently, 'ss' simply subdivides the whole available screen width
> between available columns, starting from a set of hardcoded amount
> of spacing and growing column widths.
> 
> This makes the output unreadable in several cases, as it doesn't take
> into account the actual content width.
> 
> Fix this by introducing a simple abstraction for columns, buffering
> the output, measuring the width of the fields, grouping fields into
> lines as they fit, equally distributing any remaining whitespace, and
> finally rendering the result. Some examples are reported below [1].
> 
> This implementation doesn't seem to cause any significant performance
> issues, as reported in 3/4.
> 
> Patch 1/4 replaces all relevant printf() calls by the out() helper,
> which simply consists of the usual printf() implementation.
> 
> Patch 2/4 implements column abstraction, with configurable column
> width and delimiters, and 3/4 splits buffering and rendering phases,
> employing a simple buffering mechanism with chunked allocation and
> introducing a rendering function.
> 
> Up to this point, the output is still unchanged.
> 
> Finally, 4/4 introduces field width calculation based on content
> length measured while buffering, in order to split fields onto
> multiple lines and equally space them within the single lines.
> 
> Now that column behaviour is well-defined and more easily
> configurable, it should be easier to further improve the output by
> splitting logically separable information (e.g. TCP details) into
> additional columns. However, this patchset keeps the full "extended"
> information into a single column, for the moment being.
> 
> 
> v2: rebase after conflict with 00ac78d39c29 ("ss: print tcpi_rcv_ssthresh")
> 
> 
> [1]
> 
> - 80 columns terminal, ss -Z -f netlink
>   * before:
> Recv-Q Send-Q Local Address:Port Peer Address:Port
> 
> 0  0rtnl:evolution-calen/2075   * 
> pr
> oc_ctx=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
> 0  0rtnl:abrt-applet/32700  * 
> pr
> oc_ctx=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
> 0  0rtnl:firefox/21619  * 
> pr
> oc_ctx=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
> 0  0rtnl:evolution-calen/32639   *
>  p
> roc_ctx=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
> [...]
> 
>   * after:
> Recv-Q   Send-Q Local Address:Port  Peer Address:Port
> 00   rtnl:evolution-calen/2075  *
>  proc_ctx=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
> 00   rtnl:abrt-applet/32700 *
>  proc_ctx=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
> 00   rtnl:firefox/21619 *
>  proc_ctx=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
> 00   rtnl:evolution-calen/32639 *
>  proc_ctx=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
> [...]
> 
> - 80 columns terminal, ss -tunpl
>   * before:
> Netid  State  Recv-Q Send-Q Local Address:Port   Peer 
> Address:Port
> udpUNCONN 0  0 *:37732 *:*
> udpUNCONN 0  0 *:5353  *:*
> udpUNCONN 0  0  192.168.122.1:53*:*
> udpUNCONN 0  0  *%virbr0:67*:*
> [...]
> 
>   * after:
> Netid   StateRecv-Q   Send-Q Local Address:Port  Peer Address:Port
> udp UNCONN   00  *:37732*:*
> udp UNCONN   00  *:5353 *:*
> udp UNCONN   00  192.168.122.1:53   *:*
> udp UNCONN   00   *%virbr0:67   *:*
> [...]
> 
>  - 66 columns terminal, ss -tunpl
>   * before:
> Netid  State  Recv-Q Send-Q Local Address:Port   P
> eer Address:Port
> udpUNCONN 0  0   *:37732   *:*
> 
> udpUNCONN 0  0   *:5353*:*
> 
> udpUNCONN 0  0  192.168.122.1:53
> *:*
> udpUNCONN 0  0  *%virbr0:67  *:*
> [...]
> 
>   * after:
> Netid State  Recv-Q Send-Q Local Address:Port   Peer Address:Port
> udp   UNCONN 0  0  *:37732 *:*
> udp   UNCONN 0  0  *:5353  *:*
> udp   UNCONN 0  0  192.168.122.1:53*:*
> udp   UNCONN 0  0   *%virbr0:67*:*
> [...]
> 
> 
> Stefano Brivio (4):
>   ss: Replace printf() calls for "main" output by calls to helper
>   ss: Introduce columns lightweight abstraction
>   ss: Buffer raw fields

[PATCH net-next] cxgb4: Add support for ethtool i2c dump

2017-12-12 Thread Ganesh Goudar

From: Arjun Vynipadath 

Adds support for ethtool get_module_info() and get_module_eeprom()
callbacks that will dump necessary information for a SFP.

Signed-off-by: Arjun Vynipadath 
Signed-off-by: Casey Leedom 
Signed-off-by: Ganesh Goudar 
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h | 18 
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_ethtool.c | 97 ++
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c | 56 +
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.h | 10 +++
 drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h  |  1 +
 5 files changed, 182 insertions(+)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index 97dc3ef..b1df2aa 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -1424,6 +1424,21 @@ static inline void init_rspq(struct adapter *adap, struct sge_rspq *q,
q->size = size;
 }
 
+/**
+ * t4_is_inserted_mod_type - is a plugged in Firmware Module Type
+ * @fw_mod_type: the Firmware Module Type
+ *
+ * Return whether the Firmware Module Type represents a real Transceiver
+ * Module/Cable Module Type which has been inserted.
+ */
+static inline bool t4_is_inserted_mod_type(unsigned int fw_mod_type)
+{
+   return (fw_mod_type != FW_PORT_MOD_TYPE_NONE &&
+   fw_mod_type != FW_PORT_MOD_TYPE_NOTSUPPORTED &&
+   fw_mod_type != FW_PORT_MOD_TYPE_UNKNOWN &&
+   fw_mod_type != FW_PORT_MOD_TYPE_ERROR);
+}
+
 void t4_write_indirect(struct adapter *adap, unsigned int addr_reg,
   unsigned int data_reg, const u32 *vals,
   unsigned int nregs, unsigned int start_idx);
@@ -1697,6 +1712,9 @@ void t4_uld_mem_free(struct adapter *adap);
 int t4_uld_mem_alloc(struct adapter *adap);
 void t4_uld_clean_up(struct adapter *adap);
 void t4_register_netevent_notifier(void);
+int t4_i2c_rd(struct adapter *adap, unsigned int mbox, int port,
+ unsigned int devid, unsigned int offset,
+ unsigned int len, u8 *buf);
 void free_rspq_fl(struct adapter *adap, struct sge_rspq *rq, struct sge_fl *fl);
 void free_tx_desc(struct adapter *adap, struct sge_txq *q,
  unsigned int n, bool unmap);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ethtool.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ethtool.c
index eb33821..541419b 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ethtool.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ethtool.c
@@ -1396,6 +1396,101 @@ static int get_dump_data(struct net_device *dev, struct ethtool_dump *eth_dump,
return 0;
 }
 
+static int cxgb4_get_module_info(struct net_device *dev,
+struct ethtool_modinfo *modinfo)
+{
+   struct port_info *pi = netdev_priv(dev);
+   u8 sff8472_comp, sff_diag_type, sff_rev;
+   struct adapter *adapter = pi->adapter;
+   int ret;
+
+   if (!t4_is_inserted_mod_type(pi->mod_type))
+   return -EINVAL;
+
+   switch (pi->port_type) {
+   case FW_PORT_TYPE_SFP:
+   case FW_PORT_TYPE_QSA:
+   case FW_PORT_TYPE_SFP28:
+   ret = t4_i2c_rd(adapter, adapter->mbox, pi->tx_chan,
+   I2C_DEV_ADDR_A0, SFF_8472_COMP_ADDR,
+   SFF_8472_COMP_LEN, &sff8472_comp);
+   if (ret)
+   return ret;
+   ret = t4_i2c_rd(adapter, adapter->mbox, pi->tx_chan,
+   I2C_DEV_ADDR_A0, SFP_DIAG_TYPE_ADDR,
+   SFP_DIAG_TYPE_LEN, &sff_diag_type);
+   if (ret)
+   return ret;
+
+   if (!sff8472_comp || (sff_diag_type & 4)) {
+   modinfo->type = ETH_MODULE_SFF_8079;
+   modinfo->eeprom_len = ETH_MODULE_SFF_8079_LEN;
+   } else {
+   modinfo->type = ETH_MODULE_SFF_8472;
+   modinfo->eeprom_len = ETH_MODULE_SFF_8472_LEN;
+   }
+   break;
+
+   case FW_PORT_TYPE_QSFP:
+   case FW_PORT_TYPE_QSFP_10G:
+   case FW_PORT_TYPE_CR_QSFP:
+   case FW_PORT_TYPE_CR2_QSFP:
+   case FW_PORT_TYPE_CR4_QSFP:
+   ret = t4_i2c_rd(adapter, adapter->mbox, pi->tx_chan,
+   I2C_DEV_ADDR_A0, SFF_REV_ADDR,
+   SFF_REV_LEN, &sff_rev);
+   /* For QSFP type ports, revision value >= 3
+* means the SFP is 8636 compliant.
+*/
+   if (ret)
+   return ret;
+   if (sff_rev >= 0x3) {
+   modinfo->type = ETH_MODULE_SFF_8636;
+   modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN;
+   } else {
+   modinfo->type =

Re: [PATCH net-next v5 2/2] net: thunderx: add timestamping support

2017-12-12 Thread Joe Perches

On Mon, 2017-12-11 at 15:36 -0800, Richard Cochran wrote:
> On Mon, Dec 11, 2017 at 05:14:31PM +0300, Aleksey Makarov wrote:
> > @@ -880,6 +889,46 @@ static void nic_pause_frame(struct nicpf *nic, int vf, struct pfc *cfg)
> > }
> >  }
> >  
> > +/* Enable or disable HW timestamping by BGX for pkts received on a LMAC */
> > +static void nic_config_timestamp(struct nicpf *nic, int vf, struct set_ptp *ptp)
> > +{
> > +   struct pkind_cfg *pkind;
> > +   u8 lmac, bgx_idx;
> > +   u64 pkind_val, pkind_idx;
> > +
> > +   if (vf >= nic->num_vf_en)
> > +   return;
> > +
> > +   bgx_idx = NIC_GET_BGX_FROM_VF_LMAC_MAP(nic->vf_lmac_map[vf]);
> > +   lmac = NIC_GET_LMAC_FROM_VF_LMAC_MAP(nic->vf_lmac_map[vf]);
> > +
> > +   pkind_idx = lmac + bgx_idx * MAX_LMAC_PER_BGX;
> > +   pkind_val = nic_reg_read(nic, NIC_PF_PKIND_0_15_CFG | (pkind_idx << 3));
> > +   pkind = (struct pkind_cfg *)&pkind_val;
> > +
> > +   if (ptp->enable && !pkind->hdr_sl) {
> > +   /* Skiplen to exclude 8byte timestamp while parsing pkt
> > +* If not configured, will result in L2 errors.
> > +*/
> > +   pkind->hdr_sl = 4;
> > +   /* Adjust max packet length allowed */
> > +   pkind->maxlen += (pkind->hdr_sl * 2);

Are all compilers smart enough to set this to 8?
I rather doubt a compiler is even allowed to.

Re: [PATCH v2 2/3] dt-bindings: Add optional nvmem BD address bindings to ti,wlink-st

2017-12-12 Thread Rob Herring

On Thu, Dec 07, 2017 at 08:57:39PM -0600, David Lechner wrote:
> This adds optional nvmem consumer properties to the ti,wlink-st device tree
> bindings to allow specifying the BD address.
> 
> Signed-off-by: David Lechner 
> ---
> 
> v2 changes:
> * Renamed "mac-address" to "bd-address"
> * Fixed typos in example
> * Specify byte order of "bd-address"
> 
>  Documentation/devicetree/bindings/net/ti,wilink-st.txt | 5 +
>  1 file changed, 5 insertions(+)

Reviewed-by: Rob Herring

Re: [PATCH v3 29/33] dt-bindings: nds32 CPU Bindings

2017-12-12 Thread Rob Herring

On Fri, Dec 08, 2017 at 05:12:12PM +0800, Greentime Hu wrote:
> From: Greentime Hu 
> 
> This patch adds nds32 CPU binding documents.
> 
> Signed-off-by: Vincent Chen 
> Signed-off-by: Rick Chen 
> Signed-off-by: Zong Li 
> Signed-off-by: Greentime Hu 
> ---
>  Documentation/devicetree/bindings/nds32/cpus.txt |   37 
> ++
>  1 file changed, 37 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/nds32/cpus.txt

Reviewed-by: Rob Herring

[PATCH v3 1/3] Revert "ethtool: Add DMA Coalescing support"

2017-12-12 Thread Scott Branden

This reverts commit 5dd7bfbc5079cb375876e4e76191263fc28ae1a6.

As Stephen Hemminger mentioned
there is an ABI compatibility issue with this patch:

https://patchwork.ozlabs.org/patch/806049/#1757846
Signed-off-by: Scott Branden 
---
 ethtool-copy.h | 2 --
 ethtool.8.in   | 1 -
 ethtool.c  | 8 +---
 3 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/ethtool-copy.h b/ethtool-copy.h
index 4bb91eb..06fc04c 100644
--- a/ethtool-copy.h
+++ b/ethtool-copy.h
@@ -400,7 +400,6 @@ struct ethtool_modinfo {
  * a TX interrupt, when the packet rate is above @pkt_rate_high.
  * @rate_sample_interval: How often to do adaptive coalescing packet rate
  * sampling, measured in seconds.  Must not be zero.
- * @dmac: How many usecs to store packets before moving to host memory.
  *
  * Each pair of (usecs, max_frames) fields specifies that interrupts
  * should be coalesced until
@@ -451,7 +450,6 @@ struct ethtool_coalesce {
__u32   tx_coalesce_usecs_high;
__u32   tx_max_coalesced_frames_high;
__u32   rate_sample_interval;
-   __u32   dmac;
 };
 
 /**
diff --git a/ethtool.8.in b/ethtool.8.in
index 6ad3065..90ead41 100644
--- a/ethtool.8.in
+++ b/ethtool.8.in
@@ -165,7 +165,6 @@ ethtool \- query or control network driver and hardware settings
 .BN tx\-usecs\-high
 .BN tx\-frames\-high
 .BN sample\-interval
-.BN dmac
 .HP
 .B ethtool \-g|\-\-show\-ring
 .I devname
diff --git a/ethtool.c b/ethtool.c
index 1a2b7cc..c89b660 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -1337,7 +1337,6 @@ static int dump_coalesce(const struct ethtool_coalesce *ecoal)
"sample-interval: %u\n"
"pkt-rate-low: %u\n"
"pkt-rate-high: %u\n"
-   "dmac: %u\n"
"\n"
"rx-usecs: %u\n"
"rx-frames: %u\n"
@@ -1363,7 +1362,6 @@ static int dump_coalesce(const struct ethtool_coalesce *ecoal)
ecoal->rate_sample_interval,
ecoal->pkt_rate_low,
ecoal->pkt_rate_high,
-   ecoal->dmac,
 
ecoal->rx_coalesce_usecs,
ecoal->rx_max_coalesced_frames,
@@ -2071,7 +2069,6 @@ static int do_scoalesce(struct cmd_context *ctx)
int coal_adaptive_rx_wanted = -1;
int coal_adaptive_tx_wanted = -1;
s32 coal_sample_rate_wanted = -1;
-   s32 coal_dmac_wanted = -1;
s32 coal_pkt_rate_low_wanted = -1;
s32 coal_pkt_rate_high_wanted = -1;
s32 coal_rx_usec_wanted = -1;
@@ -2097,8 +2094,6 @@ static int do_scoalesce(struct cmd_context *ctx)
  &ecoal.use_adaptive_tx_coalesce },
{ "sample-interval", CMDL_S32, &coal_sample_rate_wanted,
  &ecoal.rate_sample_interval },
-   { "dmac", CMDL_S32, &coal_dmac_wanted,
-     &ecoal.dmac },
{ "stats-block-usecs", CMDL_S32, &coal_stats_wanted,
  &ecoal.stats_block_coalesce_usecs },
{ "pkt-rate-low", CMDL_S32, &coal_pkt_rate_low_wanted,
@@ -4794,8 +4789,7 @@ static const struct option {
  " [rx-frames-high N]\n"
  " [tx-usecs-high N]\n"
  " [tx-frames-high N]\n"
- " [sample-interval N]\n"
- " [dmac N]\n" },
+ " [sample-interval N]\n" },
{ "-g|--show-ring", 1, do_gring, "Query RX/TX ring parameters" },
{ "-G|--set-ring", 1, do_sring, "Set RX/TX ring parameters",
  " [ rx N ]\n"
-- 
2.5.0

[PATCH v3 0/3] ethtool: add ETHTOOL_RESET support via --reset command

2017-12-12 Thread Scott Branden

Patch series to add ETHTOOL_RESET support to ethtool userspace tool.
Include:
- revert custom change to ethtool-copy.h that is not in linux kernel
- sync ethtool-copy.h with ethtool.h in linux kernel net-next
- add ETHTOOL_RESET support with up to date ethtool.h reset defines

Changes from v2:
 - update commit message to indicate support for ap
 - add symbolic support for parsing of -shared added to each component specified
 - cleaned up reset print information to indicate which components have and have not been reset

Scott Branden (3):
  Revert "ethtool: Add DMA Coalescing support"
  ethtool-copy.h: sync with net-next
  ethtool: Add ETHTOOL_RESET support via --reset command

 ethtool-copy.h |  68 
 ethtool.8.in   |  68 +++-
 ethtool.c  | 121 +
 3 files changed, 241 insertions(+), 16 deletions(-)

-- 
2.5.0

Re: [PATCHv2 1/3] dt-bindings: net: Add DT bindings for Socionext Netsec

2017-12-12 Thread Andrew Lunn

On Tue, Dec 12, 2017 at 10:45:21PM +0530, jassisinghb...@gmail.com wrote:
> From: Jassi Brar 
> 
> This patch adds documentation for Device-Tree bindings for the
> Socionext NetSec Controller driver.
> 
> Signed-off-by: Ard Biesheuvel 
> Signed-off-by: Jassi Brar 
> ---
>  .../devicetree/bindings/net/socionext-netsec.txt   | 43 
> ++
>  1 file changed, 43 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/net/socionext-netsec.txt
> 
> diff --git a/Documentation/devicetree/bindings/net/socionext-netsec.txt b/Documentation/devicetree/bindings/net/socionext-netsec.txt
> new file mode 100644
> index 000..4695969
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/socionext-netsec.txt
> @@ -0,0 +1,45 @@
> +* Socionext NetSec Ethernet Controller IP
> +
> +Required properties:
> +- compatible: Should be "socionext,synquacer-netsec"
> +- reg: Address and length of the control register area, followed by the
> +   address and length of the EEPROM holding the MAC address and
> +   microengine firmware
> +- interrupts: Should contain ethernet controller interrupt
> +- clocks: phandle to the PHY reference clock, and any other clocks to be
> +  switched by runtime_pm
> +- clock-names: Required only if more than a single clock is listed in 'clocks'.
> +   The PHY reference clock must be named 'phy_refclk'
> +- phy-mode: See ethernet.txt file in the same directory
> +- phy-handle: phandle to select child phy
> +
> +Optional properties: (See ethernet.txt file in the same directory)
> +- dma-coherent: Boolean property, must only be present if memory
> +  accesses performed by the device are cache coherent
> +- local-mac-address
> +- mac-address
> +- max-speed
> +- max-frame-size
> +
> +Required properties for the child phy:
> +- reg: phy address

Hi Jassi

Just reference phy.txt

> +
> +Example:
> + eth0: netsec@522D {
> + compatible = "socionext,synquacer-netsec";
> + reg = <0 0x522D 0x0 0x1>, <0 0x1000 0x0 0x1>;
> + interrupts = ;
> + clocks = <_netsec>;
> + phy-mode = "rgmii";
> + max-speed = <1000>;
> + max-frame-size = <9000>;
> + phy-handle = <>;
> +
> + #address-cells = <1>;
> + #size-cells = <0>;
> +

Please add an mdio node here, and list all the phys and possibly
Ethernet switches as children of it.

 Andrew

[PATCH net-next] net: avoid skb_warn_bad_offload on IS_ERR

2017-12-12 Thread Willem de Bruijn

From: Willem de Bruijn 

skb_warn_bad_offload warns when packets enter the GSO stack that
require skb_checksum_help or vice versa. Do not warn on arbitrary
bad packets. Packet sockets can craft many. Syzkaller was able to
demonstrate another one with eth_type games.

In particular, suppress the warning when segmentation returns an
error, which is for reasons other than checksum offload.

See also commit 36c92474498a ("net: WARN if skb_checksum_help() is
called on skb requiring segmentation") for context on this warning.

Signed-off-by: Willem de Bruijn 
---
 net/core/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 8aa2f70995e8..b0eee49a2489 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2803,7 +2803,7 @@ struct sk_buff *__skb_gso_segment(struct sk_buff *skb,
 
segs = skb_mac_gso_segment(skb, features);
 
-   if (unlikely(skb_needs_check(skb, tx_path)))
+   if (unlikely(skb_needs_check(skb, tx_path) && !IS_ERR(segs)))
skb_warn_bad_offload(skb);
 
return segs;
-- 
2.15.1.424.g9478a66081-goog

Re: [PATCH v2 net-next] net: ethernet: ti: cpdma: correct error handling for chan create

2017-12-12 Thread Ivan Khoronzhuk

On Tue, Dec 12, 2017 at 11:08:51AM -0600, Grygorii Strashko wrote:
> 
> 
> On 12/12/2017 10:35 AM, Ivan Khoronzhuk wrote:
> > It's not correct to return NULL when that is actually an error and the
> > function returns errors in every other failure case. At the same time,
> > the cpsw driver and davinci emac don't check the error case while
> > creating a channel and can miss an actual error. Also remove WARNs
> > that duplicate dev_err messages.
> > 
> > Signed-off-by: Ivan Khoronzhuk 
> > ---
> >   drivers/net/ethernet/ti/cpsw.c  | 12 +---
> >   drivers/net/ethernet/ti/davinci_cpdma.c |  2 +-
> >   drivers/net/ethernet/ti/davinci_emac.c  |  9 +++--
> >   3 files changed, 17 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
> > index a60a378..3c85a08 100644
> > --- a/drivers/net/ethernet/ti/cpsw.c
> > +++ b/drivers/net/ethernet/ti/cpsw.c
> > @@ -3065,10 +3065,16 @@ static int cpsw_probe(struct platform_device *pdev)
> > }
> >   
> > cpsw->txv[0].ch = cpdma_chan_create(cpsw->dma, 0, cpsw_tx_handler, 0);
> > +   if (IS_ERR(cpsw->txv[0].ch)) {
> > +   dev_err(priv->dev, "error initializing tx dma channel\n");
> > +   ret = PTR_ERR(cpsw->txv[0].ch);
> > +   goto clean_dma_ret;
> > +   }
> > +
> > cpsw->rxv[0].ch = cpdma_chan_create(cpsw->dma, 0, cpsw_rx_handler, 1);
> > -   if (WARN_ON(!cpsw->rxv[0].ch || !cpsw->txv[0].ch)) {
> > -   dev_err(priv->dev, "error initializing dma channels\n");
> > -   ret = -ENOMEM;
> > +   if (IS_ERR(cpsw->rxv[0].ch)) {
> > +   dev_err(priv->dev, "error initializing rx dma channel\n");
> > +   ret = PTR_ERR(cpsw->rxv[0].ch);
> > goto clean_dma_ret;
> > }
> >   
> > diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
> > index e4d6edf..6f9173f 100644
> > --- a/drivers/net/ethernet/ti/davinci_cpdma.c
> > +++ b/drivers/net/ethernet/ti/davinci_cpdma.c
> > @@ -893,7 +893,7 @@ struct cpdma_chan *cpdma_chan_create(struct cpdma_ctlr *ctlr, int chan_num,
> > chan_num = rx_type ? rx_chan_num(chan_num) : tx_chan_num(chan_num);
> >   
> > if (__chan_linear(chan_num) >= ctlr->num_chan)
> > -   return NULL;
> > +   return ERR_PTR(-EINVAL);
> >   
> > chan = devm_kzalloc(ctlr->dev, sizeof(*chan), GFP_KERNEL);
> > if (!chan)
> > diff --git a/drivers/net/ethernet/ti/davinci_emac.c b/drivers/net/ethernet/ti/davinci_emac.c
> > index f58c0c6..3d4af64 100644
> > --- a/drivers/net/ethernet/ti/davinci_emac.c
> > +++ b/drivers/net/ethernet/ti/davinci_emac.c
> > @@ -1870,10 +1870,15 @@ static int davinci_emac_probe(struct platform_device *pdev)
> >   
> > priv->txchan = cpdma_chan_create(priv->dma, EMAC_DEF_TX_CH,
> >  emac_tx_handler, 0);
> > +   if (WARN_ON(IS_ERR(priv->txchan))) {
> 
> So, logically WARN_ON() should be removed in  davinci_emac.c also. Right?
It doesn't have a duplicate dev_err(), so not really.
But it would be better to replace them with dev_err() if there is no objection.


> 
> > +   rc = PTR_ERR(priv->txchan);
> > +   goto no_cpdma_chan;
> > +   }
> > +
> > priv->rxchan = cpdma_chan_create(priv->dma, EMAC_DEF_RX_CH,
> >  emac_rx_handler, 1);
> > -   if (WARN_ON(!priv->txchan || !priv->rxchan)) {
> > -   rc = -ENOMEM;
> > +   if (WARN_ON(IS_ERR(priv->rxchan))) {
> > +   rc = PTR_ERR(priv->rxchan);
> > goto no_cpdma_chan;
> > }
> >   
> > 
> 
> -- 
> regards,
> -grygorii

-- 
Regards,
Ivan Khoronzhuk

Re: [PATCH net-next v2] ip6_vti: adjust vti mtu according to mtu of output device

2017-12-12 Thread Shannon Nelson


On 12/12/2017 5:53 AM, Alexey Kodanev wrote:

LTP/udp6_ipsec_vti tests fail when sending large UDP datagrams that
require fragmentation and the underlying device has MTU <= 1500. This
happens because ip6_vti sets the MTU to ETH_DATA_LEN and does not update it
depending on the destination address or link parameter.

Further attempts to send UDP packets may succeed because pmtu gets
updated on ICMPV6_PKT_TOOBIG in vti6_err().

Here is the example when the output device MTU is set to 9000:

   # ip a sh ltp_ns_veth2
   ltp_ns_veth2@if7:  mtu 9000 ...
 inet 10.0.0.2/24 scope global ltp_ns_veth2
 inet6 fd00::2/64 scope global

   # ip li add vti6 type vti6 local fd00::2 remote fd00::1
   # ip li show vti6
   vti6@NONE:  mtu 1500 ...
 link/tunnel6 fd00::2 peer fd00::1

After the patch:
   # ip li add vti6 type vti6 local fd00::2 remote fd00::1
   # ip li show vti6
   vti6@NONE:  mtu 8832 ...
 link/tunnel6 fd00::2 peer fd00::1

Regarding ip_vti, it already tunes MTU with ip_tunnel_bind_dev().

Reported-by: Petr Vorel 
Signed-off-by: Alexey Kodanev 
---
v2: * cleanup commit message issues (thanks to Shannon)


Acked-by: Shannon Nelson 



 * handle the case when we don't have route but have device parameter

 * cast new MTU to int and then check the maximum (tdev->mtu can be
   less than dev->hard_header_len)

When changing the tunnel parameters, MTU can be updated as well... should
we also check that parms 'link', 'laddr' or 'raddr' were actually changed
in vti6_tnl_change() and/or IFLA_MTU wasn't set?

  net/ipv6/ip6_vti.c |   22 ++
  1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index dbb74f3..d4624c2 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -626,6 +626,7 @@ static void vti6_link_config(struct ip6_tnl *t)
  {
struct net_device *dev = t->dev;
struct __ip6_tnl_parm *p = &t->parms;
+   struct net_device *tdev = NULL;
  
  	memcpy(dev->dev_addr, &p->laddr, sizeof(struct in6_addr));
memcpy(dev->broadcast, &p->raddr, sizeof(struct in6_addr));
@@ -638,6 +639,27 @@ static void vti6_link_config(struct ip6_tnl *t)
dev->flags |= IFF_POINTOPOINT;
else
dev->flags &= ~IFF_POINTOPOINT;
+
+   if (p->flags & IP6_TNL_F_CAP_XMIT) {
+   int strict = (ipv6_addr_type(&p->raddr) &
+ (IPV6_ADDR_MULTICAST | IPV6_ADDR_LINKLOCAL));
+
+   struct rt6_info *rt = rt6_lookup(t->net,
+&p->raddr, &p->laddr,
+p->link, strict);
+
+   if (rt)
+   tdev = rt->dst.dev;
+   ip6_rt_put(rt);
+   }
+
+   if (!tdev && p->link)
+   tdev = __dev_get_by_index(t->net, p->link);
+
+   if (tdev) {
+   dev->mtu = max_t(int, tdev->mtu - dev->hard_header_len,
+IPV6_MIN_MTU);
+   }
  }
  
  /**
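
The effect of the max_t(int, ...) cast in the hunk above can be seen in a small userspace sketch. vti_mtu() is a hypothetical helper written for illustration, not kernel code; it only mirrors the arithmetic of the dev->mtu assignment, with IPV6_MIN_MTU set to its kernel value of 1280.

```c
#define IPV6_MIN_MTU 1280

/* Hypothetical stand-in for the dev->mtu assignment in the patch.
 * The subtraction is done in unsigned arithmetic, so it wraps around
 * when lower_mtu < hard_header_len. Converting to int before comparing
 * turns the wrapped value negative on the usual two's complement
 * targets, so the result is clamped to IPV6_MIN_MTU. */
static int vti_mtu(unsigned int lower_mtu, unsigned short hard_header_len)
{
	int mtu = (int)(lower_mtu - hard_header_len);

	return mtu > IPV6_MIN_MTU ? mtu : IPV6_MIN_MTU;
}
```

Assuming an overhead of 168 bytes, which is what the 9000 to 8832 change in the example output implies, vti_mtu(9000, 168) gives 8832, while a lower device MTU smaller than the header length, such as vti_mtu(100, 168), gives 1280 instead of a huge wrapped value.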

Re: [PATCH] veth: Optionally pad packets to minimum Ethernet length

2017-12-12 Thread Marcelo Ricardo Leitner

On Tue, Dec 12, 2017 at 11:32:46AM -0600, Dan Williams wrote:
> On Tue, 2017-12-12 at 08:13 -0800, Ed Swierk wrote:
> > Most physical Ethernet devices pad short packets to the minimum length
> > of 64 bytes (including FCS) on transmit. It can be useful to simulate
> > this behavior when debugging a problem that results from it (such as
> > incorrect L4 checksum calculation).
> > 
> > Padding is unnecessary for most applications so leave it off by
> > default. Enable padding only when the otherwise unused IFF_AUTOMEDIA
> > flag is set (e.g. by writing 0x5003 to flags in sysfs).
> 
> This seems like a weird overload of AUTOMEDIA, which no other driver
> uses for this purpose.  Seems like the only other user of AUTOMEDIA is
> 8390/etherh.c for some 10BaseT/10Base2 stuff.
> 
> I'm not sure what the interface should be, but perhaps a sysfs
> attribute would be better than overloading IFF_AUTOMEDIA?

What about using some tc action (i.e. skbmod) for this?

  Marcelo

> 
> Dan
> 
> > Signed-off-by: Ed Swierk 
> > ---
> >  drivers/net/veth.c | 6 ++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> > index f5438d0978ca..292029bf4bb2 100644
> > --- a/drivers/net/veth.c
> > +++ b/drivers/net/veth.c
> > @@ -111,6 +111,12 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
> >     goto drop;
> >     }
> >  
> > +   if (unlikely(dev->flags & IFF_AUTOMEDIA)) {
> > +   /* if eth_skb_pad returns an error the skb was freed */
> > +   if (eth_skb_pad(skb))
> > +   goto drop;
> > +   }
> > +
> >     if (likely(dev_forward_skb(rcv, skb) == NET_RX_SUCCESS)) {
> >     struct pcpu_vstats *stats = this_cpu_ptr(dev->vstats);
> >  
>
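
For context on the 64-byte figure in the quoted patch description: the kernel's ETH_ZLEN is 60, the minimum frame length without the 4-byte FCS, and eth_skb_pad() pads an skb up to that length. A minimal userspace sketch of the same padding on a plain buffer follows; pad_frame() is a hypothetical helper for illustration, not a kernel function.

```c
#include <stddef.h>
#include <string.h>

#define ETH_ZLEN 60	/* minimum frame length, not counting the 4-byte FCS */

/* Hypothetical helper: zero-pad the frame in buf (len bytes used, cap
 * bytes available) up to ETH_ZLEN. Returns the new frame length, or 0
 * if the buffer is too small to hold a padded frame. */
static size_t pad_frame(unsigned char *buf, size_t len, size_t cap)
{
	if (len >= ETH_ZLEN)
		return len;		/* already long enough */
	if (cap < ETH_ZLEN)
		return 0;		/* no room for the padding */
	memset(buf + len, 0, ETH_ZLEN - len);
	return ETH_ZLEN;
}
```

A 42-byte frame in a 64-byte buffer comes back with length 60 and bytes 42 through 59 zeroed; frames that are already 60 bytes or longer are returned unchanged.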

[PATCH v2 net-next] net: ethernet: ti: cpdma: correct error handling for chan create

2017-12-12 Thread Ivan Khoronzhuk

It's not correct to return NULL when that is actually an error and the
function returns errors in every other failure case. At the same time,
the cpsw driver and davinci emac don't check the error case while
creating a channel and can miss an actual error. Also remove WARNs
that duplicate dev_err messages.

Signed-off-by: Ivan Khoronzhuk 
---
 drivers/net/ethernet/ti/cpsw.c  | 12 +---
 drivers/net/ethernet/ti/davinci_cpdma.c |  2 +-
 drivers/net/ethernet/ti/davinci_emac.c  |  9 +++--
 3 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index a60a378..3c85a08 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -3065,10 +3065,16 @@ static int cpsw_probe(struct platform_device *pdev)
}
 
cpsw->txv[0].ch = cpdma_chan_create(cpsw->dma, 0, cpsw_tx_handler, 0);
+   if (IS_ERR(cpsw->txv[0].ch)) {
+   dev_err(priv->dev, "error initializing tx dma channel\n");
+   ret = PTR_ERR(cpsw->txv[0].ch);
+   goto clean_dma_ret;
+   }
+
cpsw->rxv[0].ch = cpdma_chan_create(cpsw->dma, 0, cpsw_rx_handler, 1);
-   if (WARN_ON(!cpsw->rxv[0].ch || !cpsw->txv[0].ch)) {
-   dev_err(priv->dev, "error initializing dma channels\n");
-   ret = -ENOMEM;
+   if (IS_ERR(cpsw->rxv[0].ch)) {
+   dev_err(priv->dev, "error initializing rx dma channel\n");
+   ret = PTR_ERR(cpsw->rxv[0].ch);
goto clean_dma_ret;
}
 
diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c 
b/drivers/net/ethernet/ti/davinci_cpdma.c
index e4d6edf..6f9173f 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -893,7 +893,7 @@ struct cpdma_chan *cpdma_chan_create(struct cpdma_ctlr 
*ctlr, int chan_num,
chan_num = rx_type ? rx_chan_num(chan_num) : tx_chan_num(chan_num);
 
if (__chan_linear(chan_num) >= ctlr->num_chan)
-   return NULL;
+   return ERR_PTR(-EINVAL);
 
chan = devm_kzalloc(ctlr->dev, sizeof(*chan), GFP_KERNEL);
if (!chan)
diff --git a/drivers/net/ethernet/ti/davinci_emac.c 
b/drivers/net/ethernet/ti/davinci_emac.c
index f58c0c6..3d4af64 100644
--- a/drivers/net/ethernet/ti/davinci_emac.c
+++ b/drivers/net/ethernet/ti/davinci_emac.c
@@ -1870,10 +1870,15 @@ static int davinci_emac_probe(struct platform_device 
*pdev)
 
priv->txchan = cpdma_chan_create(priv->dma, EMAC_DEF_TX_CH,
 emac_tx_handler, 0);
+   if (WARN_ON(IS_ERR(priv->txchan))) {
+   rc = PTR_ERR(priv->txchan);
+   goto no_cpdma_chan;
+   }
+
priv->rxchan = cpdma_chan_create(priv->dma, EMAC_DEF_RX_CH,
 emac_rx_handler, 1);
-   if (WARN_ON(!priv->txchan || !priv->rxchan)) {
-   rc = -ENOMEM;
+   if (WARN_ON(IS_ERR(priv->rxchan))) {
+   rc = PTR_ERR(priv->rxchan);
goto no_cpdma_chan;
}
 
-- 
2.7.4
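
For readers following along, the error-pointer convention this patch moves to can be sketched in userspace C. This is only a minimal sketch under stated assumptions: the ERR_PTR()/IS_ERR()/PTR_ERR() definitions are simplified stand-ins for the kernel helpers, and struct chan, chan_create() and probe() are hypothetical names, not the cpdma code.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct chan { int num; };	/* hypothetical channel object */

/* Hypothetical creator: every failure is an ERR_PTR(), never NULL. */
static struct chan *chan_create(int chan_num, int num_chan)
{
	struct chan *c;

	if (chan_num >= num_chan)
		return ERR_PTR(-EINVAL);

	c = calloc(1, sizeof(*c));
	if (!c)
		return ERR_PTR(-ENOMEM);

	c->num = chan_num;
	return c;
}

/* Caller checks each channel on its own and propagates the real code. */
static int probe(int tx_num, int rx_num, int num_chan)
{
	struct chan *tx, *rx;

	tx = chan_create(tx_num, num_chan);
	if (IS_ERR(tx))
		return PTR_ERR(tx);

	rx = chan_create(rx_num, num_chan);
	if (IS_ERR(rx)) {
		free(tx);
		return PTR_ERR(rx);
	}

	free(rx);
	free(tx);
	return 0;
}
```

Note that IS_ERR(NULL) is false in this sketch, which is why a creator that mixes NULL and ERR_PTR() returns forces callers to test for both.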

Re: [PATCH] veth: Optionally pad packets to minimum Ethernet length

2017-12-12 Thread Dan Williams

On Tue, 2017-12-12 at 08:13 -0800, Ed Swierk wrote:
> Most physical Ethernet devices pad short packets to the minimum
> length
> of 64 bytes (including FCS) on transmit. It can be useful to simulate
> this behavior when debugging a problem that results from it (such as
> incorrect L4 checksum calculation).
> 
> Padding is unnecessary for most applications so leave it off by
> default. Enable padding only when the otherwise unused IFF_AUTOMEDIA
> flag is set (e.g. by writing 0x5003 to flags in sysfs).

This seems like a weird overload of AUTOMEDIA, which no other driver
uses for this purpose.  Seems like the only other user of AUTOMEDIA is
8390/etherh.c for some 10BaseT/10Base2 stuff.

I'm not sure what the interface should be, but perhaps a sysfs
attribute would be better than overloading IFF_AUTOMEDIA?

Dan

> Signed-off-by: Ed Swierk 
> ---
>  drivers/net/veth.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> index f5438d0978ca..292029bf4bb2 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -111,6 +111,12 @@ static netdev_tx_t veth_xmit(struct sk_buff
> *skb, struct net_device *dev)
>   goto drop;
>   }
>  
> + if (unlikely(dev->flags & IFF_AUTOMEDIA)) {
> + /* if eth_skb_pad returns an error the skb was freed
> */
> + if (eth_skb_pad(skb))
> + goto drop;
> + }
> +
>   if (likely(dev_forward_skb(rcv, skb) == NET_RX_SUCCESS)) {
>   struct pcpu_vstats *stats = this_cpu_ptr(dev-
> >vstats);
>

[PATCH V3 net-next 3/8] net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support

2017-12-12 Thread Salil Mehta

This patch adds support for the hardware compatibility layer to the
HNS3 VF driver. This layer implements various {set|get} operations
on the MAC address of a virtual port, handles RSS-related configuration,
fetches the link status info from the PF, performs VLAN-related
configuration on the virtual port, queries statistics from
the hardware, etc.

This layer can interact with the hardware directly through the
IMP (Integrated Management Processor) interface or can use the mailbox
to interact with the PF driver.

Signed-off-by: Salil Mehta 
Signed-off-by: lipeng 
---
Patch V3: Addressed SPDX change requested by Philippe Ombredanne
  Link: https://lkml.org/lkml/2017/12/8/874
Patch V2: Addressed some internal comments
Patch V1: Initial Submit
---
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c  | 1490 
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h  |  164 +++
 2 files changed, 1654 insertions(+)
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
new file mode 100644
index 000..ff55f4c
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -0,0 +1,1490 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright (c) 2016-2017 Hisilicon Limited.
+ */
+
+#include 
+#include "hclgevf_cmd.h"
+#include "hclgevf_main.h"
+#include "hclge_mbx.h"
+#include "hnae3.h"
+
+#define HCLGEVF_NAME   "hclgevf"
+
+static struct hnae3_ae_algo ae_algovf;
+
+static const struct pci_device_id ae_algovf_pci_tbl[] = {
+   {PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_100G_VF), 0},
+   {PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_100G_RDMA_DCB_PFC_VF), 0},
+   /* required last entry */
+   {0, }
+};
+
+static inline struct hclgevf_dev *hclgevf_ae_get_hdev(
+   struct hnae3_handle *handle)
+{
+   return container_of(handle, struct hclgevf_dev, nic);
+}
+
+static int hclgevf_tqps_update_stats(struct hnae3_handle *handle)
+{
+   struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
+   struct hnae3_queue *queue;
+   struct hclgevf_desc desc;
+   struct hclgevf_tqp *tqp;
+   int status;
+   int i;
+
+   for (i = 0; i < hdev->num_tqps; i++) {
+   queue = handle->kinfo.tqp[i];
+   tqp = container_of(queue, struct hclgevf_tqp, q);
+   hclgevf_cmd_setup_basic_desc(&desc,
+HCLGEVF_OPC_QUERY_RX_STATUS,
+true);
+
+   desc.data[0] = cpu_to_le32(tqp->index & 0x1ff);
+   status = hclgevf_cmd_send(&hdev->hw, &desc, 1);
+   if (status) {
+   dev_err(&hdev->pdev->dev,
+   "Query tqp stat fail, status = %d,queue = %d\n",
+   status, i);
+   return status;
+   }
+   tqp->tqp_stats.rcb_rx_ring_pktnum_rcd +=
+   le32_to_cpu(desc.data[4]);
+
+   hclgevf_cmd_setup_basic_desc(&desc, HCLGEVF_OPC_QUERY_TX_STATUS,
+true);
+
+   desc.data[0] = cpu_to_le32(tqp->index & 0x1ff);
+   status = hclgevf_cmd_send(&hdev->hw, &desc, 1);
+   if (status) {
+   dev_err(&hdev->pdev->dev,
+   "Query tqp stat fail, status = %d,queue = %d\n",
+   status, i);
+   return status;
+   }
+   tqp->tqp_stats.rcb_tx_ring_pktnum_rcd +=
+   le32_to_cpu(desc.data[4]);
+   }
+
+   return 0;
+}
+
+static u64 *hclgevf_tqps_get_stats(struct hnae3_handle *handle, u64 *data)
+{
+   struct hnae3_knic_private_info *kinfo = &handle->kinfo;
+   struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
+   struct hclgevf_tqp *tqp;
+   u64 *buff = data;
+   int i;
+
+   for (i = 0; i < hdev->num_tqps; i++) {
+   tqp = container_of(handle->kinfo.tqp[i], struct hclgevf_tqp, q);
+   *buff++ = tqp->tqp_stats.rcb_tx_ring_pktnum_rcd;
+   }
+   for (i = 0; i < kinfo->num_tqps; i++) {
+   tqp = container_of(handle->kinfo.tqp[i], struct hclgevf_tqp, q);
+   *buff++ = tqp->tqp_stats.rcb_rx_ring_pktnum_rcd;
+   }
+
+   return buff;
+}
+
+static int hclgevf_tqps_get_sset_count(struct hnae3_handle *handle, int strset)
+{
+   struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
+
+   return hdev->num_tqps * 2;
+}
+
+static u8 *hclgevf_tqps_get_strings(struct hnae3_handle *handle, u8 *data)
+{
+   struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
+   u8 *buff = data;
+   int i = 0;
+
+   for (i = 0; i < hdev->num_tqps; i++) {
+   struct hclgevf_tqp *tqp =

[PATCH net-next] net: dsa: lan9303: Introduce lan9303_read_wait

2017-12-12 Thread Egil Hjelmeland

Simplify lan9303_indirect_phy_wait_for_completion()
and lan9303_switch_wait_for_completion() by using a new function
lan9303_read_wait()

Signed-off-by: Egil Hjelmeland 
---
 drivers/net/dsa/lan9303-core.c | 59 +++---
 1 file changed, 27 insertions(+), 32 deletions(-)

diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
index c1b004fa64d9..96ccce0939d3 100644
--- a/drivers/net/dsa/lan9303-core.c
+++ b/drivers/net/dsa/lan9303-core.c
@@ -249,6 +249,29 @@ static int lan9303_read(struct regmap *regmap, unsigned 
int offset, u32 *reg)
return -EIO;
 }
 
+/* Wait a while until mask & reg == value. Otherwise return timeout. */
+static int lan9303_read_wait(struct lan9303 *chip, int offset, int mask,
+char value)
+{
+   int i;
+
+   for (i = 0; i < 25; i++) {
+   u32 reg;
+   int ret;
+
+   ret = lan9303_read(chip->regmap, offset, &reg);
+   if (ret) {
+   dev_err(chip->dev, "%s failed to read offset %d: %d\n",
+   __func__, offset, ret);
+   return ret;
+   }
+   if ((reg & mask) == value)
+   return 0;
+   usleep_range(1000, 2000);
+   }
+   return -ETIMEDOUT;
+}
+
 static int lan9303_virt_phy_reg_read(struct lan9303 *chip, int regnum)
 {
int ret;
@@ -274,22 +297,8 @@ static int lan9303_virt_phy_reg_write(struct lan9303 
*chip, int regnum, u16 val)
 
 static int lan9303_indirect_phy_wait_for_completion(struct lan9303 *chip)
 {
-   int ret, i;
-   u32 reg;
-
-   for (i = 0; i < 25; i++) {
-   ret = lan9303_read(chip->regmap, LAN9303_PMI_ACCESS, &reg);
-   if (ret) {
-   dev_err(chip->dev,
-   "Failed to read pmi access status: %d\n", ret);
-   return ret;
-   }
-   if (!(reg & LAN9303_PMI_ACCESS_MII_BUSY))
-   return 0;
-   usleep_range(1000, 2000);
-   }
-
-   return -EIO;
+   return lan9303_read_wait(chip, LAN9303_PMI_ACCESS,
+LAN9303_PMI_ACCESS_MII_BUSY, 0);
 }
 
 static int lan9303_indirect_phy_read(struct lan9303 *chip, int addr, int 
regnum)
@@ -366,22 +375,8 @@ EXPORT_SYMBOL_GPL(lan9303_indirect_phy_ops);
 
 static int lan9303_switch_wait_for_completion(struct lan9303 *chip)
 {
-   int ret, i;
-   u32 reg;
-
-   for (i = 0; i < 25; i++) {
-   ret = lan9303_read(chip->regmap, LAN9303_SWITCH_CSR_CMD, &reg);
-   if (ret) {
-   dev_err(chip->dev,
-   "Failed to read csr command status: %d\n", ret);
-   return ret;
-   }
-   if (!(reg & LAN9303_SWITCH_CSR_CMD_BUSY))
-   return 0;
-   usleep_range(1000, 2000);
-   }
-
-   return -EIO;
+   return lan9303_read_wait(chip, LAN9303_SWITCH_CSR_CMD,
+LAN9303_SWITCH_CSR_CMD_BUSY, 0);
 }
 
 static int lan9303_write_switch_reg(struct lan9303 *chip, u16 regnum, u32 val)
-- 
2.14.1

[PATCH V3 net-next 5/8] net: hns3: Unified HNS3 {VF|PF} Ethernet Driver for hip08 SoC

2017-12-12 Thread Salil Mehta

Much of the code for the NAPI handling interface, skb buffer management,
management of the RX/TX descriptors, the ethtool interface etc.
is common to the VF and PF drivers.

This patch makes the existing PF HNS3 ENET driver the
common ENET driver for both Virtual and Physical Functions. This
helps reduce redundancy and makes the code easier to
manage.

Signed-off-by: Salil Mehta 
Signed-off-by: lipeng 
---
 drivers/net/ethernet/hisilicon/hns3/Makefile   |  5 +
 drivers/net/ethernet/hisilicon/hns3/hnae3.c| 14 --
 drivers/net/ethernet/hisilicon/hns3/hnae3.h|  7 ---
 .../hisilicon/hns3/{hns3pf => }/hns3_dcbnl.c   |  2 +-
 .../hisilicon/hns3/{hns3pf => }/hns3_enet.c|  2 ++
 .../hisilicon/hns3/{hns3pf => }/hns3_enet.h|  0
 .../hisilicon/hns3/{hns3pf => }/hns3_ethtool.c | 22 +-
 .../net/ethernet/hisilicon/hns3/hns3pf/Makefile|  5 -
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c  |  1 +
 9 files changed, 46 insertions(+), 12 deletions(-)
 rename drivers/net/ethernet/hisilicon/hns3/{hns3pf => }/hns3_dcbnl.c (97%)
 rename drivers/net/ethernet/hisilicon/hns3/{hns3pf => }/hns3_enet.c (99%)
 rename drivers/net/ethernet/hisilicon/hns3/{hns3pf => }/hns3_enet.h (100%)
 rename drivers/net/ethernet/hisilicon/hns3/{hns3pf => }/hns3_ethtool.c (97%)

diff --git a/drivers/net/ethernet/hisilicon/hns3/Makefile 
b/drivers/net/ethernet/hisilicon/hns3/Makefile
index c450945..002534f 100644
--- a/drivers/net/ethernet/hisilicon/hns3/Makefile
+++ b/drivers/net/ethernet/hisilicon/hns3/Makefile
@@ -7,3 +7,8 @@ obj-$(CONFIG_HNS3) += hns3pf/
 obj-$(CONFIG_HNS3) += hns3vf/
 
 obj-$(CONFIG_HNS3) += hnae3.o
+
+obj-$(CONFIG_HNS3_ENET) += hns3.o
+hns3-objs = hns3_enet.o hns3_ethtool.o
+
+hns3-$(CONFIG_HNS3_DCB) += hns3_dcbnl.o
diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.c 
b/drivers/net/ethernet/hisilicon/hns3/hnae3.c
index 5bcb223..02145f2 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hnae3.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.c
@@ -196,9 +196,18 @@ int hnae3_register_ae_dev(struct hnae3_ae_dev *ae_dev)
const struct pci_device_id *id;
struct hnae3_ae_algo *ae_algo;
struct hnae3_client *client;
-   int ret = 0;
+   int ret = 0, lock_acquired;
+
+   /* we can get deadlocked if SRIOV is being enabled in context to probe
+* and probe gets called again in same context. This can happen when
+* pci_enable_sriov() is called to create VFs from PF probes context.
+* Therefore, for simplicity uniformly deferring further probing in all
+* cases where we detect contention.
+*/
+   lock_acquired = mutex_trylock(_common_lock);
+   if (!lock_acquired)
+   return -EPROBE_DEFER;
 
-   mutex_lock(_common_lock);
list_add_tail(_dev->node, _ae_dev_list);
 
/* Check if there are matched ae_algo */
@@ -211,6 +220,7 @@ int hnae3_register_ae_dev(struct hnae3_ae_dev *ae_dev)
 
if (!ae_dev->ops) {
dev_err(&ae_dev->pdev->dev, "ae_dev ops are null\n");
+   ret = -EOPNOTSUPP;
goto out_err;
}
 
diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h 
b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
index 67c59e1..a9e2b32 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
@@ -452,9 +452,10 @@ struct hnae3_unic_private_info {
struct hnae3_queue **tqp;  /* array base of all TQPs of this instance */
 };
 
-#define HNAE3_SUPPORT_MAC_LOOPBACK1
-#define HNAE3_SUPPORT_PHY_LOOPBACK2
-#define HNAE3_SUPPORT_SERDES_LOOPBACK 4
+#define HNAE3_SUPPORT_MAC_LOOPBACKBIT(0)
+#define HNAE3_SUPPORT_PHY_LOOPBACKBIT(1)
+#define HNAE3_SUPPORT_SERDES_LOOPBACK BIT(2)
+#define HNAE3_SUPPORT_VF BIT(3)
 
 struct hnae3_handle {
struct hnae3_client *client;
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_dcbnl.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3_dcbnl.c
similarity index 97%
rename from drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_dcbnl.c
rename to drivers/net/ethernet/hisilicon/hns3/hns3_dcbnl.c
index 925619a..eb82700 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_dcbnl.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_dcbnl.c
@@ -93,7 +93,7 @@ void hns3_dcbnl_setup(struct hnae3_handle *handle)
 {
struct net_device *dev = handle->kinfo.netdev;
 
-   if (!handle->kinfo.dcb_ops)
+   if ((!handle->kinfo.dcb_ops) || (handle->flags & HNAE3_SUPPORT_VF))
return;
 
dev->dcbnl_ops = _dcbnl_ops;
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
similarity index 99%
rename from drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
rename to

Re: [PATCH v2 net-next] net: ethernet: ti: cpdma: correct error handling for chan create

2017-12-12 Thread Grygorii Strashko




On 12/12/2017 11:50 AM, Ivan Khoronzhuk wrote:
> On Tue, Dec 12, 2017 at 11:08:51AM -0600, Grygorii Strashko wrote:
> > 
> > On 12/12/2017 10:35 AM, Ivan Khoronzhuk wrote:
> > > Returning NULL is not correct when the condition is actually an error
> > > and the function returns an error code in every other failure case. At
> > > the same time, the cpsw and davinci emac drivers don't check for errors
> > > when creating a channel, so they can miss an actual error. Also remove
> > > WARNs that duplicate dev_err messages.
> > > 
> > > Signed-off-by: Ivan Khoronzhuk 
> > > ---
> > >  drivers/net/ethernet/ti/cpsw.c  | 12 +---
> > >  drivers/net/ethernet/ti/davinci_cpdma.c |  2 +-
> > >  drivers/net/ethernet/ti/davinci_emac.c  |  9 +++--
> > >  3 files changed, 17 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
> > > index a60a378..3c85a08 100644
> > > --- a/drivers/net/ethernet/ti/cpsw.c
> > > +++ b/drivers/net/ethernet/ti/cpsw.c
> > > @@ -3065,10 +3065,16 @@ static int cpsw_probe(struct platform_device *pdev)
> > > }
> > >  
> > > cpsw->txv[0].ch = cpdma_chan_create(cpsw->dma, 0, cpsw_tx_handler, 0);
> > > +   if (IS_ERR(cpsw->txv[0].ch)) {
> > > +   dev_err(priv->dev, "error initializing tx dma channel\n");
> > > +   ret = PTR_ERR(cpsw->txv[0].ch);
> > > +   goto clean_dma_ret;
> > > +   }
> > > +
> > > cpsw->rxv[0].ch = cpdma_chan_create(cpsw->dma, 0, cpsw_rx_handler, 1);
> > > -   if (WARN_ON(!cpsw->rxv[0].ch || !cpsw->txv[0].ch)) {
> > > -   dev_err(priv->dev, "error initializing dma channels\n");
> > > -   ret = -ENOMEM;
> > > +   if (IS_ERR(cpsw->rxv[0].ch)) {
> > > +   dev_err(priv->dev, "error initializing rx dma channel\n");
> > > +   ret = PTR_ERR(cpsw->rxv[0].ch);
> > > goto clean_dma_ret;
> > > }
> > >  
> > > diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
> > > index e4d6edf..6f9173f 100644
> > > --- a/drivers/net/ethernet/ti/davinci_cpdma.c
> > > +++ b/drivers/net/ethernet/ti/davinci_cpdma.c
> > > @@ -893,7 +893,7 @@ struct cpdma_chan *cpdma_chan_create(struct cpdma_ctlr 
> > > *ctlr, int chan_num,
> > > chan_num = rx_type ? rx_chan_num(chan_num) : tx_chan_num(chan_num);
> > >  
> > > if (__chan_linear(chan_num) >= ctlr->num_chan)
> > > -   return NULL;
> > > +   return ERR_PTR(-EINVAL);
> > >  
> > > chan = devm_kzalloc(ctlr->dev, sizeof(*chan), GFP_KERNEL);
> > > if (!chan)
> > > diff --git a/drivers/net/ethernet/ti/davinci_emac.c 
> > > b/drivers/net/ethernet/ti/davinci_emac.c
> > > index f58c0c6..3d4af64 100644
> > > --- a/drivers/net/ethernet/ti/davinci_emac.c
> > > +++ b/drivers/net/ethernet/ti/davinci_emac.c
> > > @@ -1870,10 +1870,15 @@ static int davinci_emac_probe(struct platform_device 
> > > *pdev)
> > >  
> > > priv->txchan = cpdma_chan_create(priv->dma, EMAC_DEF_TX_CH,
> > >  emac_tx_handler, 0);
> > > +   if (WARN_ON(IS_ERR(priv->txchan))) {
> > 
> > So, logically WARN_ON() should be removed in davinci_emac.c also. Right?
> 
> It doesn't have a duplicate dev_err(), so not really.
> But it would be better to replace them with dev_err(), if there are no
> objections.

right.

> > > +   rc = PTR_ERR(priv->txchan);
> > > +   goto no_cpdma_chan;
> > > +   }
> > > +
> > > priv->rxchan = cpdma_chan_create(priv->dma, EMAC_DEF_RX_CH,
> > >  emac_rx_handler, 1);
> > > -   if (WARN_ON(!priv->txchan || !priv->rxchan)) {
> > > -   rc = -ENOMEM;
> > > +   if (WARN_ON(IS_ERR(priv->rxchan))) {
> > > +   rc = PTR_ERR(priv->rxchan);
> > > goto no_cpdma_chan;
> > > }

--
regards,
-grygorii

Re: [PATCH net-next] net: bridge: use rhashtable for fdbs

2017-12-12 Thread Nikolay Aleksandrov

On 12/12/17 20:07, Stephen Hemminger wrote:
> On Tue, 12 Dec 2017 16:02:50 +0200
> Nikolay Aleksandrov  wrote:
> 
>> Before this patch the bridge used a fixed 256 element hash table which
>> was fine for small use cases (in my tests it starts to degrade
>> above 1000 entries), but it wasn't enough for medium or large
>> scale deployments. Modern setups have thousands of participants in a
>> single bridge, even only enabling vlans and adding a few thousand vlan
>> entries will cause a few thousand fdbs to be automatically inserted per
>> participating port. So we need to scale the fdb table considerably to
>> cope with modern workloads, and this patch converts it to use a
>> rhashtable for its operations thus improving the bridge scalability.
>> Tests show the following results (10 runs each), at up to 1000 entries
>> rhashtable is ~3% slower, at 2000 rhashtable is 30% faster, at 3000 it
>> is 2 times faster and at 3 it is 50 times faster.
>> Obviously this happens because of the properties of the two constructs
>> and is expected, rhashtable keeps pretty much a constant time even with
>> 1000 entries (tested), while the fixed hash table struggles
>> considerably even above 1.
>> As a side effect this also reduces the net_bridge struct size from 3248
>> bytes to 1344 bytes. Also note that the key struct is 8 bytes.
>>
>> Signed-off-by: Nikolay Aleksandrov 
>> ---
> 
> Thanks for doing this, it was on my list of things that never get done.
> 
> Some downsides:
>  * size of the FDB entry gets larger.

It does not. Due to SMP alignment of the write-heavy members we had a large
hole between cache lines 1 and 2; the new 8 bytes fit perfectly and there are
still bytes left to use.

>  * you lost the ability to salt the hash (and rekey) which is important
>for DDoS attacks

The hash is always salted (property of rhashtable) and in fact is better
because now the salt is generated for each rhashtable separately rather than
having 1 global salt for all bridge devices.
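
To make the per-table salt point concrete, here is a minimal userspace sketch. It is an illustration only: struct fdb_table, fdb_hash() and the seeded FNV-1a mix are hypothetical stand-ins chosen for brevity, not the hash rhashtable actually uses. The key layout (6-byte MAC plus 2-byte VLAN id) is assumed from the 8-byte key size mentioned in the patch description.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_ALEN 6

/* 8-byte lookup key: MAC address plus VLAN id (assumed layout). */
struct fdb_key {
	uint8_t addr[ETH_ALEN];
	uint16_t vlan_id;
};

/* Hypothetical per-bridge table that owns its own salt. */
struct fdb_table {
	uint32_t salt;
};

/* Seeded FNV-1a over the key bytes; a stand-in for the real hash. */
static uint32_t fdb_hash(const struct fdb_table *t, const struct fdb_key *k)
{
	uint32_t h = 2166136261u ^ t->salt;
	size_t i;

	for (i = 0; i < ETH_ALEN; i++) {
		h ^= k->addr[i];
		h *= 16777619u;
	}
	h ^= (uint8_t)(k->vlan_id & 0xff);
	h *= 16777619u;
	h ^= (uint8_t)(k->vlan_id >> 8);
	h *= 16777619u;

	return h;
}
```

With one salt per table, the same MAC/VLAN pair hashes differently in two bridges, so a collision set crafted against one bridge does not carry over to another.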

>  * being slower for small (<10 entries) also matters and is a common
>use case for containers.

I think they're pretty comparable in speed, the difference is negligible IMO.

Re: [RFC PATCH] reuseport: compute the ehash only if needed

2017-12-12 Thread Craig Gallek

On Tue, Dec 12, 2017 at 8:09 AM, Paolo Abeni  wrote:
> When a reuseport socket group is using a BPF filter to distribute
> the packets among the sockets, we don't need to compute any hash
> value, but the current reuseport_select_sock() requires the
> caller to compute such hash in advance.
>
> This patch reworks reuseport_select_sock() to compute the hash value
> only if needed - missing or failing BPF filter. Since different
> hash functions have different argument types - ipv4 addresses vs ipv6
> ones - to avoid over-complicating the interface, reuseport_select_sock()
> is now a macro.
Purely subjective, but I think a slightly more complicated function
signature for reuseport_select_sock (and reuseport_select_sock6?)
would look a little better than this macro.  It would avoid needing to
expose the reuseport_info struct and would keep the rcu semantics
entirely within the function call (the fast-path memory access
semantics here are already non-trivial...)

> Additionally, the sk_reuseport test is moved inside reuseport_select_sock,
> to avoid some code duplication.
>
> Overall this gives small but measurable performance improvement
> under UDP flood while using SO_REUSEPORT + BPF.
Exciting, do you have some specific numbers here?  I'd be interested
in knowing what kinds of loads you end up seeing improvements for.

> Signed-off-by: Paolo Abeni

Re: [PATCH net-next] net: dsa: lan9303: Introduce lan9303_read_wait

2017-12-12 Thread Vivien Didelot

Hi Egil,

Egil Hjelmeland  writes:

> Simplify lan9303_indirect_phy_wait_for_completion()
> and lan9303_switch_wait_for_completion() by using a new function
> lan9303_read_wait()
>
> Signed-off-by: Egil Hjelmeland 
> ---
>  drivers/net/dsa/lan9303-core.c | 59 
> +++---
>  1 file changed, 27 insertions(+), 32 deletions(-)
>
> diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
> index c1b004fa64d9..96ccce0939d3 100644
> --- a/drivers/net/dsa/lan9303-core.c
> +++ b/drivers/net/dsa/lan9303-core.c
> @@ -249,6 +249,29 @@ static int lan9303_read(struct regmap *regmap, unsigned 
> int offset, u32 *reg)
>   return -EIO;
>  }
>  
> +/* Wait a while until mask & reg == value. Otherwise return timeout. */
> +static int lan9303_read_wait(struct lan9303 *chip, int offset, int mask,
> +  char value)
> +{
> + int i;
> +
> + for (i = 0; i < 25; i++) {
> + u32 reg;
> + int ret;
> +
> + ret = lan9303_read(chip->regmap, offset, &reg);
> + if (ret) {
> + dev_err(chip->dev, "%s failed to read offset %d: %d\n",
> + __func__, offset, ret);
> + return ret;
> + }
> + if ((reg & mask) == value)
> + return 0;

It is weird to mix int, u32 and char for mask checking. I suggest using
the u32 type for both mask and value.

Looking at how lan9303_read_wait is called, the value argument doesn't
seem necessary. You can directly return 0 if (!(reg & mask)).
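
A minimal userspace sketch of what the two suggestions above could look like together (u32 mask, no value argument). Everything here is an assumption for illustration: struct fake_chip and fake_read() are hypothetical stand-ins for the regmap read, and read_wait_clear() is not the function from the patch.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the chip: the register keeps its busy value
 * for a number of reads and then clears. Not the lan9303 regmap API.
 */
struct fake_chip {
	uint32_t reg;		/* current register value */
	int reads_until_idle;	/* busy reads left before it clears */
};

static int fake_read(struct fake_chip *chip, uint32_t *reg)
{
	if (chip->reads_until_idle > 0)
		chip->reads_until_idle--;
	else
		chip->reg = 0;

	*reg = chip->reg;
	return 0;
}

/* Poll until every bit in mask is clear; mask has the register's type. */
static int read_wait_clear(struct fake_chip *chip, uint32_t mask)
{
	int i;

	for (i = 0; i < 25; i++) {
		uint32_t reg;
		int ret;

		ret = fake_read(chip, &reg);
		if (ret)
			return ret;
		if (!(reg & mask))
			return 0;
		/* the driver would sleep here, e.g. usleep_range(1000, 2000) */
	}

	return -ETIMEDOUT;
}
```

Dropping the value argument only works as long as every caller waits for bits to clear; a caller waiting for a bit to be set would still need the more general form.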

> + usleep_range(1000, 2000);
> + }
> + return -ETIMEDOUT;

A newline before the return statement would be appreciated.

> +}
> +
>  static int lan9303_virt_phy_reg_read(struct lan9303 *chip, int regnum)
>  {
>   int ret;
> @@ -274,22 +297,8 @@ static int lan9303_virt_phy_reg_write(struct lan9303 
> *chip, int regnum, u16 val)
>  
>  static int lan9303_indirect_phy_wait_for_completion(struct lan9303 *chip)
>  {
> - int ret, i;
> - u32 reg;
> -
> - for (i = 0; i < 25; i++) {
> - ret = lan9303_read(chip->regmap, LAN9303_PMI_ACCESS, &reg);
> - if (ret) {
> - dev_err(chip->dev,
> - "Failed to read pmi access status: %d\n", ret);
> - return ret;
> - }
> - if (!(reg & LAN9303_PMI_ACCESS_MII_BUSY))
> - return 0;
> - usleep_range(1000, 2000);
> - }
> -
> - return -EIO;
> + return lan9303_read_wait(chip, LAN9303_PMI_ACCESS,
> +  LAN9303_PMI_ACCESS_MII_BUSY, 0);
>  }
>  
>  static int lan9303_indirect_phy_read(struct lan9303 *chip, int addr, int 
> regnum)
> @@ -366,22 +375,8 @@ EXPORT_SYMBOL_GPL(lan9303_indirect_phy_ops);
>  
>  static int lan9303_switch_wait_for_completion(struct lan9303 *chip)
>  {
> - int ret, i;
> - u32 reg;
> -
> - for (i = 0; i < 25; i++) {
> - ret = lan9303_read(chip->regmap, LAN9303_SWITCH_CSR_CMD, &reg);
> - if (ret) {
> - dev_err(chip->dev,
> - "Failed to read csr command status: %d\n", ret);
> - return ret;
> - }
> - if (!(reg & LAN9303_SWITCH_CSR_CMD_BUSY))
> - return 0;
> - usleep_range(1000, 2000);
> - }
> -
> - return -EIO;
> + return lan9303_read_wait(chip, LAN9303_SWITCH_CSR_CMD,
> +  LAN9303_SWITCH_CSR_CMD_BUSY, 0);
>  }
>  
>  static int lan9303_write_switch_reg(struct lan9303 *chip, u16 regnum, u32 
> val)


Thanks,

Vivien

Re: [PATCH net-next] net: bridge: use rhashtable for fdbs

2017-12-12 Thread Nikolay Aleksandrov

On 12/12/17 20:02, Stephen Hemminger wrote:
> On Tue, 12 Dec 2017 16:02:50 +0200
> Nikolay Aleksandrov  wrote:
> 
>> +memcpy(__entry->addr, f->key.addr.addr, ETH_ALEN);
> 
> Maybe use ether_addr_copy() here?
> 

This is an unrelated cleanup, the code in question was already like that.
I can post a separate patch to turn these into ether_addr_copy().

Re: [PATCH 2/4] sctp: Add ip option support

2017-12-12 Thread Marcelo Ricardo Leitner

On Tue, Dec 12, 2017 at 02:08:00PM -0200, Marcelo Ricardo Leitner wrote:
> Hi Richard,
> 
> On Mon, Nov 27, 2017 at 07:31:21PM +, Richard Haines wrote:
> ...
> > --- a/net/sctp/socket.c
> > +++ b/net/sctp/socket.c
> > @@ -3123,8 +3123,10 @@ static int sctp_setsockopt_maxseg(struct sock *sk, 
> > char __user *optval, unsigned
> >  
> > if (asoc) {
> > if (val == 0) {
> > +   struct sctp_af *af = sp->pf->af;
> > val = asoc->pathmtu;
> > -   val -= sp->pf->af->net_header_len;
> > +   val -= af->ip_options_len(asoc->base.sk);
> > +   val -= af->net_header_len;
> > val -= sizeof(struct sctphdr) +
> > sizeof(struct sctp_data_chunk);
> > }
> 
> Right below here there is a call to sctp_frag_point(). That function
> also needs this tweak.
> 
> Yes, we should simplify all these calculations. I have a patch to use
> sctp_frag_point on where it is currently recalculating it on
> sctp_datamsg_from_user(), but probably should include other places as
> well.
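
For reference, the arithmetic in the quoted hunk can be written out as a small standalone sketch. The header sizes are the fixed ones from the SCTP wire format (12-byte common header, 16-byte DATA chunk header); sctp_max_payload() is a hypothetical helper name, and the real code may apply further adjustments not shown here.

```c
/* Header sizes in bytes: SCTP common header and DATA chunk header. */
enum {
	SCTP_HDR_LEN = 12,
	SCTP_DATA_CHUNK_HDR_LEN = 16,
};

/* Payload room per DATA chunk, mirroring the subtraction in the hunk:
 * path MTU minus IP options, network header, and SCTP headers.
 */
static int sctp_max_payload(int pathmtu, int net_header_len,
			    int ip_options_len)
{
	int val = pathmtu;

	val -= ip_options_len;
	val -= net_header_len;
	val -= SCTP_HDR_LEN + SCTP_DATA_CHUNK_HDR_LEN;

	return val;
}
```

With a 1500-byte path MTU and a 20-byte IPv4 header this gives 1452 bytes without options and 1444 bytes with 8 bytes of IP options, which is why any other place that recomputes the fragmentation point needs the same subtraction.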

I have no further comments on this patchset other than the above and
LGTM.
Thanks Richard.

  Marcelo

Re: [RFC PATCH] reuseport: compute the ehash only if needed

2017-12-12 Thread Paolo Abeni

Hi,
On Tue, 2017-12-12 at 12:44 -0500, Craig Gallek wrote:
> On Tue, Dec 12, 2017 at 8:09 AM, Paolo Abeni  wrote:
> > When a reuseport socket group is using a BPF filter to distribute
> > the packets among the sockets, we don't need to compute any hash
> > value, but the current reuseport_select_sock() requires the
> > caller to compute such hash in advance.
> > 
> > This patch reworks reuseport_select_sock() to compute the hash value
> > only if needed - missing or failing BPF filter. Since different
> > hash functions have different argument types - ipv4 addresses vs ipv6
> > ones - to avoid over-complicating the interface, reuseport_select_sock()
> > is now a macro.
> 
> Purely subjective, but I think a slightly more complicated function
> signature for reuseport_select_sock (and reuseport_select_sock6?)
> would look a little better than this macro.  It would avoid needing to
> expose the reuseport_info struct and would keep the rcu semantics
> entirely within the function call (the fast-path memory access
> semantics here are already non-trivial...)

Thanks for the feedback. 

I was in doubt about the macro, too. The downside of using explicit
functions is the very long argument list and the need of 2 separate
functions for ipv4 and ipv6.

> > Additionally, the sk_reuseport test is moved inside reuseport_select_sock,
> > to avoid some code duplication.
> > 
> > Overall this gives small but measurable performance improvement
> > under UDP flood while using SO_REUSEPORT + BPF.
> 
> Exciting, do you have some specific numbers here?  I'd be interested
> in knowing what kinds of loads you end up seeing improvements for.

These are the numbers I collected so far:

(ipv4)
socks nr    vanilla(kpps)   patched(kpps)
1           1747            1843
2           3109            3140
3           4480            4534
4           5796            5864
5           7063            7139
6           8168            8235

(ipv6)
socks nr    vanilla(kpps)   patched(kpps)
1           1433            1544
2           2537            2731
3           3622            3794
4           4689            4979
5           5738            6011
6           6671            6920
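
To put the figures in relative terms, here is a small sketch that computes the gain per row. It assumes each row reads as socket count, vanilla kpps, patched kpps (four digits each); struct sample and gain_pct() are names made up for this illustration.

```c
/* One table row: throughput in kpps before and after the patch. */
struct sample {
	int socks;
	double vanilla;
	double patched;
};

static const struct sample ipv4[] = {
	{ 1, 1747, 1843 }, { 2, 3109, 3140 }, { 3, 4480, 4534 },
	{ 4, 5796, 5864 }, { 5, 7063, 7139 }, { 6, 8168, 8235 },
};

static const struct sample ipv6[] = {
	{ 1, 1433, 1544 }, { 2, 2537, 2731 }, { 3, 3622, 3794 },
	{ 4, 4689, 4979 }, { 5, 5738, 6011 }, { 6, 6671, 6920 },
};

/* Relative gain of patched over vanilla, in percent. */
static double gain_pct(const struct sample *s)
{
	return (s->patched - s->vanilla) * 100.0 / s->vanilla;
}
```

Read that way, the single-socket gain is roughly 5.5% for ipv4 and 7.7% for ipv6, and the ipv6 rows gain more than the ipv4 rows throughout.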

Cheers,

Paolo

Re: [PATCH net-next v5 1/2] net: add support for Cavium PTP coprocessor

2017-12-12 Thread Richard Cochran

On Tue, Dec 12, 2017 at 12:41:35PM +0300, Aleksey Makarov wrote:
> If ptp_clock_register() returns NULL, the device is still paired with the 
> driver,
> but the driver is not registered in the PTP core.  When ethernet driver needs
> the reference to this cavium PTP driver, it calls cavium_ptp_get() that checks
> if ptp->ptp_clock is NULL and, if so, returns -ENODEV.

The pointer clock->ptp_clock can be NULL.

Yet you de-reference it here:

> +static void cavium_ptp_remove(struct pci_dev *pdev)
> +{
> + struct cavium_ptp *clock = pci_get_drvdata(pdev);
> + u64 clock_cfg;
> +
> + pci_set_drvdata(pdev, NULL);
> +
> + ptp_clock_unregister(clock->ptp_clock);
> +
> + clock_cfg = readq(clock->reg_base + PTP_CLOCK_CFG);
> + clock_cfg &= ~PTP_CLOCK_CFG_PTP_EN;
> + writeq(clock_cfg, clock->reg_base + PTP_CLOCK_CFG);
> +}

and here:

> +static inline int cavium_ptp_clock_index(struct cavium_ptp *clock)
> +{
> + return ptp_clock_index(clock->ptp_clock);
> +}

That needs to be fixed.
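
One possible shape of such a guard, as a minimal userspace sketch. The types and ptp_clock_index_stub() are simplified stand-ins, cavium_ptp_clock_index_sketch() is a hypothetical name, and returning -1 for "no clock" is an assumption borrowed from the ethtool phc_index convention.

```c
#include <stddef.h>

struct ptp_clock { int index; };			/* stand-in type */
struct cavium_ptp { struct ptp_clock *ptp_clock; };	/* stand-in type */

/* Stand-in for the PTP core call; it dereferences its argument. */
static int ptp_clock_index_stub(struct ptp_clock *ptp)
{
	return ptp->index;
}

/* Report -1 when no clock was registered instead of passing NULL on. */
static int cavium_ptp_clock_index_sketch(struct cavium_ptp *clock)
{
	if (!clock->ptp_clock)
		return -1;

	return ptp_clock_index_stub(clock->ptp_clock);
}
```

The remove path would need the same kind of check before it unregisters the clock.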

Thanks,
Richard

[PATCHv2 0/3] Socionext Synquacer NETSEC driver

2017-12-12 Thread jassisinghbrar

From: Jassi Brar 

Hi,

Changes since v1
# Switched from using memremap to ioremap
# Implemented ndo_do_ioctl callback
# Defined optional 'dma-coherent' DT property

Jassi Brar (3):
  dt-bindings: net: Add DT bindings for Socionext Netsec
  net: socionext: Add Synquacer NetSec driver
  MAINTAINERS: Add entry for Socionext ethernet driver

 .../devicetree/bindings/net/socionext-netsec.txt   |   43 +
 MAINTAINERS|7 +
 drivers/net/ethernet/Kconfig   |1 +
 drivers/net/ethernet/Makefile  |1 +
 drivers/net/ethernet/socionext/Kconfig |   29 +
 drivers/net/ethernet/socionext/Makefile|1 +
 drivers/net/ethernet/socionext/netsec.c| 1826 
 7 files changed, 1908 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/socionext-netsec.txt
 create mode 100644 drivers/net/ethernet/socionext/Kconfig
 create mode 100644 drivers/net/ethernet/socionext/Makefile
 create mode 100644 drivers/net/ethernet/socionext/netsec.c

-- 
2.7.4

Re: [PATCHv2 1/3] dt-bindings: net: Add DT bindings for Socionext Netsec

2017-12-12 Thread Mark Rutland

Hi,

On Tue, Dec 12, 2017 at 10:45:21PM +0530, jassisinghb...@gmail.com wrote:
> From: Jassi Brar 
> 
> This patch adds documentation for Device-Tree bindings for the
> Socionext NetSec Controller driver.
> 
> Signed-off-by: Ard Biesheuvel 
> Signed-off-by: Jassi Brar 
> ---
>  .../devicetree/bindings/net/socionext-netsec.txt   | 43 
> ++
>  1 file changed, 43 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/net/socionext-netsec.txt
> 
> diff --git a/Documentation/devicetree/bindings/net/socionext-netsec.txt 
> b/Documentation/devicetree/bindings/net/socionext-netsec.txt
> new file mode 100644
> index 000..4695969
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/socionext-netsec.txt
> @@ -0,0 +1,45 @@
> +* Socionext NetSec Ethernet Controller IP
> +
> +Required properties:
> +- compatible: Should be "socionext,synquacer-netsec"
> +- reg: Address and length of the control register area, followed by the
> +   address and length of the EEPROM holding the MAC address and
> +   microengine firmware
> +- interrupts: Should contain ethernet controller interrupt
> +- clocks: phandle to the PHY reference clock, and any other clocks to be
> +  switched by runtime_pm
> +- clock-names: Required only if more than a single clock is listed in 
> 'clocks'.
> +   The PHY reference clock must be named 'phy_refclk'

Please define the full set of clocks (and their names) explicitly. This
should be well-known.

Otherwise, this looks ok.

Thanks,
Mark.

> +- phy-mode: See ethernet.txt file in the same directory
> +- phy-handle: phandle to select child phy
> +
> +Optional properties: (See ethernet.txt file in the same directory)
> +- dma-coherent: Boolean property, must only be present if memory
> +  accesses performed by the device are cache coherent
> +- local-mac-address
> +- mac-address
> +- max-speed
> +- max-frame-size
> +
> +Required properties for the child phy:
> +- reg: phy address
> +
> +Example:
> + eth0: netsec@522D {
> + compatible = "socionext,synquacer-netsec";
> + reg = <0 0x522D 0x0 0x1>, <0 0x1000 0x0 0x1>;
> + interrupts = ;
> + clocks = <_netsec>;
> + phy-mode = "rgmii";
> + max-speed = <1000>;
> + max-frame-size = <9000>;
> + phy-handle = <>;
> +
> + #address-cells = <1>;
> + #size-cells = <0>;
> +
> + ethphy0: ethernet-phy@1 {
> + compatible = "ethernet-phy-ieee802.3-c22";
> + reg = <1>;
> + };
> + };
> -- 
> 2.7.4
>

Re: [PATCH v3 31/33] dt-bindings: interrupt-controller: Andestech Internal Vector Interrupt Controller

2017-12-12 Thread Rob Herring

On Fri, Dec 08, 2017 at 05:12:14PM +0800, Greentime Hu wrote:
> From: Greentime Hu 
> 
> This patch adds an irqchip driver document for the Andestech Internal Vector
> Interrupt Controller.
> 
> Signed-off-by: Rick Chen 
> Signed-off-by: Greentime Hu 
> ---
>  .../interrupt-controller/andestech,ativic32.txt|   19 +++
>  1 file changed, 19 insertions(+)
>  create mode 100644 
> Documentation/devicetree/bindings/interrupt-controller/andestech,ativic32.txt

I acked v2. Please add acks when posting new versions.

Rob

Re: [PATCH v9 0/5] Add the ability to do BPF directed error injection

2017-12-12 Thread Alexei Starovoitov


On 12/11/17 8:36 AM, Josef Bacik wrote:

This is the same as v8, just rebased onto the bpf tree.

v8->v9:
- rebased onto the bpf tree.

v7->v8:
- removed the _ASM_KPROBE_ERROR_INJECT since it was not needed.

v6->v7:
- moved the opt-in macro to bpf.h out of kprobes.h.

v5->v6:
- add BPF_ALLOW_ERROR_INJECTION() tagging for functions that will support this
  feature.  This way only functions that opt-in will be allowed to be
  overridden.
- added a btrfs patch to allow error injection for open_ctree() so that the bpf
  sample actually works.

v4->v5:
- disallow kprobe_override programs from being put in the prog map array so we
  don't tail call into something we didn't check.  This allows us to make the
  normal path still fast without a bunch of percpu operations.

v3->v4:
- fix a build error found by kbuild test bot (I didn't wait long enough
  apparently.)
- Added a warning message as per Daniel's suggestion.

v2->v3:
- added a ->kprobe_override flag to bpf_prog.
- added some sanity checks to disallow attaching bpf progs that have
  ->kprobe_override set that aren't for ftrace kprobes.
- added the trace_kprobe_ftrace helper to check if the trace_event_call is a
  ftrace kprobe.
- renamed bpf_kprobe_state to bpf_kprobe_override, fixed it so we only read this
  value in the kprobe path, and thus only write to it if we're overriding or
  clearing the override.

v1->v2:
- moved things around to make sure that bpf_override_return could really only be
  used for an ftrace kprobe.
- killed the special return values from trace_call_bpf.
- renamed pc_modified to bpf_kprobe_state so bpf_override_return could tell if
  it was being called from an ftrace kprobe context.
- reworked the logic in kprobe_perf_func to take advantage of bpf_kprobe_state.
- updated the test as per Alexei's review.

- Original message -

A lot of our error paths are not well tested because we have no good way of
injecting errors generically.  Some subsystems (block, memory) have ways to
inject errors, but they are random so it's hard to get reproducible results.

With BPF we can add determinism to our error injection.  We can use kprobes and
other things to verify we are injecting errors at the exact case we are trying
to test.  This patch gives us the tool to actually do the error injection part.
It is very simple, we just set the return value of the pt_regs we're given to
whatever we provide, and then override the PC with a dummy function that simply
returns.

Right now this only works on x86, but it would be simple enough to expand to
other architectures.  Thanks,


Applied, thanks Josef!

While applying, in the patch "bpf: add a bpf_override_function helper"
I moved the ifdef CONFIG_BPF_KPROBE_OVERRIDE a few lines,
so when it's not set the program will fail at load time with the error
"unknown func bpf_override_return#58"
instead of returning EINVAL at run-time.
That's the more standard way of adding new helpers.

Thanks

[PATCH V3 net-next 6/8] net: hns3: Add mailbox support to PF driver

2017-12-12 Thread Salil Mehta

The command queue provides a mailbox command which can be used for
communication between the PF and the VFs. The PF handles messages
from the VFs requesting information such as queue, VLAN and link
status. It also handles requests from the VFs to perform certain
privileged operations.

This patch adds a message handler for these command requests
from the VFs.

Signed-off-by: Salil Mehta 
Signed-off-by: lipeng 
---
Patch V3: Addressed SPDX change requested by Philippe Ombredanne
  Link: https://lkml.org/lkml/2017/12/8/874
Patch V2: No Change
Patch V1: Initial Submit
---
 .../net/ethernet/hisilicon/hns3/hns3pf/Makefile|   3 +-
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c|   1 +
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h|   2 +
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c | 306 +
 4 files changed, 311 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/Makefile 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/Makefile
index d077fa0..cb8ddd0 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/Makefile
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/Makefile
@@ -1,3 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0+
 #
 # Makefile for the HISILICON network device drivers.
 #
@@ -5,6 +6,6 @@
 ccflags-y := -Idrivers/net/ethernet/hisilicon/hns3
 
 obj-$(CONFIG_HNS3_HCLGE) += hclge.o
-hclge-objs = hclge_main.o hclge_cmd.o hclge_mdio.o hclge_tm.o
+hclge-objs = hclge_main.o hclge_cmd.o hclge_mdio.o hclge_tm.o hclge_mbx.o
 
 hclge-$(CONFIG_HNS3_DCB) += hclge_dcb.o
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index d07c700..980fcdf 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -21,6 +21,7 @@
 #include "hclge_cmd.h"
 #include "hclge_dcb.h"
 #include "hclge_main.h"
+#include "hclge_mbx.h"
 #include "hclge_mdio.h"
 #include "hclge_tm.h"
 #include "hnae3.h"
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
index aacec43..028817c 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
@@ -554,4 +554,6 @@ int hclge_set_vf_vlan_common(struct hclge_dev *vport, int vfid,
 
 int hclge_buffer_alloc(struct hclge_dev *hdev);
 int hclge_rss_init_hw(struct hclge_dev *hdev);
+
+void hclge_mbx_handler(struct hclge_dev *hdev);
 #endif
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
new file mode 100644
index 000..5eb8fff
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
@@ -0,0 +1,306 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright (c) 2016-2017 Hisilicon Limited.
+ */
+
+#include "hclge_main.h"
+#include "hclge_mbx.h"
+#include "hnae3.h"
+
+/* hclge_gen_resp_to_vf: used to generate a synchronous response to VF when PF
+ * receives a mailbox message from VF.
+ * @vport: pointer to struct hclge_vport
+ * @vf_to_pf_req: pointer to hclge_mbx_vf_to_pf_cmd of the original mailbox
+ *   message
+ * @resp_status: indicate to VF whether its request success(0) or failed.
+ */
+static int hclge_gen_resp_to_vf(struct hclge_vport *vport,
+   struct hclge_mbx_vf_to_pf_cmd *vf_to_pf_req,
+   int resp_status,
+   u8 *resp_data, u16 resp_data_len)
+{
+   struct hclge_mbx_pf_to_vf_cmd *resp_pf_to_vf;
+   struct hclge_dev *hdev = vport->back;
+   enum hclge_cmd_status status;
+   struct hclge_desc desc;
+
+   resp_pf_to_vf = (struct hclge_mbx_pf_to_vf_cmd *)desc.data;
+
+   if (resp_data_len > HCLGE_MBX_MAX_RESP_DATA_SIZE) {
+   dev_err(&hdev->pdev->dev,
+   "PF fail to gen resp to VF len %d exceeds max len %d\n",
+   resp_data_len,
+   HCLGE_MBX_MAX_RESP_DATA_SIZE);
+   }
+
+   hclge_cmd_setup_basic_desc(&desc, HCLGEVF_OPC_MBX_PF_TO_VF, false);
+
+   resp_pf_to_vf->dest_vfid = vf_to_pf_req->mbx_src_vfid;
+   resp_pf_to_vf->msg_len = vf_to_pf_req->msg_len;
+
+   resp_pf_to_vf->msg[0] = HCLGE_MBX_PF_VF_RESP;
+   resp_pf_to_vf->msg[1] = vf_to_pf_req->msg[0];
+   resp_pf_to_vf->msg[2] = vf_to_pf_req->msg[1];
+   resp_pf_to_vf->msg[3] = (resp_status == 0) ? 0 : 1;
+
+   if (resp_data && resp_data_len > 0)
+   memcpy(&resp_pf_to_vf->msg[4], resp_data, resp_data_len);
+
+   status = hclge_cmd_send(&hdev->hw, &desc, 1);
+   if (status)
+   dev_err(&hdev->pdev->dev,
+   "PF failed(=%d) to send response to VF\n", status);
+
+   return

[PATCH V3 net-next 2/8] net: hns3: Add mailbox support to VF driver

2017-12-12 Thread Salil Mehta

This patch adds mailbox support to the VF driver. The mailbox
is used as an interface to communicate with the PF driver for
various purposes such as {set|get} MAC related operations, reset,
link status etc. The mailbox supports sending both synchronous
and asynchronous commands to the PF driver.

Signed-off-by: Salil Mehta 
Signed-off-by: lipeng 
---
Patch V3: Addressed SPDX change requested by Philippe Ombredanne
  Link: https://lkml.org/lkml/2017/12/8/874
Patch V2: Addressed some internal comments
Patch V1: Initial Submit
---
 drivers/net/ethernet/hisilicon/hns3/hclge_mbx.h|  88 ++
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_mbx.c   | 184 +
 2 files changed, 272 insertions(+)
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/hclge_mbx.h
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_mbx.c

diff --git a/drivers/net/ethernet/hisilicon/hns3/hclge_mbx.h 
b/drivers/net/ethernet/hisilicon/hns3/hclge_mbx.h
new file mode 100644
index 000..3e9203e
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hns3/hclge_mbx.h
@@ -0,0 +1,88 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/* Copyright (c) 2016-2017 Hisilicon Limited. */
+
+#ifndef __HCLGE_MBX_H
+#define __HCLGE_MBX_H
+#include 
+#include 
+#include 
+
+#define HCLGE_MBX_VF_MSG_DATA_NUM  16
+
+enum HCLGE_MBX_OPCODE {
+   HCLGE_MBX_RESET = 0x01, /* (VF -> PF) assert reset */
+   HCLGE_MBX_SET_UNICAST,  /* (VF -> PF) set UC addr */
+   HCLGE_MBX_SET_MULTICAST,/* (VF -> PF) set MC addr */
+   HCLGE_MBX_SET_VLAN, /* (VF -> PF) set VLAN */
+   HCLGE_MBX_MAP_RING_TO_VECTOR,   /* (VF -> PF) map ring-to-vector */
+   HCLGE_MBX_UNMAP_RING_TO_VECTOR, /* (VF -> PF) unmap ring-to-vector */
+   HCLGE_MBX_SET_PROMISC_MODE, /* (VF -> PF) set promiscuous mode */
+   HCLGE_MBX_SET_MACVLAN,  /* (VF -> PF) set unicast filter */
+   HCLGE_MBX_API_NEGOTIATE,/* (VF -> PF) negotiate API version */
+   HCLGE_MBX_GET_QINFO,/* (VF -> PF) get queue config */
+   HCLGE_MBX_GET_TCINFO,   /* (VF -> PF) get TC config */
+   HCLGE_MBX_GET_RETA, /* (VF -> PF) get RETA */
+   HCLGE_MBX_GET_RSS_KEY,  /* (VF -> PF) get RSS key */
+   HCLGE_MBX_GET_MAC_ADDR, /* (VF -> PF) get MAC addr */
+   HCLGE_MBX_PF_VF_RESP,   /* (PF -> VF) generate response to VF */
+   HCLGE_MBX_GET_BDNUM,/* (VF -> PF) get BD num */
+   HCLGE_MBX_GET_BUFSIZE,  /* (VF -> PF) get buffer size */
+   HCLGE_MBX_GET_STREAMID, /* (VF -> PF) get stream id */
+   HCLGE_MBX_SET_AESTART,  /* (VF -> PF) start ae */
+   HCLGE_MBX_SET_TSOSTATS, /* (VF -> PF) get tso stats */
+   HCLGE_MBX_LINK_STAT_CHANGE, /* (PF -> VF) link status has changed */
+   HCLGE_MBX_GET_BASE_CONFIG,  /* (VF -> PF) get config */
+   HCLGE_MBX_BIND_FUNC_QUEUE,  /* (VF -> PF) bind function and queue */
+   HCLGE_MBX_GET_LINK_STATUS,  /* (VF -> PF) get link status */
+   HCLGE_MBX_QUEUE_RESET,  /* (VF -> PF) reset queue */
+};
+
+/* below are per-VF mac-vlan subcodes */
+enum hclge_mbx_mac_vlan_subcode {
+   HCLGE_MBX_MAC_VLAN_UC_MODIFY = 0,   /* modify UC mac addr */
+   HCLGE_MBX_MAC_VLAN_UC_ADD,  /* add a new UC mac addr */
+   HCLGE_MBX_MAC_VLAN_UC_REMOVE,   /* remove a UC mac addr */
+   HCLGE_MBX_MAC_VLAN_MC_MODIFY,   /* modify MC mac addr */
+   HCLGE_MBX_MAC_VLAN_MC_ADD,  /* add new MC mac addr */
+   HCLGE_MBX_MAC_VLAN_MC_REMOVE,   /* remove MC mac addr */
+   HCLGE_MBX_MAC_VLAN_MC_FUNC_MTA_ENABLE,  /* config func MTA enable */
+};
+
+/* below are per-VF vlan cfg subcodes */
+enum hclge_mbx_vlan_cfg_subcode {
+   HCLGE_MBX_VLAN_FILTER = 0,  /* set vlan filter */
+   HCLGE_MBX_VLAN_TX_OFF_CFG,  /* set tx side vlan offload */
+   HCLGE_MBX_VLAN_RX_OFF_CFG,  /* set rx side vlan offload */
+};
+
+#define HCLGE_MBX_MAX_MSG_SIZE 16
+#define HCLGE_MBX_MAX_RESP_DATA_SIZE   8
+
+struct hclgevf_mbx_resp_status {
+   struct mutex mbx_mutex; /* protects against contending sync cmd resp */
+   u32 origin_mbx_msg;
+   bool received_resp;
+   int resp_status;
+   u8 additional_info[HCLGE_MBX_MAX_RESP_DATA_SIZE];
+};
+
+struct hclge_mbx_vf_to_pf_cmd {
+   u8 rsv;
+   u8 mbx_src_vfid; /* Auto filled by IMP */
+   u8 rsv1[2];
+   u8 msg_len;
+   u8 rsv2[3];
+   u8 msg[HCLGE_MBX_MAX_MSG_SIZE];
+};
+
+struct hclge_mbx_pf_to_vf_cmd {
+   u8 dest_vfid;
+   u8 rsv[3];
+   u8 msg_len;
+   u8 rsv1[3];
+   u16 msg[8];
+};
+
+#define hclge_mbx_ring_ptr_move_crq(crq) \
+   (crq->next_to_use = (crq->next_to_use + 1) % crq->desc_num)
+#endif
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_mbx.c

[PATCH V3 net-next 4/8] net: hns3: Add HNS3 VF driver to kernel build framework

2017-12-12 Thread Salil Mehta

This patch introduces the new Makefiles and updates the existing
Makefiles required to build the HNS3 Virtual Function driver.
It also updates the Kconfig to introduce the new menuconfig
entries related to the VF driver.

Signed-off-by: Salil Mehta 
Signed-off-by: lipeng 
---
Patch V3: Addressed SPDX change requested by Philippe Ombredanne
  Link: https://lkml.org/lkml/2017/12/8/874
Patch V2: No change
Patch V1: Initial Submit
---
 drivers/net/ethernet/hisilicon/Kconfig | 28 +++---
 drivers/net/ethernet/hisilicon/hns3/Makefile   |  2 ++
 .../net/ethernet/hisilicon/hns3/hns3vf/Makefile|  9 +++
 3 files changed, 30 insertions(+), 9 deletions(-)
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3vf/Makefile

diff --git a/drivers/net/ethernet/hisilicon/Kconfig 
b/drivers/net/ethernet/hisilicon/Kconfig
index 3b6..8bcf470 100644
--- a/drivers/net/ethernet/hisilicon/Kconfig
+++ b/drivers/net/ethernet/hisilicon/Kconfig
@@ -94,15 +94,6 @@ config HNS3_HCLGE
  compatibility layer. The engine would be used in Hisilicon hip08 family of
  SoCs and further upcoming SoCs.
 
-config HNS3_ENET
-   tristate "Hisilicon HNS3 Ethernet Device Support"
-   depends on 64BIT && PCI
-   depends on HNS3 && HNS3_HCLGE
-   ---help---
- This selects the Ethernet Driver for Hisilicon Network Subsystem 3 for hip08
- family of SoCs. This module depends upon HNAE3 driver to access the HNAE3
- devices and their associated operations.
-
 config HNS3_DCB
bool "Hisilicon HNS3 Data Center Bridge Support"
default n
@@ -112,4 +103,23 @@ config HNS3_DCB
 
  If unsure, say N.
 
+config HNS3_HCLGEVF
+tristate "Hisilicon HNS3VF Acceleration Engine & Compatibility Layer Support"
+depends on PCI_MSI
+depends on HNS3
+   depends on HNS3_HCLGE
+---help---
+ This selects the HNS3 VF drivers network acceleration engine & its hardware
+ compatibility layer. The engine would be used in Hisilicon hip08 family of
+ SoCs and further upcoming SoCs.
+
+config HNS3_ENET
+   tristate "Hisilicon HNS3 Ethernet Device Support"
+   depends on 64BIT && PCI
+   depends on HNS3
+   ---help---
+ This selects the Ethernet Driver for Hisilicon Network Subsystem 3 for hip08
+ family of SoCs. This module depends upon HNAE3 driver to access the HNAE3
+ devices and their associated operations.
+
 endif # NET_VENDOR_HISILICON
diff --git a/drivers/net/ethernet/hisilicon/hns3/Makefile 
b/drivers/net/ethernet/hisilicon/hns3/Makefile
index a9349e1..c450945 100644
--- a/drivers/net/ethernet/hisilicon/hns3/Makefile
+++ b/drivers/net/ethernet/hisilicon/hns3/Makefile
@@ -1,7 +1,9 @@
+# SPDX-License-Identifier: GPL-2.0+
 #
 # Makefile for the HISILICON network device drivers.
 #
 
 obj-$(CONFIG_HNS3) += hns3pf/
+obj-$(CONFIG_HNS3) += hns3vf/
 
 obj-$(CONFIG_HNS3) += hnae3.o
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/Makefile 
b/drivers/net/ethernet/hisilicon/hns3/hns3vf/Makefile
new file mode 100644
index 000..fb93bbd
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/Makefile
@@ -0,0 +1,9 @@
+# SPDX-License-Identifier: GPL-2.0+
+#
+# Makefile for the HISILICON network device drivers.
+#
+
+ccflags-y := -Idrivers/net/ethernet/hisilicon/hns3
+
+obj-$(CONFIG_HNS3_HCLGEVF) += hclgevf.o
+hclgevf-objs = hclgevf_main.o hclgevf_cmd.o hclgevf_mbx.o
\ No newline at end of file
-- 
2.7.4

Re: [PATCH net-next] net: bridge: use rhashtable for fdbs

2017-12-12 Thread Stephen Hemminger

On Tue, 12 Dec 2017 16:02:50 +0200
Nikolay Aleksandrov  wrote:

> Before this patch the bridge used a fixed 256 element hash table which
> was fine for small use cases (in my tests it starts to degrade
> above 1000 entries), but it wasn't enough for medium or large
> scale deployments. Modern setups have thousands of participants in a
> single bridge, even only enabling vlans and adding a few thousand vlan
> entries will cause a few thousand fdbs to be automatically inserted per
> participating port. So we need to scale the fdb table considerably to
> cope with modern workloads, and this patch converts it to use a
> rhashtable for its operations thus improving the bridge scalability.
> Tests show the following results (10 runs each), at up to 1000 entries
> rhashtable is ~3% slower, at 2000 rhashtable is 30% faster, at 3000 it
> is 2 times faster and at 3 it is 50 times faster.
> Obviously this happens because of the properties of the two constructs
> and is expected, rhashtable keeps pretty much a constant time even with
> 1000 entries (tested), while the fixed hash table struggles
> considerably even above 1.
> As a side effect this also reduces the net_bridge struct size from 3248
> bytes to 1344 bytes. Also note that the key struct is 8 bytes.
> 
> Signed-off-by: Nikolay Aleksandrov 
> ---

Thanks for doing this, it was on my list of things that never get done.

Some downsides:
 * size of the FDB entry gets larger.
 * you lost the ability to salt the hash (and rekey) which is important
   for DDoS attacks
 * being slower for small (<10 entries) tables also matters and is a common
   use case for containers.
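
To illustrate what "salting" means in the second point, here is a minimal
userspace sketch. It is not the bridge code: FNV-1a stands in for the
kernel's hash function, and fdb_bucket() and FDB_BUCKETS are hypothetical
names chosen for this example. The idea is only that a secret per-table
value is mixed into the hash, so a remote sender cannot precompute MAC
addresses that all land in one bucket.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FDB_BUCKETS 256u	/* same size as the fixed table discussed above */

/* FNV-1a over a byte buffer, continuing from a caller-supplied state. */
static uint32_t fnv1a(uint32_t h, const uint8_t *p, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/*
 * Hypothetical helper: map (MAC, VLAN id) to a bucket. The salt is folded
 * into the initial hash state, so changing the salt (rekeying) changes the
 * bucket layout without changing the keys.
 */
static uint32_t fdb_bucket(const uint8_t mac[6], uint16_t vid, uint32_t salt)
{
	uint8_t v[2] = { (uint8_t)(vid >> 8), (uint8_t)(vid & 0xff) };
	uint32_t h;

	assert(mac != NULL);
	h = fnv1a(2166136261u ^ salt, mac, 6);
	h = fnv1a(h, v, sizeof(v));
	return h % FDB_BUCKETS;
}
```

With a fixed, publicly known salt an attacker can search offline for
colliding addresses; with a random salt chosen at table creation that
search has to be repeated blindly for every table.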

Re: [PATCH bpf 0/3] Misc BPF fixes

2017-12-12 Thread Alexei Starovoitov

On Tue, Dec 12, 2017 at 02:25:29AM +0100, Daniel Borkmann wrote:
> Couple of outstanding fixes for BPF tree: 1) fixes a perf RB
> corruption, 2) and 3) fixes a few build issues from the recent
> bpf_perf_event.h uapi corrections. Thanks!

Applied, thanks Daniel!

Re: [BUG] skge: a possible sleep-in-atomic bug in skge_remove

2017-12-12 Thread Stephen Hemminger

On Tue, 12 Dec 2017 08:34:45 -0500 (EST)
David Miller  wrote:

> From: Jia-Ju Bai 
> Date: Tue, 12 Dec 2017 16:38:12 +0800
> 
> > According to drivers/net/ethernet/marvell/skge.c, the driver may sleep
> > under a spinlock.
> > The function call path is:
> > skge_remove (acquire the spinlock)
> >   free_irq --> may sleep
> > 
> > I do not find a good way to fix it, so I only report.
> > This possible bug is found by my static analysis tool (DSAC) and
> > checked by my code review.  
> 
> This was added by:
> 
> commit a9e9fd7182332d0cf5f3e601df3e71dd431b70d7
> Author: Stephen Hemminger 
> Date:   Tue Sep 27 13:41:37 2011 -0400
> 
> skge: handle irq better on single port card
> 
> I think the free_irq() can be moved below the unlock.
> 
> Stephen, please take a look.

The IRQ was being freed twice.
How did you see it? I really doubt any multi-port SKGE cards
still exist.

Re: [PATCH] drivers/staging/irda: fix max dup length for kstrndup

2017-12-12 Thread Stephen Hemminger

On Tue, 12 Dec 2017 16:54:44 +0800
Ma Shimiao  wrote:

> If the source string is longer than max, kstrndup will allocate max+1 bytes.
> So, we should make sure the result does not exceed the limit.
> 
> Signed-off-by: Ma Shimiao 

Did you read the TODO file in drivers/staging/irda?

The irda code will be removed soon from the kernel tree as it is old and
obsolete and broken.

Don't worry about fixing up anything here, it's not needed.
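
For reference, the max+1 behaviour mentioned in the patch description can
be seen in a small userspace sketch. dup_at_most() below is a hypothetical
stand-in written for this example, not the kernel's kstrndup(); it only
mirrors the "copy at most max bytes, then add a NUL" contract.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy at most max bytes of s and always append a terminating NUL.
 * The allocation is therefore up to max + 1 bytes; the actual size is
 * reported through alloc_size when that pointer is non-NULL.
 */
static char *dup_at_most(const char *s, size_t max, size_t *alloc_size)
{
	size_t len = 0;
	char *buf;

	assert(s != NULL);
	while (len < max && s[len] != '\0')
		len++;

	buf = malloc(len + 1);
	if (!buf)
		return NULL;
	memcpy(buf, s, len);
	buf[len] = '\0';
	if (alloc_size)
		*alloc_size = len + 1;
	return buf;
}
```

So a caller that needs the copy, including its NUL, to fit in LIMIT bytes
would pass LIMIT - 1 as max.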

Re: [PATCH] veth: Optionally pad packets to minimum Ethernet length

2017-12-12 Thread Stephen Hemminger

On Tue, 12 Dec 2017 16:18:17 -0200
Marcelo Ricardo Leitner  wrote:

> On Tue, Dec 12, 2017 at 11:32:46AM -0600, Dan Williams wrote:
> > On Tue, 2017-12-12 at 08:13 -0800, Ed Swierk wrote:  
> > > Most physical Ethernet devices pad short packets to the minimum
> > > length
> > > of 64 bytes (including FCS) on transmit. It can be useful to simulate
> > > this behavior when debugging a problem that results from it (such as
> > > incorrect L4 checksum calculation).
> > > 
> > > Padding is unnecessary for most applications so leave it off by
> > > default. Enable padding only when the otherwise unused IFF_AUTOMEDIA
> > > flag is set (e.g. by writing 0x5003 to flags in sysfs).  
> > 
> > This seems like a weird overload of AUTOMEDIA, which no other driver
> > uses for this purpose.  Seems like the only other user of AUTOMEDIA is
> > 8390/etherh.c for some 10BaseT/10Base2 stuff.
> > 
> > I'm not sure what the interface should be, but perhaps a sysfs
> > attribute would be better than overloading IFF_AUTOMEDIA?  
> 
> What about using some tc action (i.e. skbmod) for this?
> 
>   Marcelo

Why not add this to netdevsim rather than cluttering up a normal driver
with test support?  We just pulled a bunch of test stuff out of dummy
for the same reason.
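
For anyone following along, the padding behaviour being simulated is small
enough to sketch in userspace. This is not the veth patch: pad_frame() and
MIN_FRAME_NO_FCS are names made up for this example. The 64-byte minimum
includes the 4-byte FCS, so software that does not see the FCS pads to 60
bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MIN_FRAME_NO_FCS 60u	/* 64-byte minimum minus the 4-byte FCS */

/*
 * Zero-pad a frame in place up to the minimum length.
 * Returns the resulting length, or 0 if the buffer (cap bytes) is too
 * small to hold a padded frame. The caller guarantees len <= cap.
 */
static size_t pad_frame(unsigned char *frame, size_t len, size_t cap)
{
	assert(frame != NULL);
	if (len >= MIN_FRAME_NO_FCS)
		return len;
	if (cap < MIN_FRAME_NO_FCS)
		return 0;
	memset(frame + len, 0, MIN_FRAME_NO_FCS - len);
	return MIN_FRAME_NO_FCS;
}
```

The trailing zeros are what trips up a receiver that computes an L4
checksum over the frame length instead of the IP total length.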

Re: [PATCH] veth: Optionally pad packets to minimum Ethernet length

2017-12-12 Thread Ed Swierk

On Tue, Dec 12, 2017 at 10:34 AM, Stephen Hemminger
 wrote:
> Why not add this to netdevsim rather than cluttering up a normal driver
> with test support?  We just pulled a bunch of test stuff out of dummy
> for the same reason.

My test setup to trigger an openvswitch conntrack issue
(https://marc.info/?l=linux-netdev&m=151309548725627) involves a lot
of moving parts:

[netns-a: vetha1] - [vetha0] - [ovsbr0] - [vethb0] - [netns-b: vethb1]

with nc client and server in netns-a and -b, and tweaks like turning
off tcp_timestamps to make sure the packets in the TCP stream are
small enough to reproduce the problem. A simpler, less fragile test
setup would be valuable, especially if it ends up as an automated
regression test.

Could netdevsim be useful for that? Are there any existing tests
producing TCP traffic that might serve as an example?

--Ed

Re: [PATCH net-next v4 1/2] bpf/tracing: allow user space to query prog array on the same tp

2017-12-12 Thread Alexei Starovoitov


On 12/12/17 1:03 AM, Peter Zijlstra wrote:

On Mon, Dec 11, 2017 at 11:39:02AM -0800, Yonghong Song wrote:

The usage:
  struct perf_event_query_bpf *query = malloc(...);
  query.ids_len = ids_len;
  err = ioctl(pmu_efd, PERF_EVENT_IOC_QUERY_BPF, &query);


You didn't spot the fixes to your changelog ;-) The above should read
something like:

struct perf_event_query_bpf *query =
malloc(sizeof(*query) + sizeof(u32) * ids_len);
query->ids_len = ids_len;
err = ioctl(pmu_efd, PERF_EVENT_IOC_QUERY_BPF, query);


Sure. I fixed up this nit in the commit log of patch 1 and in test_progs.c
of patch 2.
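
The corrected allocation above is the usual pattern for a struct that ends
in a variable-length array: size of the header plus one element per id.
A minimal userspace sketch follows. struct query_bpf is a stand-in defined
for this example (its field layout is an assumption, not copied from the
uapi header), and query_alloc() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for the query struct; layout is assumed for illustration. */
struct query_bpf {
	uint32_t ids_len;	/* in: number of slots in ids[] */
	uint32_t prog_cnt;	/* out: number of programs reported */
	uint32_t ids[];		/* out: program ids */
};

/* Allocate a zeroed query with room for ids_len ids. */
static struct query_bpf *query_alloc(uint32_t ids_len)
{
	struct query_bpf *q;

	/* Guard against overflow in the size computation below. */
	assert(ids_len <= (SIZE_MAX - sizeof(*q)) / sizeof(uint32_t));

	q = calloc(1, sizeof(*q) + sizeof(uint32_t) * ids_len);
	if (!q)
		return NULL;
	q->ids_len = ids_len;	/* q is a pointer, hence -> and not . */
	return q;
}
```

The pointer returned here is what gets passed to the ioctl directly,
without taking its address again.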

Re: [PATCH v2 net-next] net: ethernet: ti: cpdma: correct error handling for chan create

2017-12-12 Thread Grygorii Strashko



On 12/12/2017 10:35 AM, Ivan Khoronzhuk wrote:
> It's not correct to return NULL when that is actually an error and the
> function returns errors in every other failure case. At the same time,
> the cpsw driver and davinci emac don't check the error case while
> creating a channel and can miss the actual error. Also remove WARNs
> that duplicate dev_err messages.
> 
> Signed-off-by: Ivan Khoronzhuk 
> ---
>   drivers/net/ethernet/ti/cpsw.c  | 12 +---
>   drivers/net/ethernet/ti/davinci_cpdma.c |  2 +-
>   drivers/net/ethernet/ti/davinci_emac.c  |  9 +++--
>   3 files changed, 17 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
> index a60a378..3c85a08 100644
> --- a/drivers/net/ethernet/ti/cpsw.c
> +++ b/drivers/net/ethernet/ti/cpsw.c
> @@ -3065,10 +3065,16 @@ static int cpsw_probe(struct platform_device *pdev)
>   }
>   
>   cpsw->txv[0].ch = cpdma_chan_create(cpsw->dma, 0, cpsw_tx_handler, 0);
> + if (IS_ERR(cpsw->txv[0].ch)) {
> + dev_err(priv->dev, "error initializing tx dma channel\n");
> + ret = PTR_ERR(cpsw->txv[0].ch);
> + goto clean_dma_ret;
> + }
> +
>   cpsw->rxv[0].ch = cpdma_chan_create(cpsw->dma, 0, cpsw_rx_handler, 1);
> - if (WARN_ON(!cpsw->rxv[0].ch || !cpsw->txv[0].ch)) {
> - dev_err(priv->dev, "error initializing dma channels\n");
> - ret = -ENOMEM;
> + if (IS_ERR(cpsw->rxv[0].ch)) {
> + dev_err(priv->dev, "error initializing rx dma channel\n");
> + ret = PTR_ERR(cpsw->rxv[0].ch);
>   goto clean_dma_ret;
>   }
>   
> diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c 
> b/drivers/net/ethernet/ti/davinci_cpdma.c
> index e4d6edf..6f9173f 100644
> --- a/drivers/net/ethernet/ti/davinci_cpdma.c
> +++ b/drivers/net/ethernet/ti/davinci_cpdma.c
> @@ -893,7 +893,7 @@ struct cpdma_chan *cpdma_chan_create(struct cpdma_ctlr *ctlr, int chan_num,
>   chan_num = rx_type ? rx_chan_num(chan_num) : tx_chan_num(chan_num);
>   
>   if (__chan_linear(chan_num) >= ctlr->num_chan)
> - return NULL;
> + return ERR_PTR(-EINVAL);
>   
>   chan = devm_kzalloc(ctlr->dev, sizeof(*chan), GFP_KERNEL);
>   if (!chan)
> diff --git a/drivers/net/ethernet/ti/davinci_emac.c 
> b/drivers/net/ethernet/ti/davinci_emac.c
> index f58c0c6..3d4af64 100644
> --- a/drivers/net/ethernet/ti/davinci_emac.c
> +++ b/drivers/net/ethernet/ti/davinci_emac.c
> @@ -1870,10 +1870,15 @@ static int davinci_emac_probe(struct platform_device *pdev)
>   
>   priv->txchan = cpdma_chan_create(priv->dma, EMAC_DEF_TX_CH,
>emac_tx_handler, 0);
> + if (WARN_ON(IS_ERR(priv->txchan))) {

So, logically WARN_ON() should be removed in davinci_emac.c also. Right?

> + rc = PTR_ERR(priv->txchan);
> + goto no_cpdma_chan;
> + }
> +
>   priv->rxchan = cpdma_chan_create(priv->dma, EMAC_DEF_RX_CH,
>emac_rx_handler, 1);
> - if (WARN_ON(!priv->txchan || !priv->rxchan)) {
> - rc = -ENOMEM;
> + if (WARN_ON(IS_ERR(priv->rxchan))) {
> + rc = PTR_ERR(priv->rxchan);
>   goto no_cpdma_chan;
>   }
>   
> 

-- 
regards,
-grygorii
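
For readers not familiar with the convention the patch switches to, here
is a userspace sketch of the error-pointer idiom. The lower-case helpers
are re-implementations written for this example, not the kernel's err.h,
and chan_create() is a hypothetical function that only imitates the shape
of the channel constructor: it never returns NULL, so callers test with
is_err() and recover the errno with ptr_err().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Encode a negative errno in the top, never-mapped range of pointers. */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct chan {
	int num;
};

/* Hypothetical constructor: every failure path returns an error pointer. */
static struct chan *chan_create(int chan_num, int num_chan)
{
	struct chan *c;

	assert(num_chan > 0);
	if (chan_num < 0 || chan_num >= num_chan)
		return err_ptr(-EINVAL);

	c = calloc(1, sizeof(*c));
	if (!c)
		return err_ptr(-ENOMEM);
	c->num = chan_num;
	return c;
}
```

A caller that still tests the result with "!ptr" would treat the -EINVAL
case as success, which is why both probe functions need the is_err-style
check once the constructor stops returning NULL.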

[PATCHv2 3/3] MAINTAINERS: Add entry for Socionext ethernet driver

2017-12-12 Thread jassisinghbrar

From: Jassi Brar 

Add entry for the Socionext Netsec controller driver and DT bindings.

Acked-by: Ard Biesheuvel 
Signed-off-by: Jassi Brar 
---
 MAINTAINERS | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 9e0045e..0e1f0d4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12630,6 +12630,13 @@ F: drivers/md/raid*
 F: include/linux/raid/
 F: include/uapi/linux/raid/
 
+SOCIONEXT (SNI) NETSEC NETWORK DRIVER
+M: Jassi Brar 
+L: netdev@vger.kernel.org
+S: Maintained
+F: drivers/net/ethernet/socionext/netsec.c
+F: Documentation/devicetree/bindings/net/socionext-netsec.txt
+
 SONIC NETWORK DRIVER
 M: Thomas Bogendoerfer 
 L: netdev@vger.kernel.org
-- 
2.7.4

Re: [PATCHv2 2/3] net: socionext: Add Synquacer NetSec driver

2017-12-12 Thread Ard Biesheuvel

Hi Jassi,


On 12 December 2017 at 17:15,   wrote:
> From: Jassi Brar 
>
> This driver adds support for Socionext "netsec" IP Gigabit
> Ethernet + PHY IP used in the Synquacer SC2A11 SoC.
>
> Signed-off-by: Ard Biesheuvel 
> Signed-off-by: Jassi Brar 
> ---
>  drivers/net/ethernet/Kconfig|1 +
>  drivers/net/ethernet/Makefile   |1 +
>  drivers/net/ethernet/socionext/Kconfig  |   29 +
>  drivers/net/ethernet/socionext/Makefile |1 +
>  drivers/net/ethernet/socionext/netsec.c | 1826 
> +++
>  5 files changed, 1858 insertions(+)
>  create mode 100644 drivers/net/ethernet/socionext/Kconfig
>  create mode 100644 drivers/net/ethernet/socionext/Makefile
>  create mode 100644 drivers/net/ethernet/socionext/netsec.c
>
[...]
> diff --git a/drivers/net/ethernet/socionext/netsec.c 
> b/drivers/net/ethernet/socionext/netsec.c
> new file mode 100644
> index 000..4472303a
> --- /dev/null
> +++ b/drivers/net/ethernet/socionext/netsec.c
> @@ -0,0 +1,1826 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +
> +#define NETSEC_REG_SOFT_RST0x104
> +#define NETSEC_REG_COM_INIT0x120
> +
> +#define NETSEC_REG_TOP_STATUS  0x200
> +#define NETSEC_IRQ_RX  BIT(1)
> +#define NETSEC_IRQ_TX  BIT(0)
> +
> +#define NETSEC_REG_TOP_INTEN   0x204
> +#define NETSEC_REG_INTEN_SET   0x234
> +#define NETSEC_REG_INTEN_CLR   0x238
> +
> +#define NETSEC_REG_NRM_TX_STATUS   0x400
> +#define NETSEC_REG_NRM_TX_INTEN0x404
> +#define NETSEC_REG_NRM_TX_INTEN_SET0x428
> +#define NETSEC_REG_NRM_TX_INTEN_CLR0x42c
> +#define NRM_TX_ST_NTOWNR   BIT(17)
> +#define NRM_TX_ST_TR_ERR   BIT(16)
> +#define NRM_TX_ST_TXDONE   BIT(15)
> +#define NRM_TX_ST_TMREXP   BIT(14)
> +
> +#define NETSEC_REG_NRM_RX_STATUS   0x440
> +#define NETSEC_REG_NRM_RX_INTEN0x444
> +#define NETSEC_REG_NRM_RX_INTEN_SET0x468
> +#define NETSEC_REG_NRM_RX_INTEN_CLR0x46c
> +#define NRM_RX_ST_RC_ERR   BIT(16)
> +#define NRM_RX_ST_PKTCNT   BIT(15)
> +#define NRM_RX_ST_TMREXP   BIT(14)
> +
> +#define NETSEC_REG_PKT_CMD_BUF 0xd0
> +
> +#define NETSEC_REG_CLK_EN  0x100
> +
> +#define NETSEC_REG_PKT_CTRL0x140
> +
> +#define NETSEC_REG_DMA_TMR_CTRL0x20c
> +#define NETSEC_REG_F_TAIKI_MC_VER  0x22c
> +#define NETSEC_REG_F_TAIKI_VER 0x230
> +#define NETSEC_REG_DMA_HM_CTRL 0x214
> +#define NETSEC_REG_DMA_MH_CTRL 0x220
> +#define NETSEC_REG_ADDR_DIS_CORE   0x218
> +#define NETSEC_REG_DMAC_HM_CMD_BUF 0x210
> +#define NETSEC_REG_DMAC_MH_CMD_BUF 0x21c
> +
> +#define NETSEC_REG_NRM_TX_PKTCNT   0x410
> +
> +#define NETSEC_REG_NRM_TX_DONE_PKTCNT  0x414
> +#define NETSEC_REG_NRM_TX_DONE_TXINT_PKTCNT0x418
> +
> +#define NETSEC_REG_NRM_TX_TMR  0x41c
> +
> +#define NETSEC_REG_NRM_RX_PKTCNT   0x454
> +#define NETSEC_REG_NRM_RX_RXINT_PKTCNT 0x458
> +#define NETSEC_REG_NRM_TX_TXINT_TMR0x420
> +#define NETSEC_REG_NRM_RX_RXINT_TMR0x460
> +
> +#define NETSEC_REG_NRM_RX_TMR  0x45c
> +
> +#define NETSEC_REG_NRM_TX_DESC_START_UP0x434
> +#define NETSEC_REG_NRM_TX_DESC_START_LW0x408
> +#define NETSEC_REG_NRM_RX_DESC_START_UP0x474
> +#define NETSEC_REG_NRM_RX_DESC_START_LW0x448
> +
> +#define NETSEC_REG_NRM_TX_CONFIG   0x430
> +#define NETSEC_REG_NRM_RX_CONFIG   0x470
> +
> +#define MAC_REG_STATUS 0x1024
> +#define MAC_REG_DATA   0x11c0
> +#define MAC_REG_CMD0x11c4
> +#define MAC_REG_FLOW_TH0x11cc
> +#define MAC_REG_INTF_SEL   0x11d4
> +#define MAC_REG_DESC_INIT  0x11fc
> +#define MAC_REG_DESC_SOFT_RST  0x1204
> +#define NETSEC_REG_MODE_TRANS_COMP_STATUS  0x500
> +
> +#define GMAC_REG_MCR   0x
> +#define GMAC_REG_MFFR  0x0004
> +#define GMAC_REG_GAR   0x0010
> +#define GMAC_REG_GDR   0x0014
> +#define GMAC_REG_FCR   0x0018
> +#define GMAC_REG_BMR   0x1000
> +#define GMAC_REG_RDLAR 0x100c
> +#define GMAC_REG_TDLAR 0x1010
>

Re: Huge memory leak with 4.15.0-rc2+

2017-12-12 Thread Paweł Staszewski




W dniu 2017-12-11 o 23:27, Paweł Staszewski pisze:



W dniu 2017-12-11 o 23:15, John Fastabend pisze:

On 12/11/2017 01:48 PM, Paweł Staszewski wrote:


On 2017-12-11 at 22:23, Paweł Staszewski wrote:

Hi


I just upgraded a test host to a 4.15.0-rc2+ kernel.

After some time of traffic processing, when traffic on all ports
reached about 3Mpps, a memory leak started.



[...]


One observation: when I disable TSO on all cards, the memleak gets
worse.






When traffic starts to drop, the memleak slows down as well.
Link to the memory usage graph below:
https://ibb.co/hU97kG

And slab_unrecl (the amount of unreclaimable memory used for kernel
slab allocations) keeps rising.


Forgot to add that I'm using hfsc and qdiscs like pfifo on classes.



Maybe some error case I missed in the qdisc patches; I'm looking into
it.

Thanks,
John



This is how it looks when correlated on a graph, traffic vs. memory:
https://ibb.co/njpkqG

Typical hfsc class + qdisc:
### Client interface vlan1616
tc qdisc del dev vlan1616 root
tc qdisc add dev vlan1616 handle 1: root hfsc default 100
tc class add dev vlan1616 parent 1: classid 1:100 hfsc ls m2 200Mbit ul m2 200Mbit

tc qdisc add dev vlan1616 parent 1:100 handle 100: pfifo limit 128
### End TM for client interface
tc qdisc del dev vlan1616 ingress
tc qdisc add dev vlan1616 handle : ingress
tc filter add dev vlan1616 parent : protocol ip prio 50 u32 match ip src 0.0.0.0/0 police rate 200Mbit burst 200M mtu 32k drop flowid 1:1


And this is the same for about 450 VLAN interfaces.


The good thing is that, compared to 4.14.3, I have about 5% less CPU
load on 4.15.0-rc2+.


Once hfsc or tbf is lockless, there will be a really huge difference
in CPU load on x86 when using traffic shaping, so really good job,
John.







Yesterday I changed the kernel from
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git

to

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/?h=v4.15-rc3


And there is no memleak.
So yes, it is probably the lockless qdisc patches.
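
The slab_unrecl figure mentioned above presumably comes from the SUnreclaim line of /proc/meminfo (an assumption; the thread does not say which tool produced the graph). A minimal sketch for sampling that value when comparing kernels could look like this; parse_sunreclaim_kb() and read_sunreclaim_kb() are hypothetical helper names:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract the SUnreclaim value (in kB) from text in
 * /proc/meminfo format. Returns -1 if the field is missing or malformed.
 */
long parse_sunreclaim_kb(const char *meminfo)
{
	const char *key = "SUnreclaim:";
	const char *p = strstr(meminfo, key);
	long kb;

	if (!p)
		return -1;
	/* %ld skips the padding spaces between the key and the number */
	if (sscanf(p + strlen(key), "%ld", &kb) != 1)
		return -1;
	return kb;
}

/* Hypothetical helper: read /proc/meminfo once and return SUnreclaim in kB,
 * or -1 on error. Assumes the file fits in the local buffer.
 */
long read_sunreclaim_kb(void)
{
	char buf[8192];
	size_t n;
	FILE *f = fopen("/proc/meminfo", "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return parse_sunreclaim_kb(buf);
}
```

Calling read_sunreclaim_kb() periodically under load and logging the result gives a rough trend line to compare between the two trees.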

Re: [PATCH v6 2/3] sock: Move the socket inuse to namespace.

2017-12-12 Thread Cong Wang

On Sun, Dec 10, 2017 at 7:12 AM, Tonghao Zhang  wrote:
> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> index b797832..6c191fb 100644
> --- a/net/core/net_namespace.c
> +++ b/net/core/net_namespace.c
> @@ -363,6 +363,13 @@ static struct net *net_alloc(void)
> if (!net)
> goto out_free;
>
> +#ifdef CONFIG_PROC_FS
> +   net->core.sock_inuse = alloc_percpu(int);
> +   if (!net->core.sock_inuse) {
> +   kmem_cache_free(net_cachep, net);
> +   goto out_free;
> +   }
> +#endif
> rcu_assign_pointer(net->gen, ng);
>  out:
> return net;
> @@ -374,6 +381,9 @@ static struct net *net_alloc(void)
>
>  static void net_free(struct net *net)
>  {
> +#ifdef CONFIG_PROC_FS
> +   free_percpu(net->core.sock_inuse);
> +#endif
> kfree(rcu_access_pointer(net->gen));
> kmem_cache_free(net_cachep, net);
>  }

Putting socket code in net_namespace.c doesn't look good.

[PATCHv2 2/3] net: socionext: Add Synquacer NetSec driver

2017-12-12 Thread jassisinghbrar

From: Jassi Brar 

This driver adds support for Socionext "netsec" IP Gigabit
Ethernet + PHY IP used in the Synquacer SC2A11 SoC.

Signed-off-by: Ard Biesheuvel 
Signed-off-by: Jassi Brar 
---
 drivers/net/ethernet/Kconfig|1 +
 drivers/net/ethernet/Makefile   |1 +
 drivers/net/ethernet/socionext/Kconfig  |   29 +
 drivers/net/ethernet/socionext/Makefile |1 +
 drivers/net/ethernet/socionext/netsec.c | 1826 +++
 5 files changed, 1858 insertions(+)
 create mode 100644 drivers/net/ethernet/socionext/Kconfig
 create mode 100644 drivers/net/ethernet/socionext/Makefile
 create mode 100644 drivers/net/ethernet/socionext/netsec.c

diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index c604213..d50519e 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -170,6 +170,7 @@ source "drivers/net/ethernet/sis/Kconfig"
 source "drivers/net/ethernet/sfc/Kconfig"
 source "drivers/net/ethernet/sgi/Kconfig"
 source "drivers/net/ethernet/smsc/Kconfig"
+source "drivers/net/ethernet/socionext/Kconfig"
 source "drivers/net/ethernet/stmicro/Kconfig"
 source "drivers/net/ethernet/sun/Kconfig"
 source "drivers/net/ethernet/tehuti/Kconfig"
diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile
index 39f62733..6cf5ade 100644
--- a/drivers/net/ethernet/Makefile
+++ b/drivers/net/ethernet/Makefile
@@ -82,6 +82,7 @@ obj-$(CONFIG_SFC) += sfc/
 obj-$(CONFIG_SFC_FALCON) += sfc/falcon/
 obj-$(CONFIG_NET_VENDOR_SGI) += sgi/
 obj-$(CONFIG_NET_VENDOR_SMSC) += smsc/
+obj-$(CONFIG_NET_VENDOR_SOCIONEXT) += socionext/
 obj-$(CONFIG_NET_VENDOR_STMICRO) += stmicro/
 obj-$(CONFIG_NET_VENDOR_SUN) += sun/
 obj-$(CONFIG_NET_VENDOR_TEHUTI) += tehuti/
diff --git a/drivers/net/ethernet/socionext/Kconfig 
b/drivers/net/ethernet/socionext/Kconfig
new file mode 100644
index 000..4601c2f
--- /dev/null
+++ b/drivers/net/ethernet/socionext/Kconfig
@@ -0,0 +1,29 @@
+#
+# Socionext Network device configuration
+#
+
+config NET_VENDOR_SOCIONEXT
+   bool "Socionext devices"
+   default y
+   ---help---
+ If you have a network (Ethernet) card belonging to this class, say Y.
+
+ Note that the answer to this question doesn't directly affect the
+ questions about Socionext cards. If you say Y, you will be asked
+ for your specific card in the following questions.
+
+if NET_VENDOR_SOCIONEXT
+
+config SNI_NETSEC
+   tristate "NETSEC Driver Support"
+   depends on (ARCH_SYNQUACER || COMPILE_TEST) && OF
+   select PHYLIB
+   select MII
+help
+ Enable to add support for the SocioNext NetSec Gigabit Ethernet
+ controller + PHY, as found on the Synquacer SC2A11 SoC
+
+ To compile this driver as a module, choose M here: the module will be
+ called netsec.  If unsure, say N.
+
+endif # NET_VENDOR_SOCIONEXT
diff --git a/drivers/net/ethernet/socionext/Makefile 
b/drivers/net/ethernet/socionext/Makefile
new file mode 100644
index 000..9505923
--- /dev/null
+++ b/drivers/net/ethernet/socionext/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_SNI_NETSEC) += netsec.o
diff --git a/drivers/net/ethernet/socionext/netsec.c 
b/drivers/net/ethernet/socionext/netsec.c
new file mode 100644
index 000..4472303a
--- /dev/null
+++ b/drivers/net/ethernet/socionext/netsec.c
@@ -0,0 +1,1826 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#define NETSEC_REG_SOFT_RST0x104
+#define NETSEC_REG_COM_INIT0x120
+
+#define NETSEC_REG_TOP_STATUS  0x200
+#define NETSEC_IRQ_RX  BIT(1)
+#define NETSEC_IRQ_TX  BIT(0)
+
+#define NETSEC_REG_TOP_INTEN   0x204
+#define NETSEC_REG_INTEN_SET   0x234
+#define NETSEC_REG_INTEN_CLR   0x238
+
+#define NETSEC_REG_NRM_TX_STATUS   0x400
+#define NETSEC_REG_NRM_TX_INTEN0x404
+#define NETSEC_REG_NRM_TX_INTEN_SET0x428
+#define NETSEC_REG_NRM_TX_INTEN_CLR0x42c
+#define NRM_TX_ST_NTOWNR   BIT(17)
+#define NRM_TX_ST_TR_ERR   BIT(16)
+#define NRM_TX_ST_TXDONE   BIT(15)
+#define NRM_TX_ST_TMREXP   BIT(14)
+
+#define NETSEC_REG_NRM_RX_STATUS   0x440
+#define NETSEC_REG_NRM_RX_INTEN0x444
+#define NETSEC_REG_NRM_RX_INTEN_SET0x468
+#define NETSEC_REG_NRM_RX_INTEN_CLR0x46c
+#define NRM_RX_ST_RC_ERR   BIT(16)
+#define NRM_RX_ST_PKTCNT   BIT(15)
+#define NRM_RX_ST_TMREXP   BIT(14)
+
+#define NETSEC_REG_PKT_CMD_BUF 0xd0
+
+#define NETSEC_REG_CLK_EN  0x100
+
+#define NETSEC_REG_PKT_CTRL0x140
+
+#define NETSEC_REG_DMA_TMR_CTRL

[PATCHv2 1/3] dt-bindings: net: Add DT bindings for Socionext Netsec

2017-12-12 Thread jassisinghbrar

From: Jassi Brar 

This patch adds documentation for Device-Tree bindings for the
Socionext NetSec Controller driver.

Signed-off-by: Ard Biesheuvel 
Signed-off-by: Jassi Brar 
---
 .../devicetree/bindings/net/socionext-netsec.txt   | 43 ++
 1 file changed, 43 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/socionext-netsec.txt

diff --git a/Documentation/devicetree/bindings/net/socionext-netsec.txt 
b/Documentation/devicetree/bindings/net/socionext-netsec.txt
new file mode 100644
index 000..4695969
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/socionext-netsec.txt
@@ -0,0 +1,45 @@
+* Socionext NetSec Ethernet Controller IP
+
+Required properties:
+- compatible: Should be "socionext,synquacer-netsec"
+- reg: Address and length of the control register area, followed by the
+   address and length of the EEPROM holding the MAC address and
+   microengine firmware
+- interrupts: Should contain ethernet controller interrupt
+- clocks: phandle to the PHY reference clock, and any other clocks to be
+  switched by runtime_pm
+- clock-names: Required only if more than a single clock is listed in 'clocks'.
+   The PHY reference clock must be named 'phy_refclk'
+- phy-mode: See ethernet.txt file in the same directory
+- phy-handle: phandle to select child phy
+
+Optional properties: (See ethernet.txt file in the same directory)
+- dma-coherent: Boolean property, must only be present if memory
+accesses performed by the device are cache coherent
+- local-mac-address
+- mac-address
+- max-speed
+- max-frame-size
+
+Required properties for the child phy:
+- reg: phy address
+
+Example:
+   eth0: netsec@522D {
+   compatible = "socionext,synquacer-netsec";
+   reg = <0 0x522D 0x0 0x1>, <0 0x1000 0x0 0x1>;
+   interrupts = ;
+   clocks = <_netsec>;
+   phy-mode = "rgmii";
+   max-speed = <1000>;
+   max-frame-size = <9000>;
+   phy-handle = <>;
+
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   ethphy0: ethernet-phy@1 {
+   compatible = "ethernet-phy-ieee802.3-c22";
+   reg = <1>;
+   };
+   };
-- 
2.7.4

Re: [PATCH] of_mdio / mdiobus: ensure mdio devices have fwnode correctly populated

2017-12-12 Thread Rob Herring

On Tue, Dec 12, 2017 at 4:49 AM, Russell King
 wrote:
> Ensure that all mdio devices populate the struct device fwnode pointer
> as well as the of_node pointer to allow drivers that wish to use
> fwnode APIs to work.
>
> Signed-off-by: Russell King 
> ---
>  drivers/net/phy/mdio_bus.c | 1 +
>  drivers/of/of_mdio.c   | 3 +++
>  2 files changed, 4 insertions(+)

Reviewed-by: Rob Herring

[PATCH V3 net-next 1/8] net: hns3: Add HNS3 VF IMP(Integrated Management Proc) cmd interface

2017-12-12 Thread Salil Mehta

This patch adds a command interface for communicating with the
IMP (Integrated Management Processor) to the HNS3 Virtual Function
driver.

Each VF supports a CQP (Command Queue Pair) ring interface.
Each CQP consists of a send queue (CSQ) and a receive queue (CRQ).
A VF may support various commands, such as querying the firmware
version, TQP management, statistics, interrupt handling, mailbox etc.

This also contains code to initialize the command queue, manage the
command queue descriptors and Rx/Tx protocol with the command processor
in the form of various commands/results and acknowledgements.

Signed-off-by: Salil Mehta 
Signed-off-by: lipeng 
---
Patch V3: Addressed comment from Philippe Ombredanne
  Link: https://lkml.org/lkml/2017/12/8/874
Patch V2: Reworked comments by David Miller(except one comment on the
  udelay() while holding locks. Needs further discussion)
  Link: https://lkml.org/lkml/2017/12/5/639
Patch V1: Initial Submit
---
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c   | 344 +
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.h   | 256 +++
 2 files changed, 600 insertions(+)
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.h

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c
new file mode 100644
index 000..04b03d8
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c
@@ -0,0 +1,344 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright (c) 2016-2017 Hisilicon Limited.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "hclgevf_cmd.h"
+#include "hclgevf_main.h"
+#include "hnae3.h"
+
+#define hclgevf_is_csq(ring) ((ring)->flag & HCLGEVF_TYPE_CSQ)
+#define hclgevf_ring_to_dma_dir(ring) (hclgevf_is_csq(ring) ? \
+   DMA_TO_DEVICE : DMA_FROM_DEVICE)
+#define cmq_ring_to_dev(ring)   (&(ring)->dev->pdev->dev)
+
+static int hclgevf_ring_space(struct hclgevf_cmq_ring *ring)
+{
+   int ntc = ring->next_to_clean;
+   int ntu = ring->next_to_use;
+   int used;
+
+   used = (ntu - ntc + ring->desc_num) % ring->desc_num;
+
+   return ring->desc_num - used - 1;
+}
+
+static int hclgevf_cmd_csq_clean(struct hclgevf_hw *hw)
+{
+   struct hclgevf_cmq_ring *csq = >cmq.csq;
+   u16 ntc = csq->next_to_clean;
+   struct hclgevf_desc *desc;
+   int clean = 0;
+   u32 head;
+
+   desc = >desc[ntc];
+   head = hclgevf_read_dev(hw, HCLGEVF_NIC_CSQ_HEAD_REG);
+   while (head != ntc) {
+   memset(desc, 0, sizeof(*desc));
+   ntc++;
+   if (ntc == csq->desc_num)
+   ntc = 0;
+   desc = >desc[ntc];
+   clean++;
+   }
+   csq->next_to_clean = ntc;
+
+   return clean;
+}
+
+static bool hclgevf_cmd_csq_done(struct hclgevf_hw *hw)
+{
+   u32 head;
+
+   head = hclgevf_read_dev(hw, HCLGEVF_NIC_CSQ_HEAD_REG);
+
+   return head == hw->cmq.csq.next_to_use;
+}
+
+static bool hclgevf_is_special_opcode(u16 opcode)
+{
+   u16 spec_opcode[] = {0x30, 0x31, 0x32};
+   int i;
+
+   for (i = 0; i < ARRAY_SIZE(spec_opcode); i++) {
+   if (spec_opcode[i] == opcode)
+   return true;
+   }
+
+   return false;
+}
+
+static int hclgevf_alloc_cmd_desc(struct hclgevf_cmq_ring *ring)
+{
+   int size = ring->desc_num * sizeof(struct hclgevf_desc);
+
+   ring->desc = kzalloc(size, GFP_KERNEL);
+   if (!ring->desc)
+   return -ENOMEM;
+
+   ring->desc_dma_addr = dma_map_single(cmq_ring_to_dev(ring), ring->desc,
+size, DMA_BIDIRECTIONAL);
+
+   if (dma_mapping_error(cmq_ring_to_dev(ring), ring->desc_dma_addr)) {
+   ring->desc_dma_addr = 0;
+   kfree(ring->desc);
+   ring->desc = NULL;
+   return -ENOMEM;
+   }
+
+   return 0;
+}
+
+static void hclgevf_free_cmd_desc(struct hclgevf_cmq_ring *ring)
+{
+   dma_unmap_single(cmq_ring_to_dev(ring), ring->desc_dma_addr,
+ring->desc_num * sizeof(ring->desc[0]),
+hclgevf_ring_to_dma_dir(ring));
+
+   ring->desc_dma_addr = 0;
+   kfree(ring->desc);
+   ring->desc = NULL;
+}
+
+static int hclgevf_init_cmd_queue(struct hclgevf_dev *hdev,
+ struct hclgevf_cmq_ring *ring)
+{
+   struct hclgevf_hw *hw = >hw;
+   int ring_type = ring->flag;
+   u32 reg_val;
+   int ret;
+
+   ring->desc_num = HCLGEVF_NIC_CMQ_DESC_NUM;
+   spin_lock_init(>lock);
+   ring->next_to_clean = 0;
+   ring->next_to_use = 0;
+   ring->dev = hdev;
+
+   /* allocate CSQ/CRQ descriptor */
+   ret =
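
The free-slot arithmetic in hclgevf_ring_space() above can be tried out in isolation. The sketch below is a userspace restatement that assumes only what the quoted hunk shows; ring_space() and its parameter names are hypothetical stand-ins for the ring fields.

```c
/* Free slots in a circular descriptor ring.
 *
 * next_to_use:   index where the producer writes the next descriptor
 * next_to_clean: index of the oldest descriptor not yet reclaimed
 * desc_num:      total number of descriptors in the ring
 *
 * One slot is always left unused, so a full ring (space == 0) can be
 * told apart from an empty one (next_to_use == next_to_clean).
 */
int ring_space(int next_to_use, int next_to_clean, int desc_num)
{
	/* Adding desc_num before the modulo keeps the result non-negative
	 * when next_to_use has wrapped around and is below next_to_clean.
	 */
	int used = (next_to_use - next_to_clean + desc_num) % desc_num;

	return desc_num - used - 1;
}
```

With eight descriptors, an empty ring reports seven free slots, and a ring with next_to_use = 7 and next_to_clean = 0 reports none.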

[PATCH V3 net-next 7/8] net: hns3: Change PF to add ring-vect binding & resetQ to mailbox

2017-12-12 Thread Salil Mehta

This patch is required to support ring-vector binding and reset
of TQPs requested by the VF driver to the PF driver. Mailbox
handler is added with corresponding VF commands/messages to
handle the request.

Signed-off-by: Salil Mehta 
Signed-off-by: lipeng 
---
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 138 +++--
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h|   7 +-
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c | 106 
 3 files changed, 159 insertions(+), 92 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 980fcdf..3b1fc49 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -3256,49 +3256,48 @@ int hclge_rss_init_hw(struct hclge_dev *hdev)
return ret;
 }
 
-int hclge_map_vport_ring_to_vector(struct hclge_vport *vport, int vector_id,
-  struct hnae3_ring_chain_node *ring_chain)
+int hclge_bind_ring_with_vector(struct hclge_vport *vport,
+   int vector_id, bool en,
+   struct hnae3_ring_chain_node *ring_chain)
 {
struct hclge_dev *hdev = vport->back;
-   struct hclge_ctrl_vector_chain_cmd *req;
struct hnae3_ring_chain_node *node;
struct hclge_desc desc;
-   int ret;
+   struct hclge_ctrl_vector_chain_cmd *req
+   = (struct hclge_ctrl_vector_chain_cmd *)desc.data;
+   enum hclge_cmd_status status;
+   enum hclge_opcode_type op;
+   u16 tqp_type_and_id;
int i;
 
-   hclge_cmd_setup_basic_desc(, HCLGE_OPC_ADD_RING_TO_VECTOR, false);
-
-   req = (struct hclge_ctrl_vector_chain_cmd *)desc.data;
+   op = en ? HCLGE_OPC_ADD_RING_TO_VECTOR : HCLGE_OPC_DEL_RING_TO_VECTOR;
+   hclge_cmd_setup_basic_desc(, op, false);
req->int_vector_id = vector_id;
 
i = 0;
for (node = ring_chain; node; node = node->next) {
-   u16 type_and_id = 0;
-
-   hnae_set_field(type_and_id, HCLGE_INT_TYPE_M, HCLGE_INT_TYPE_S,
+   tqp_type_and_id = le16_to_cpu(req->tqp_type_and_id[i]);
+   hnae_set_field(tqp_type_and_id,  HCLGE_INT_TYPE_M,
+  HCLGE_INT_TYPE_S,
   hnae_get_bit(node->flag, HNAE3_RING_TYPE_B));
-   hnae_set_field(type_and_id, HCLGE_TQP_ID_M, HCLGE_TQP_ID_S,
-  node->tqp_index);
-   hnae_set_field(type_and_id, HCLGE_INT_GL_IDX_M,
-  HCLGE_INT_GL_IDX_S,
-  hnae_get_bit(node->flag, HNAE3_RING_TYPE_B));
-   req->tqp_type_and_id[i] = cpu_to_le16(type_and_id);
-   req->vfid = vport->vport_id;
-
+   hnae_set_field(tqp_type_and_id, HCLGE_TQP_ID_M,
+  HCLGE_TQP_ID_S, node->tqp_index);
+   req->tqp_type_and_id[i] = cpu_to_le16(tqp_type_and_id);
if (++i >= HCLGE_VECTOR_ELEMENTS_PER_CMD) {
req->int_cause_num = HCLGE_VECTOR_ELEMENTS_PER_CMD;
+   req->vfid = vport->vport_id;
 
-   ret = hclge_cmd_send(>hw, , 1);
-   if (ret) {
+   status = hclge_cmd_send(>hw, , 1);
+   if (status) {
dev_err(>pdev->dev,
"Map TQP fail, status is %d.\n",
-   ret);
-   return ret;
+   status);
+   return -EIO;
}
i = 0;
 
hclge_cmd_setup_basic_desc(,
-  HCLGE_OPC_ADD_RING_TO_VECTOR,
+  op,
   false);
req->int_vector_id = vector_id;
}
@@ -3306,21 +3305,21 @@ int hclge_map_vport_ring_to_vector(struct hclge_vport 
*vport, int vector_id,
 
if (i > 0) {
req->int_cause_num = i;
-
-   ret = hclge_cmd_send(>hw, , 1);
-   if (ret) {
+   req->vfid = vport->vport_id;
+   status = hclge_cmd_send(>hw, , 1);
+   if (status) {
dev_err(>pdev->dev,
-   "Map TQP fail, status is %d.\n", ret);
-   return ret;
+   "Map TQP fail, status is %d.\n", status);
+   return -EIO;
}
}
 
return 0;
 }
 
-static int hclge_map_handle_ring_to_vector(
-   struct hnae3_handle *handle, int vector,
-
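
The loop in hclge_bind_ring_with_vector() packs one 16-bit entry per ring and sends a command every HCLGE_VECTOR_ELEMENTS_PER_CMD entries, plus once more for any remainder. A minimal userspace sketch of both ideas follows; set_field(), count_commands() and the ELEMS_PER_CMD value are illustrative assumptions, not the driver's definitions.

```c
#include <stdint.h>

#define ELEMS_PER_CMD 10	/* assumed batch size, for illustration only */

/* Mask/shift field update in the style of hnae_set_field(): clear the
 * bits selected by mask, then OR in val shifted into position.
 */
uint16_t set_field(uint16_t origin, uint16_t mask, unsigned int shift,
		   uint16_t val)
{
	origin &= (uint16_t)~mask;
	origin |= (uint16_t)((val << shift) & mask);
	return origin;
}

/* Count how many commands are sent for n_rings chained rings: one per
 * full batch of ELEMS_PER_CMD entries, and one for a trailing partial
 * batch if any entries are left over.
 */
int count_commands(int n_rings)
{
	int i = 0, sent = 0, k;

	for (k = 0; k < n_rings; k++) {
		if (++i >= ELEMS_PER_CMD) {
			sent++;		/* full batch: flush and restart */
			i = 0;
		}
	}
	if (i > 0)
		sent++;			/* flush the partial batch */

	return sent;
}
```

Under these assumptions, 25 rings result in three commands: two full batches of ten and one with the remaining five entries.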

[PATCH V3 net-next 8/8] net: hns3: Add mailbox interrupt handling to PF driver

2017-12-12 Thread Salil Mehta

All PF mailbox events are conveyed through a common interrupt
(vector 0). This interrupt vector is shared by reset and mailbox.

This patch adds handling of the mailbox interrupt event and defers
its processing to a separate mailbox task.

Signed-off-by: Salil Mehta 
Signed-off-by: lipeng 
---
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 68 +++---
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h|  8 ++-
 2 files changed, 68 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 3b1fc49..e97fd66 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -2227,6 +2227,12 @@ static int hclge_mac_init(struct hclge_dev *hdev)
return hclge_cfg_func_mta_filter(hdev, 0, hdev->accept_mta_mc);
 }
 
+static void hclge_mbx_task_schedule(struct hclge_dev *hdev)
+{
+   if (!test_and_set_bit(HCLGE_STATE_MBX_SERVICE_SCHED, >state))
+   schedule_work(>mbx_service_task);
+}
+
 static void hclge_reset_task_schedule(struct hclge_dev *hdev)
 {
if (!test_and_set_bit(HCLGE_STATE_RST_SERVICE_SCHED, >state))
@@ -2372,9 +2378,18 @@ static void hclge_service_complete(struct hclge_dev 
*hdev)
 static u32 hclge_check_event_cause(struct hclge_dev *hdev, u32 *clearval)
 {
u32 rst_src_reg;
+   u32 cmdq_src_reg;
 
/* fetch the events from their corresponding regs */
rst_src_reg = hclge_read_dev(>hw, HCLGE_MISC_RESET_STS_REG);
+   cmdq_src_reg = hclge_read_dev(>hw, HCLGE_VECTOR0_CMDQ_SRC_REG);
+
+   /* Assumption: If by any chance reset and mailbox events are reported
+* together then we will only process reset event in this go and will
+* defer the processing of the mailbox events. Since, we would have not
+* cleared RX CMDQ event this time we would receive again another
+* interrupt from H/W just for the mailbox.
+*/
 
/* check for vector0 reset event sources */
if (BIT(HCLGE_VECTOR0_GLOBALRESET_INT_B) & rst_src_reg) {
@@ -2395,7 +2410,12 @@ static u32 hclge_check_event_cause(struct hclge_dev 
*hdev, u32 *clearval)
return HCLGE_VECTOR0_EVENT_RST;
}
 
-   /* mailbox event sharing vector 0 interrupt would be placed here */
+   /* check for vector0 mailbox(=CMDQ RX) event source */
+   if (BIT(HCLGE_VECTOR0_RX_CMDQ_INT_B) & cmdq_src_reg) {
+   cmdq_src_reg &= ~BIT(HCLGE_VECTOR0_RX_CMDQ_INT_B);
+   *clearval = cmdq_src_reg;
+   return HCLGE_VECTOR0_EVENT_MBX;
+   }
 
return HCLGE_VECTOR0_EVENT_OTHER;
 }
@@ -2403,10 +2423,14 @@ static u32 hclge_check_event_cause(struct hclge_dev 
*hdev, u32 *clearval)
 static void hclge_clear_event_cause(struct hclge_dev *hdev, u32 event_type,
u32 regclr)
 {
-   if (event_type == HCLGE_VECTOR0_EVENT_RST)
+   switch (event_type) {
+   case HCLGE_VECTOR0_EVENT_RST:
hclge_write_dev(>hw, HCLGE_MISC_RESET_STS_REG, regclr);
-
-   /* mailbox event sharing vector 0 interrupt would be placed here */
+   break;
+   case HCLGE_VECTOR0_EVENT_MBX:
+   hclge_write_dev(>hw, HCLGE_VECTOR0_CMDQ_SRC_REG, regclr);
+   break;
+   }
 }
 
 static void hclge_enable_vector(struct hclge_misc_vector *vector, bool enable)
@@ -2423,13 +2447,23 @@ static irqreturn_t hclge_misc_irq_handle(int irq, void 
*data)
hclge_enable_vector(>misc_vector, false);
event_cause = hclge_check_event_cause(hdev, );
 
-   /* vector 0 interrupt is shared with reset and mailbox source events.
-* For now, we are not handling mailbox events.
-*/
+   /* vector 0 interrupt is shared with reset and mailbox source events.*/
switch (event_cause) {
case HCLGE_VECTOR0_EVENT_RST:
hclge_reset_task_schedule(hdev);
break;
+   case HCLGE_VECTOR0_EVENT_MBX:
+   /* If we are here then,
+* 1. Either we are not handling any mbx task and we are not
+*scheduled as well
+*OR
+* 2. We could be handling a mbx task but nothing more is
+*scheduled.
+* In both cases, we should schedule mbx task as there are more
+* mbx messages reported by this interrupt.
+*/
+   hclge_mbx_task_schedule(hdev);
+
default:
dev_dbg(>pdev->dev,
"received unknown or unhandled event of vector0\n");
@@ -2708,6 +2742,21 @@ static void hclge_reset_service_task(struct work_struct 
*work)
clear_bit(HCLGE_STATE_RST_HANDLING, >state);
 }
 
+static void hclge_mailbox_service_task(struct work_struct *work)
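
The priority rule described in the comment inside hclge_check_event_cause() (a reset wins over a mailbox event reported at the same time, and the mailbox bit is left set so hardware raises the interrupt again) can be sketched in plain C. The bit positions, the enum and the reset clear value below are assumptions made for illustration; only the mailbox clear value follows the quoted hunk.

```c
#include <stdint.h>

enum vec0_event { EVENT_RST, EVENT_MBX, EVENT_OTHER };

#define RST_INT_BIT	(1u << 5)	/* hypothetical reset source bit */
#define RX_CMDQ_INT_BIT	(1u << 1)	/* hypothetical mailbox (CMDQ RX) bit */

/* Decide which event to service from the two source registers.
 * *clearval receives the value to write back to the matching register.
 */
enum vec0_event check_event_cause(uint32_t rst_src, uint32_t cmdq_src,
				  uint32_t *clearval)
{
	/* Reset is checked first. A pending mailbox bit is not touched
	 * here, so it stays asserted and triggers another interrupt.
	 */
	if (rst_src & RST_INT_BIT) {
		*clearval = RST_INT_BIT;	/* assumption: not shown in the hunk */
		return EVENT_RST;
	}

	/* Mailbox: write back the register with only the RX bit cleared */
	if (cmdq_src & RX_CMDQ_INT_BIT) {
		*clearval = cmdq_src & ~RX_CMDQ_INT_BIT;
		return EVENT_MBX;
	}

	return EVENT_OTHER;
}
```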

[PATCH V3 net-next 0/8] Hisilicon Network Subsystem 3 VF Ethernet Driver

2017-12-12 Thread Salil Mehta

This patch-set adds support for the HNS3 (Hisilicon Network Subsystem 3)
Virtual Function Ethernet driver for the hip08 family of SoCs. The Physical
Function driver is already part of the Linux mainline.

This VF driver has its own Hardware Compatibility Layer and shares
common/unified ENET layer/client/ethtool code with the PF driver. It also
supports a mailbox to communicate with the HNS3 PF driver. The basic
architecture of the VF driver is derived from the PF driver. Just like the PF
driver, this driver is also PCI Express based.

This driver is ongoing development work, and the HNS3 VF Ethernet driver will
be incrementally enhanced with new features.

High Level Architecture:

 [ Ethtool ]
 | 
 [ Ethernet Client ] ... [ RoCE Client ] 
 | |   
   [ HNAE Device ] |   
 | |   |
-  |
   |
 [ HNAE3 Framework (Register/unregister) ] |
   |
-  |
 | |
 [ VF HCLGE Layer ]|
  | |  |
  | |  |
  | |  |
  | [ VF Mailbox (To PF via IMP) ] | 
  | |  |   
 [ IMP command Interface ]  [ IMP command Interface ]
|  |
|  |
   (A B O V E  R U N S  O N  G U E S T  S Y S T E M)
-
  Q E M U / V F I O / K V M (on Host System)
-
HIP08  H A R D W A R E (limited to VF by SMMU)

   [ IMP/Mgmt Processor (hardware common to system/cmd based) ]


Fig 1.   HNS3 Virtual Function Driver




[ dcbnl ]  [ Ethtool ]
|  |
[  Ethernet Client  ]  [ ODP/UIO Client ] . . .[ RoCE Client ] 
  |_| |
 |   _|
   [ HNAE Device ]   ||
 |   ||
- |
  |
 [ HNAE3 Framework (Register/unregister) ]|
  |
- |
 ||
  [ HCLGE Layer ] |
 |_   |
|| |  |
 [ DCB ] | |  |
|| |  |
  [ Scheduler/Shaper ] [ MDIO ]  [ PF Mailbox ]   |
|| |  |
||_|  | 
 ||
 [ IMP command Interface ] [ IMP command Interface ] 

  HIP08  H A R D W A R E  

  [ IMP/Mgmt Processor (hardware common to system/cmd based) ]


   Fig 2.Existing HNS3 PF Driver (added with mailbox)


Change Log Summary:
Patch V3: Addressed SPDX change requested by Philippe Ombredanne
Patch V2: 1. Addressed some comments by David Miller.
  2. Addressed some internal comments on various patches
Patch V1: Initial Submit


Salil Mehta (8):
  net: hns3: Add HNS3 VF IMP(Integrated Management Proc) cmd interface
  net: hns3: Add mailbox support to VF driver
  net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support
  net: hns3: Add HNS3 VF driver to kernel build framework
  net: hns3: Unified HNS3 {VF|PF} Ethernet Driver for hip08 SoC
  net: hns3: Add mailbox support to PF driver
  net: hns3: Change PF to add ring-vect binding & resetQ to mailbox
  net: hns3: Add mailbox interrupt handling to PF driver

 drivers/net/ethernet/hisilicon/Kconfig |   28 +-
 drivers/net/ethernet/hisilicon/hns3/Makefile   |7 +
 drivers/net/ethernet/hisilicon/hns3/hclge_mbx.h|   88 ++
 drivers/net/ethernet/hisilicon/hns3/hnae3.c|   14 +-
 drivers/net/ethernet/hisilicon/hns3/hnae3.h|7 +-
 .../hisilicon/hns3/{hns3pf => }/hns3_dcbnl.c   |2 +-
 .../hisilicon/hns3/{hns3pf =>

Re: [PATCH net-next] net: bridge: use rhashtable for fdbs

2017-12-12 Thread Stephen Hemminger

On Tue, 12 Dec 2017 16:02:50 +0200
Nikolay Aleksandrov  wrote:

> + memcpy(__entry->addr, f->key.addr.addr, ETH_ALEN);

Maybe use ether_addr_copy() here?
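
For context, ether_addr_copy() is the kernel helper for copying a 6-byte MAC address, and it requires both addresses to be 16-bit aligned. The userspace sketch below only illustrates that idea and is not the kernel implementation; union eth_addr and eth_addr_copy() are hypothetical names.

```c
#include <stdint.h>

#define ETH_ALEN 6

/* The union gives the byte array 16-bit alignment and allows the same
 * storage to be accessed as three 16-bit words.
 */
union eth_addr {
	uint8_t bytes[ETH_ALEN];
	uint16_t words[ETH_ALEN / 2];
};

/* Copy a MAC address as three 16-bit stores instead of a generic
 * byte-wise memcpy(). This only works because both operands are
 * known to be 2-byte aligned.
 */
void eth_addr_copy(union eth_addr *dst, const union eth_addr *src)
{
	dst->words[0] = src->words[0];
	dst->words[1] = src->words[1];
	dst->words[2] = src->words[2];
}
```

Whether the trace entry buffer in the patch meets that alignment requirement is not visible from the quoted line.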

[PATCH net] skge: remove redundunt free_irq under spinlock

2017-12-12 Thread Stephen Hemminger

The code handling multi-port SKGE boards was freeing the IRQ
twice. The first free_irq() call was made under a spinlock, where it
might sleep.

Signed-off-by: Stephen Hemminger 
---
Given that multi-port SKGE devices are very old and unlikely
to still be in use, this patch does not need to go to stable.

 drivers/net/ethernet/marvell/skge.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/skge.c 
b/drivers/net/ethernet/marvell/skge.c
index 6e423f098a60..31efc47c847e 100644
--- a/drivers/net/ethernet/marvell/skge.c
+++ b/drivers/net/ethernet/marvell/skge.c
@@ -4081,7 +4081,6 @@ static void skge_remove(struct pci_dev *pdev)
if (hw->ports > 1) {
skge_write32(hw, B0_IMSK, 0);
skge_read32(hw, B0_IMSK);
-   free_irq(pdev->irq, hw);
}
spin_unlock_irq(>hw_lock);
 
-- 
2.11.0

[BUG] b44: two possible sleep-in-atomic bugs in b44_set_link_ksettings and b44_ioctl

2017-12-12 Thread Jia-Ju Bai


The driver may sleep under a spinlock.
The function call paths are:
b44_set_link_ksettings (acquire the spinlock)
  phy_ethtool_ksettings_set
phy_start_aneg
  phy_start_aneg_priv
mutex_lock --> may sleep

b44_ioctl (acquire the spinlock)
  phy_mii_ioctl
mdiobus_read
  mutex_lock --> may sleep

I have not found a good way to fix them, so I am only reporting them.
These possible bugs were found by my static analysis tool (DSAC) and 
confirmed by my code review.



Thanks,
Jia-Ju Bai

[BUG] renesas/ravb: two possible sleep-in-atomic bugs in ravb_set_link_ksettings and ravb_nway_reset

2017-12-12 Thread Jia-Ju Bai

According to drivers/net/ethernet/renesas/ravb_main.c, the driver may 
sleep under a spinlock.

The function call paths are:
ravb_set_link_ksettings (acquire the spinlock)
  phy_ethtool_ksettings_set
phy_start_aneg
  phy_start_aneg_priv
mutex_lock --> may sleep

ravb_nway_reset (acquire the spinlock)
  phy_start_aneg
phy_start_aneg_priv
  mutex_lock --> may sleep

I have not found a good way to fix them, so I am only reporting them.
These possible bugs were found by my static analysis tool (DSAC) and 
confirmed by my code review.



Thanks,
Jia-Ju Bai

[PATCH v2 net-next 1/3] net: dsa: mediatek: add VLAN support for MT7530

2017-12-12 Thread sean.wang

From: Sean Wang 

MT7530 can treat each port as either a VLAN-unaware port or a VLAN-aware
port by putting the ingress port into port matrix mode or port security
mode, respectively. Each port acts as a VLAN-unaware port when the device
is initially created and whenever a port joins or leaves the bridge at
runtime. This patch fills in the required callbacks for VLAN operations
and switches a port into port security mode when it is configured as a
VLAN-aware port. In that mode the port can recognize the VID of incoming
packets and look up the VLAN table to validate them and decide which
port they should be forwarded to. VIDs from 1 to 4094 are valid for the
hardware.

Signed-off-by: Sean Wang 
---
 drivers/net/dsa/mt7530.c | 291 ++-
 drivers/net/dsa/mt7530.h |  83 +-
 2 files changed, 367 insertions(+), 7 deletions(-)

diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c
index 2820d69..252e8ba 100644
--- a/drivers/net/dsa/mt7530.c
+++ b/drivers/net/dsa/mt7530.c
@@ -805,6 +805,69 @@ mt7530_port_bridge_join(struct dsa_switch *ds, int port,
 }
 
 static void
+mt7530_port_set_vlan_unaware(struct dsa_switch *ds, int port)
+{
+   struct mt7530_priv *priv = ds->priv;
+   bool all_user_ports_removed = true;
+   int i;
+
+   /* When a port is removed from the bridge, the port would be set up
+* back to the default as is at initial boot which is a VLAN-unaware
+* port.
+*/
+   mt7530_rmw(priv, MT7530_PCR_P(port), PCR_PORT_VLAN_MASK,
+  MT7530_PORT_MATRIX_MODE);
+   mt7530_rmw(priv, MT7530_PVC_P(port), VLAN_ATTR_MASK,
+  VLAN_ATTR(MT7530_VLAN_TRANSPARENT));
+
+   priv->ports[port].vlan_filtering = false;
+
+   for (i = 0; i < MT7530_NUM_PORTS; i++) {
+   if (dsa_is_user_port(ds, i) &&
+   priv->ports[i].vlan_filtering) {
+   all_user_ports_removed = false;
+   break;
+   }
+   }
+
+   /* CPU port also does the same thing until all user ports belonging to
+* the CPU port get out of VLAN filtering mode.
+*/
+   if (all_user_ports_removed) {
+   mt7530_write(priv, MT7530_PCR_P(MT7530_CPU_PORT),
+PCR_MATRIX(dsa_user_ports(priv->ds)));
+   mt7530_write(priv, MT7530_PVC_P(MT7530_CPU_PORT),
+PORT_SPEC_TAG);
+   }
+}
+
+static void
+mt7530_port_set_vlan_aware(struct dsa_switch *ds, int port)
+{
+   struct mt7530_priv *priv = ds->priv;
+
+   /* The real fabric path would be decided on the membership in the
+* entry of VLAN table. PCR_MATRIX set up here with ALL_MEMBERS
+* means potential VLAN can be consisting of certain subset of all
+* ports.
+*/
+   mt7530_rmw(priv, MT7530_PCR_P(port),
+  PCR_MATRIX_MASK, PCR_MATRIX(MT7530_ALL_MEMBERS));
+
+   /* Trapped into security mode allows packet forwarding through VLAN
+* table lookup.
+*/
+   mt7530_rmw(priv, MT7530_PCR_P(port), PCR_PORT_VLAN_MASK,
+  MT7530_PORT_SECURITY_MODE);
+
+   /* Set the port as a user port which is to be able to recognize VID
+* from incoming packets before fetching entry within the VLAN table.
+*/
+   mt7530_rmw(priv, MT7530_PVC_P(port), VLAN_ATTR_MASK,
+  VLAN_ATTR(MT7530_VLAN_USER));
+}
+
+static void
 mt7530_port_bridge_leave(struct dsa_switch *ds, int port,
 struct net_device *bridge)
 {
@@ -817,8 +880,11 @@ mt7530_port_bridge_leave(struct dsa_switch *ds, int port,
/* Remove this port from the port matrix of the other ports
 * in the same bridge. If the port is disabled, port matrix
 * is kept and not being setup until the port becomes enabled.
+* And the other port's port matrix cannot be broken when the
+* other port is still a VLAN-aware port.
 */
-   if (dsa_is_user_port(ds, i) && i != port) {
+   if (!priv->ports[i].vlan_filtering &&
+   dsa_is_user_port(ds, i) && i != port) {
if (dsa_to_port(ds, i)->bridge_dev != bridge)
continue;
if (priv->ports[i].enable)
@@ -836,6 +902,8 @@ mt7530_port_bridge_leave(struct dsa_switch *ds, int port,
   PCR_MATRIX(BIT(MT7530_CPU_PORT)));
priv->ports[port].pm = PCR_MATRIX(BIT(MT7530_CPU_PORT));
 
+   mt7530_port_set_vlan_unaware(ds, port);
+
mutex_unlock(&priv->reg_mutex);
 }
 
@@ -906,6 +974,223 @@ mt7530_port_fdb_dump(struct dsa_switch *ds, int port,
return 0;
 }
 
+static int

[PATCH v2 net-next 3/3] net: dsa: mediatek: update MAINTAINERS entry with MediaTek switch driver

2017-12-12 Thread sean.wang

From: Sean Wang 

I work for MediaTek and maintain SoCs targeting home gateways, and
I will also keep extending and testing the functionality of the MediaTek
switch.

Signed-off-by: Sean Wang 
Reviewed-by: Andrew Lunn 
---
 MAINTAINERS | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index c0edf30..070fd91 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8723,6 +8723,13 @@ L:   netdev@vger.kernel.org
 S: Maintained
 F: drivers/net/ethernet/mediatek/
 
+MEDIATEK SWITCH DRIVER
+M: Sean Wang 
+L: netdev@vger.kernel.org
+S: Maintained
+F: drivers/net/dsa/mt7530.*
+F: net/dsa/tag_mtk.c
+
 MEDIATEK JPEG DRIVER
 M: Rick Chang 
 M: Bin Liu 
-- 
2.7.4

[PATCH v2 net-next 2/3] net: dsa: mediatek: combine MediaTek tag with VLAN tag

2017-12-12 Thread sean.wang

From: Sean Wang 

To let the MT7530 switch properly recognize egress packets that carry
both a special tag and a VLAN tag, the special tag information should be
carried in the existing VLAN tag. Ingress packets, on the other hand,
need no extra handling when a VLAN tag is present, since the switch is
able to put the VLAN tag after the special tag and the existing parsing
path can then be followed.

Signed-off-by: Sean Wang 
---
 net/dsa/tag_mtk.c | 38 +-
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/net/dsa/tag_mtk.c b/net/dsa/tag_mtk.c
index 8475434..11535bc 100644
--- a/net/dsa/tag_mtk.c
+++ b/net/dsa/tag_mtk.c
@@ -13,10 +13,13 @@
  */
 
 #include 
+#include 
 
 #include "dsa_priv.h"
 
 #define MTK_HDR_LEN4
+#define MTK_HDR_XMIT_UNTAGGED  0
+#define MTK_HDR_XMIT_TAGGED_TPID_8100  1
 #define MTK_HDR_RECV_SOURCE_PORT_MASK  GENMASK(2, 0)
 #define MTK_HDR_XMIT_DP_BIT_MASK   GENMASK(5, 0)
 
@@ -25,20 +28,37 @@ static struct sk_buff *mtk_tag_xmit(struct sk_buff *skb,
 {
struct dsa_port *dp = dsa_slave_to_port(dev);
u8 *mtk_tag;
+   bool is_vlan_skb = true;
 
-   if (skb_cow_head(skb, MTK_HDR_LEN) < 0)
-   return NULL;
-
-   skb_push(skb, MTK_HDR_LEN);
+   /* Build the special tag after the MAC Source Address. If a VLAN
+* header is present, the VLAN header and the special tag must be
+* combined. Only then can the switch parse both the special tag
+* and the VLAN tag at the same time and look up the VLAN table
+* with the VID.
+*/
+   if (!skb_vlan_tagged(skb)) {
+   if (skb_cow_head(skb, MTK_HDR_LEN) < 0)
+   return NULL;
 
-   memmove(skb->data, skb->data + MTK_HDR_LEN, 2 * ETH_ALEN);
+   skb_push(skb, MTK_HDR_LEN);
+   memmove(skb->data, skb->data + MTK_HDR_LEN, 2 * ETH_ALEN);
+   is_vlan_skb = false;
+   }
 
-   /* Build the tag after the MAC Source Address */
mtk_tag = skb->data + 2 * ETH_ALEN;
-   mtk_tag[0] = 0;
+
+   /* Mark tag attribute on special tag insertion to notify hardware
+* whether that's a combined special tag with 802.1Q header.
+*/
+   mtk_tag[0] = is_vlan_skb ? MTK_HDR_XMIT_TAGGED_TPID_8100 :
+MTK_HDR_XMIT_UNTAGGED;
mtk_tag[1] = (1 << dp->index) & MTK_HDR_XMIT_DP_BIT_MASK;
-   mtk_tag[2] = 0;
-   mtk_tag[3] = 0;
+
+   /* Tag control information is kept for 802.1Q */
+   if (!is_vlan_skb) {
+   mtk_tag[2] = 0;
+   mtk_tag[3] = 0;
+   }
 
return skb;
 }
-- 
2.7.4
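
For readers following the tag layout in the patch above, here is a minimal
userspace sketch of how the four tag bytes are filled in the two cases. It
assumes the constants from the patch; `mtk_fill_tag` and its parameters are
hypothetical names, and the real driver works on an skb rather than a raw
byte pointer. The port bound of 6 is an assumption derived from the 6-bit
destination port mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MTK_HDR_XMIT_UNTAGGED          0
#define MTK_HDR_XMIT_TAGGED_TPID_8100  1
#define MTK_HDR_XMIT_DP_BIT_MASK       0x3f	/* GENMASK(5, 0) */

/* Hypothetical helper: fill the 4-byte special tag located at `tag`.
 *
 * For an untagged frame, all four bytes are freshly inserted and written.
 * For a frame that already carries an 802.1Q header, the special tag
 * overlays that header: bytes 0-1 replace the TPID, while bytes 2-3
 * (the tag control information with the VID) are left untouched.
 */
static void mtk_fill_tag(uint8_t *tag, unsigned int port, bool is_vlan)
{
	assert(port < 6);	/* assumed: one bit per port in a 6-bit mask */

	tag[0] = is_vlan ? MTK_HDR_XMIT_TAGGED_TPID_8100 :
			   MTK_HDR_XMIT_UNTAGGED;
	tag[1] = (uint8_t)((1u << port) & MTK_HDR_XMIT_DP_BIT_MASK);

	if (!is_vlan) {
		tag[2] = 0;
		tag[3] = 0;
	}
}
```

Calling `mtk_fill_tag()` on the bytes `81 00 20 05` with port 2 and
`is_vlan = true` leaves `01 04 20 05`, so the VID survives for the VLAN
table lookup.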

[BUG] renesas/sh_eth: two possible sleep-in-atomic bugs in sh_eth_set_link_ksettings and sh_eth_nway_reset

2017-12-12 Thread Jia-Ju Bai

According to drivers/net/ethernet/renesas/sh_eth.c, the driver may sleep
while holding a spinlock.

The function call paths are:
sh_eth_set_link_ksettings (acquire the spinlock)
  phy_ethtool_ksettings_set
phy_start_aneg
  phy_start_aneg_priv
mutex_lock --> may sleep

sh_eth_nway_reset (acquire the spinlock)
  phy_start_aneg
phy_start_aneg_priv
  mutex_lock --> may sleep

I could not find a good way to fix them, so I am only reporting them.
These possible bugs were found by my static analysis tool (DSAC) and
confirmed by my code review.



Thanks,
Jia-Ju Bai

[PATCH v2 net-next 0/3] add VLAN support to DSA MT7530

2017-12-12 Thread sean.wang

From: Sean Wang 

Changes since v1:
- fix up the typo
- prefer ordering declarations longest to shortest
- update that vlan_prepare callback should not change any state
- use lower case letter for function naming

This patchset adds VLAN support to DSA MT7530 by filling in the required
callbacks in patch 1 and by merging the special tag with the VLAN tag in
patch 2, so that the hardware can handle packets carrying a VID from the
CPU port.

Sean Wang (3):
  net: dsa: mediatek: add VLAN support for MT7530
  net: dsa: mediatek: combine MediaTek tag with VLAN tag
  net: dsa: mediatek: update MAINTAINERS entry with MediaTek switch
driver

 MAINTAINERS  |   7 ++
 drivers/net/dsa/mt7530.c | 291 ++-
 drivers/net/dsa/mt7530.h |  83 +-
 net/dsa/tag_mtk.c|  38 +--
 4 files changed, 403 insertions(+), 16 deletions(-)

-- 
2.7.4

[PATCH v3] igb: Free IRQs when device is hotplugged

2017-12-12 Thread Lyude Paul

Recently I got a Caldigit TS3 Thunderbolt 3 dock, and noticed that upon
hotplugging my kernel would immediately crash due to igb:

[  680.825801] kernel BUG at drivers/pci/msi.c:352!
[  680.828388] invalid opcode:  [#1] SMP
[  680.829194] Modules linked in: igb(O) thunderbolt i2c_algo_bit joydev vfat 
fat btusb btrtl btbcm btintel bluetooth ecdh_generic hp_wmi sparse_keymap 
rfkill wmi_bmof iTCO_wdt intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul 
snd_pcm rtsx_pci_ms mei_me snd_timer memstick snd pcspkr mei soundcore i2c_i801 
tpm_tis psmouse shpchp wmi tpm_tis_core tpm video hp_wireless acpi_pad 
rtsx_pci_sdmmc mmc_core crc32c_intel serio_raw rtsx_pci mfd_core xhci_pci 
xhci_hcd i2c_hid i2c_core [last unloaded: igb]
[  680.831085] CPU: 1 PID: 78 Comm: kworker/u16:1 Tainted: G   O 
4.15.0-rc3Lyude-Test+ #6
[  680.831596] Hardware name: HP HP ZBook Studio G4/826B, BIOS P71 Ver. 01.03 
06/09/2017
[  680.832168] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
[  680.832687] RIP: 0010:free_msi_irqs+0x180/0x1b0
[  680.833271] RSP: 0018:c930fbf0 EFLAGS: 00010286
[  680.833761] RAX: 8803405f9c00 RBX: 88033e3d2e40 RCX: 002c
[  680.834278] RDX:  RSI: 00ac RDI: 880340be2178
[  680.834832] RBP:  R08: 880340be1ff0 R09: 8803405f9c00
[  680.835342] R10:  R11: 0040 R12: 88033d63a298
[  680.835822] R13: 88033d63a000 R14: 0060 R15: 880341959000
[  680.836332] FS:  () GS:88034f44() 
knlGS:
[  680.836817] CS:  0010 DS:  ES:  CR0: 80050033
[  680.837360] CR2: 55e64044afdf CR3: 01c09002 CR4: 003606e0
[  680.837954] Call Trace:
[  680.838853]  pci_disable_msix+0xce/0xf0
[  680.839616]  igb_reset_interrupt_capability+0x5d/0x60 [igb]
[  680.840278]  igb_remove+0x9d/0x110 [igb]
[  680.840764]  pci_device_remove+0x36/0xb0
[  680.841279]  device_release_driver_internal+0x157/0x220
[  680.841739]  pci_stop_bus_device+0x7d/0xa0
[  680.842255]  pci_stop_bus_device+0x2b/0xa0
[  680.842722]  pci_stop_bus_device+0x3d/0xa0
[  680.843189]  pci_stop_and_remove_bus_device+0xe/0x20
[  680.843627]  trim_stale_devices+0xf3/0x140
[  680.844086]  trim_stale_devices+0x94/0x140
[  680.844532]  trim_stale_devices+0xa6/0x140
[  680.845031]  ? get_slot_status+0x90/0xc0
[  680.845536]  acpiphp_check_bridge.part.5+0xfe/0x140
[  680.846021]  acpiphp_hotplug_notify+0x175/0x200
[  680.846581]  ? free_bridge+0x100/0x100
[  680.847113]  acpi_device_hotplug+0x8a/0x490
[  680.847535]  acpi_hotplug_work_fn+0x1a/0x30
[  680.848076]  process_one_work+0x182/0x3a0
[  680.848543]  worker_thread+0x2e/0x380
[  680.848963]  ? process_one_work+0x3a0/0x3a0
[  680.849373]  kthread+0x111/0x130
[  680.849776]  ? kthread_create_worker_on_cpu+0x50/0x50
[  680.850188]  ret_from_fork+0x1f/0x30
[  680.850601] Code: 43 14 85 c0 0f 84 d5 fe ff ff 31 ed eb 0f 83 c5 01 39 6b 
14 0f 86 c5 fe ff ff 8b 7b 10 01 ef e8 b7 e4 d2 ff 48 83 78 70 00 74 e3 <0f> 0b 
49 8d b5 a0 00 00 00 e8 62 6f d3 ff e9 c7 fe ff ff 48 8b
[  680.851497] RIP: free_msi_irqs+0x180/0x1b0 RSP: c930fbf0

As it turns out, normally the freeing of IRQs that would fix this is called
inside of the scope of __igb_close(). However, since the device is
already gone by the point we try to unregister the netdevice from the
driver due to a hotplug we end up seeing that the netif isn't present
and thus, forget to free any of the device IRQs.

So: make sure that if we're in the process of dismantling the netdev, we
always allow __igb_close() to be called so that IRQs may be freed
normally. Additionally, only allow igb_close() to be called from
__igb_close() if it hasn't already been called for the given adapter.

Signed-off-by: Lyude Paul 
Fixes: 9474933caf21 ("igb: close/suspend race in netif_device_detach")
Cc: Todd Fujinaka 
Cc: Stephen Hemminger 
Cc: sta...@vger.kernel.org
---
Changes since v2:
  - Remove hunk in __igb_close() that was left over by accident, it's
not needed

 drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index c208753ff5b7..c69a5b3ae8c8 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -3676,7 +3676,7 @@ static int __igb_close(struct net_device *netdev, bool 
suspending)
 
 int igb_close(struct net_device *netdev)
 {
-   if (netif_device_present(netdev))
+   if (netif_device_present(netdev) || netdev->dismantle)
return __igb_close(netdev, false);
return 0;
 }
-- 
2.14.3

Re: [PATCHv2 2/3] net: socionext: Add Synquacer NetSec driver

2017-12-12 Thread Andrew Lunn

> > +static int netsec_register_mdio(struct netsec_priv *priv, u32 phy_addr)
> > +{
> > +   struct mii_bus *bus;
> > +   int ret;
> > +
> > +   bus = devm_mdiobus_alloc(priv->dev);
> > +   if (!bus)
> > +   return -ENOMEM;
> > +
> > +   snprintf(bus->id, MII_BUS_ID_SIZE, "%s", dev_name(priv->dev));
> > +   bus->priv = priv;
> > +   bus->name = "SNI NETSEC MDIO";
> > +   bus->read = netsec_phy_read;
> > +   bus->write = netsec_phy_write;
> > +   bus->parent = priv->dev;
> > +   priv->mii_bus = bus;
> > +
> > +   if (dev_of_node(priv->dev)) {
> > +   ret = of_mdiobus_register(bus, dev_of_node(priv->dev));
> > +   if (ret) {
> > +   dev_err(priv->dev, "mdiobus register err(%d)\n", 
> > ret);
> > +   return ret;
> > +   }
> > +   } else {
> > +   /* Mask out all PHYs from auto probing. */
> > +   bus->phy_mask = ~0;
> > +   ret = mdiobus_register(bus);
> > +   if (ret) {
> > +   dev_err(priv->dev, "mdiobus register err(%d)\n", 
> > ret);
> > +   return ret;
> > +   }
> > +
> > +   priv->phydev = get_phy_device(priv->mii_bus, phy_addr, 
> > false);
> > +   if (IS_ERR(priv->phydev)) {
> > +   ret = PTR_ERR(priv->phydev);
> > +   dev_err(priv->dev, "get_phy_device err(%d)\n", ret);
> > +   priv->phydev = NULL;
> > +   return -ENODEV;
> > +   }
> > +
> > +   ret = phy_device_register(priv->phydev);
> > +   if (ret)
> > +   dev_err(priv->dev,
> > +   "phy_device_register err(%d)\n", ret);

You should unregister the mdio bus here.

> > +   }
> > +
> > +   return ret;
> > +}

  Andrew
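
The unwinding asked for in the review comment above can be sketched in
plain userspace C. Every name here (`fake_bus`, `fake_bus_register`,
`fake_phy_register`, `register_mdio`) is a hypothetical stand-in for the
mdiobus and PHY calls in the quoted patch; the point is only the shape of
the error path, not the driver's actual fix.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the registered MDIO bus. */
struct fake_bus {
	bool registered;
};

static int fake_bus_register(struct fake_bus *bus)
{
	bus->registered = true;
	return 0;
}

static void fake_bus_unregister(struct fake_bus *bus)
{
	bus->registered = false;
}

/* Hypothetical stand-in for the PHY registration step that may fail. */
static int fake_phy_register(bool fail)
{
	return fail ? -ENODEV : 0;
}

static int register_mdio(struct fake_bus *bus, bool phy_fails)
{
	int ret;

	ret = fake_bus_register(bus);
	if (ret)
		return ret;

	ret = fake_phy_register(phy_fails);
	if (ret)
		goto err_unregister_bus;

	return 0;

err_unregister_bus:
	/* Undo the earlier step so a failed setup leaves nothing behind. */
	fake_bus_unregister(bus);
	return ret;
}
```

With this shape, a failure in the second step returns the error and the
bus is no longer registered when the caller sees it.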

Re: [PATCHv2 2/3] net: socionext: Add Synquacer NetSec driver

2017-12-12 Thread Andrew Lunn

> > +static int netsec_mac_update_to_phy_state(struct netsec_priv *priv)
> > +{
> > +   struct phy_device *phydev = priv->ndev->phydev;
> > +   u32 value = 0;
> > +
> > +   value = phydev->duplex ? NETSEC_GMAC_MCR_REG_FULL_DUPLEX_COMMON :
> > +NETSEC_GMAC_MCR_REG_HALF_DUPLEX_COMMON;
> > +
> > +   if (phydev->speed != SPEED_1000)
> > +   value |= NETSEC_MCR_PS;
> > +
> > +   if (priv->phy_interface != PHY_INTERFACE_MODE_GMII &&
> > +   phydev->speed == SPEED_100)
> > +   value |= NETSEC_GMAC_MCR_REG_FES;
> > +
> > +   value |= NETSEC_GMAC_MCR_REG_CST | NETSEC_GMAC_MCR_REG_JE;
> > +
> > +   if (priv->phy_interface == PHY_INTERFACE_MODE_RGMII)
> > +   value |= NETSEC_GMAC_MCR_REG_IBN;

phy_interface_mode_is_rgmii() ??

  Andrew
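
As background for the suggestion above: the kernel helper treats all four
RGMII variants (plain, -ID, -RXID, -TXID) as RGMII, whereas the quoted
comparison matches only the plain mode. A minimal sketch of that idea,
using a hypothetical, heavily reduced enum (`iface_mode`) rather than the
kernel's `phy_interface_t`:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative subset only; the kernel enum has many more modes.
 * The four RGMII variants are assumed to be contiguous.
 */
enum iface_mode {
	MODE_MII,
	MODE_GMII,
	MODE_RGMII,
	MODE_RGMII_ID,
	MODE_RGMII_RXID,
	MODE_RGMII_TXID,
	MODE_SGMII,
};

/* Range check covering every RGMII variant, not just MODE_RGMII. */
static bool mode_is_rgmii(enum iface_mode mode)
{
	return mode >= MODE_RGMII && mode <= MODE_RGMII_TXID;
}
```

A board that configures an internal-delay variant such as RGMII-ID would
fail an equality test against the plain mode but pass the range check.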

Re: [PATCH net-next] net: dsa: lan9303: Introduce lan9303_read_wait

2017-12-12 Thread Egil Hjelmeland


Hi Vivien.

Den 12. des. 2017 19:08, skrev Vivien Didelot:

Hi Egil,

Egil Hjelmeland  writes:


Simplify lan9303_indirect_phy_wait_for_completion()
and lan9303_switch_wait_for_completion() by using a new function
lan9303_read_wait()

Signed-off-by: Egil Hjelmeland 
---
  drivers/net/dsa/lan9303-core.c | 59 +++---
  1 file changed, 27 insertions(+), 32 deletions(-)

diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
index c1b004fa64d9..96ccce0939d3 100644
--- a/drivers/net/dsa/lan9303-core.c
+++ b/drivers/net/dsa/lan9303-core.c
@@ -249,6 +249,29 @@ static int lan9303_read(struct regmap *regmap, unsigned 
int offset, u32 *reg)
return -EIO;
  }
  
+/* Wait a while until mask & reg == value. Otherwise return timeout. */

+static int lan9303_read_wait(struct lan9303 *chip, int offset, int mask,
+char value)
+{
+   int i;
+
+   for (i = 0; i < 25; i++) {
+   u32 reg;
+   int ret;
+
+   ret = lan9303_read(chip->regmap, offset, &reg);
+   if (ret) {
+   dev_err(chip->dev, "%s failed to read offset %d: %d\n",
+   __func__, offset, ret);
+   return ret;
+   }
+   if ((reg & mask) == value)
+   return 0;


It is odd to mix int, u32 and char for mask checking. I suggest using
the u32 type for both mask and value as well.



Good catch. Will fix that. Same with lan9303_csr_reg_wait() then.



Looking at how lan9303_read_wait is called, the value argument doesn't
seem necessary. You can directly return 0 if (!(reg & mask)).



The idea was to make it more generally usable, in case one needs to wait
for a bit to be set. But I don't have any example from the datasheet
that needs it, so I could take "value" away.



+   usleep_range(1000, 2000);
+   }
+   return -ETIMEDOUT;


A newline before the return statement would be appreciated.


Ok.


+}
+
  static int lan9303_virt_phy_reg_read(struct lan9303 *chip, int regnum)
  {
int ret;
@@ -274,22 +297,8 @@ static int lan9303_virt_phy_reg_write(struct lan9303 
*chip, int regnum, u16 val)
  
  static int lan9303_indirect_phy_wait_for_completion(struct lan9303 *chip)

  {
-   int ret, i;
-   u32 reg;
-
-   for (i = 0; i < 25; i++) {
-   ret = lan9303_read(chip->regmap, LAN9303_PMI_ACCESS, &reg);
-   if (ret) {
-   dev_err(chip->dev,
-   "Failed to read pmi access status: %d\n", ret);
-   return ret;
-   }
-   if (!(reg & LAN9303_PMI_ACCESS_MII_BUSY))
-   return 0;
-   usleep_range(1000, 2000);
-   }
-
-   return -EIO;
+   return lan9303_read_wait(chip, LAN9303_PMI_ACCESS,
+LAN9303_PMI_ACCESS_MII_BUSY, 0);
  }
  
  static int lan9303_indirect_phy_read(struct lan9303 *chip, int addr, int regnum)

@@ -366,22 +375,8 @@ EXPORT_SYMBOL_GPL(lan9303_indirect_phy_ops);
  
  static int lan9303_switch_wait_for_completion(struct lan9303 *chip)

  {
-   int ret, i;
-   u32 reg;
-
-   for (i = 0; i < 25; i++) {
-   ret = lan9303_read(chip->regmap, LAN9303_SWITCH_CSR_CMD, &reg);
-   if (ret) {
-   dev_err(chip->dev,
-   "Failed to read csr command status: %d\n", ret);
-   return ret;
-   }
-   if (!(reg & LAN9303_SWITCH_CSR_CMD_BUSY))
-   return 0;
-   usleep_range(1000, 2000);
-   }
-
-   return -EIO;
+   return lan9303_read_wait(chip, LAN9303_SWITCH_CSR_CMD,
+LAN9303_SWITCH_CSR_CMD_BUSY, 0);
  }
  
  static int lan9303_write_switch_reg(struct lan9303 *chip, u16 regnum, u32 val)



Thanks,

 Vivien



Thanks, Egil
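
A userspace sketch of the polling helper as it might look once the review
comments in this thread are applied (u32 mask, no separate value argument,
blank line before the final return). The accessor type `reg_read_fn`, the
`fake_dev` device and `read_wait_clear` are hypothetical names introduced
for illustration; the driver itself reads through regmap and sleeps
between attempts.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define READ_WAIT_RETRIES 25

/* Hypothetical accessor: returns 0 on success, negative errno on error. */
typedef int (*reg_read_fn)(void *ctx, unsigned int offset, uint32_t *val);

/* Poll until all bits in `mask` read back as zero, or give up. */
static int read_wait_clear(reg_read_fn read, void *ctx,
			   unsigned int offset, uint32_t mask)
{
	int i;

	assert(read);

	for (i = 0; i < READ_WAIT_RETRIES; i++) {
		uint32_t reg;
		int ret;

		ret = read(ctx, offset, &reg);
		if (ret)
			return ret;
		if (!(reg & mask))
			return 0;
		/* The driver sleeps 1-2 ms here via usleep_range(). */
	}

	return -ETIMEDOUT;
}

/* Hypothetical device whose busy bit clears after a number of reads. */
struct fake_dev {
	uint32_t busy_bit;
	int busy_reads;
};

static int fake_read(void *ctx, unsigned int offset, uint32_t *val)
{
	struct fake_dev *dev = ctx;

	(void)offset;
	if (dev->busy_reads > 0) {
		dev->busy_reads--;
		*val = dev->busy_bit;
	} else {
		*val = 0;
	}
	return 0;
}
```

Both wait_for_completion callers in the patch only ever wait for a busy
bit to clear, which is why a single mask argument is enough in this sketch.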

Re: KASAN: stack-out-of-bounds Read in xfrm_state_find (3)

2017-12-12 Thread Eric Biggers

Hi Steffen,

On Fri, Dec 01, 2017 at 08:27:43AM +0100, Steffen Klassert wrote:
> On Wed, Nov 22, 2017 at 08:05:00AM -0800, syzbot wrote:
> > syzkaller has found reproducer for the following crash on
> > 0c86a6bd85ff0629cd2c5141027fc1c8bb6cde9c
> > git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/master
> > compiler: gcc (GCC) 7.1.1 20170620
> > .config is attached
> > Raw console output is attached.
> > C reproducer is attached
> > syzkaller reproducer is attached. See https://goo.gl/kgGztJ
> > for information about syzkaller reproducers
> > 
> > 
> > BUG: KASAN: stack-out-of-bounds in xfrm_state_find+0x30fc/0x3230
> > net/xfrm/xfrm_state.c:1051
> > Read of size 4 at addr 8801ccaa7af8 by task syzkaller231684/3045
> 
> The patch below should fix this. I plan to apply it to the ipsec tree
> after some advanced testing.
> 
> Subject: [PATCH RFC] xfrm: Fix stack-out-of-bounds with misconfigured 
> transport
>  mode policies.
> 

Are you still planning to apply this?  syzbot is still hitting this bug.

Eric

[PATCH v3 net-next] net: ethernet: ti: cpdma: correct error handling for chan create

2017-12-12 Thread Ivan Khoronzhuk

Returning NULL is not correct when an error has actually occurred,
especially as the function returns error pointers in every other failure
case. At the same time, the cpsw and davinci emac drivers do not check
for the error case when creating a channel, so they can miss a real
error. Also remove the WARNs, replacing them with dev_err messages.

Signed-off-by: Ivan Khoronzhuk 
---
 drivers/net/ethernet/ti/cpsw.c  | 12 +---
 drivers/net/ethernet/ti/davinci_cpdma.c |  2 +-
 drivers/net/ethernet/ti/davinci_emac.c  | 11 +--
 3 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index a60a378..3c85a08 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -3065,10 +3065,16 @@ static int cpsw_probe(struct platform_device *pdev)
}
 
cpsw->txv[0].ch = cpdma_chan_create(cpsw->dma, 0, cpsw_tx_handler, 0);
+   if (IS_ERR(cpsw->txv[0].ch)) {
+   dev_err(priv->dev, "error initializing tx dma channel\n");
+   ret = PTR_ERR(cpsw->txv[0].ch);
+   goto clean_dma_ret;
+   }
+
cpsw->rxv[0].ch = cpdma_chan_create(cpsw->dma, 0, cpsw_rx_handler, 1);
-   if (WARN_ON(!cpsw->rxv[0].ch || !cpsw->txv[0].ch)) {
-   dev_err(priv->dev, "error initializing dma channels\n");
-   ret = -ENOMEM;
+   if (IS_ERR(cpsw->rxv[0].ch)) {
+   dev_err(priv->dev, "error initializing rx dma channel\n");
+   ret = PTR_ERR(cpsw->rxv[0].ch);
goto clean_dma_ret;
}
 
diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c 
b/drivers/net/ethernet/ti/davinci_cpdma.c
index e4d6edf..6f9173f 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -893,7 +893,7 @@ struct cpdma_chan *cpdma_chan_create(struct cpdma_ctlr 
*ctlr, int chan_num,
chan_num = rx_type ? rx_chan_num(chan_num) : tx_chan_num(chan_num);
 
if (__chan_linear(chan_num) >= ctlr->num_chan)
-   return NULL;
+   return ERR_PTR(-EINVAL);
 
chan = devm_kzalloc(ctlr->dev, sizeof(*chan), GFP_KERNEL);
if (!chan)
diff --git a/drivers/net/ethernet/ti/davinci_emac.c 
b/drivers/net/ethernet/ti/davinci_emac.c
index f58c0c6..abceea8 100644
--- a/drivers/net/ethernet/ti/davinci_emac.c
+++ b/drivers/net/ethernet/ti/davinci_emac.c
@@ -1870,10 +1870,17 @@ static int davinci_emac_probe(struct platform_device 
*pdev)
 
priv->txchan = cpdma_chan_create(priv->dma, EMAC_DEF_TX_CH,
 emac_tx_handler, 0);
+   if (IS_ERR(priv->txchan)) {
+   dev_err(&pdev->dev, "error initializing tx dma channel\n");
+   rc = PTR_ERR(priv->txchan);
+   goto no_cpdma_chan;
+   }
+
priv->rxchan = cpdma_chan_create(priv->dma, EMAC_DEF_RX_CH,
 emac_rx_handler, 1);
-   if (WARN_ON(!priv->txchan || !priv->rxchan)) {
-   rc = -ENOMEM;
+   if (IS_ERR(priv->rxchan)) {
+   dev_err(&pdev->dev, "error initializing rx dma channel\n");
+   rc = PTR_ERR(priv->rxchan);
goto no_cpdma_chan;
}
 
-- 
2.7.4

[PATH net-next] tcp: pause Fast Open globally after third consecutive timeout

2017-12-12 Thread Yuchung Cheng

Prior to this patch, active Fast Open is paused on a specific
destination IP address if previous connections to that IP address
have experienced recurring timeouts. But recent experiments by
Microsoft (https://goo.gl/cykmn7) and Mozilla browsers indicate
that the issue is often caused by broken middle-boxes sitting close
to the client. Therefore the user experience is much better if Fast
Open is disabled outright globally, to avoid experiencing further
timeouts on connections toward other destinations.

This patch changes the per-destination-IP disablement to global
disablement if a connection experiences recurring timeouts or
aborts due to a timeout.  Repeated incidents would still
exponentially increase the pause time, starting from an hour.
This is extremely conservative, but an unfortunate compromise to
minimize bad experience due to broken middle-boxes.

Reported-by: Dragana Damjanovic 
Reported-by: Patrick McManus 
Signed-off-by: Yuchung Cheng 
Reviewed-by: Wei Wang 
Reviewed-by: Neal Cardwell 
Reviewed-by: Eric Dumazet 
---
 Documentation/networking/ip-sysctl.txt |  1 +
 include/net/tcp.h  |  5 ++---
 net/ipv4/tcp_fastopen.c| 30 --
 net/ipv4/tcp_metrics.c |  5 +
 net/ipv4/tcp_timer.c   | 17 +
 5 files changed, 25 insertions(+), 33 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt 
b/Documentation/networking/ip-sysctl.txt
index 46c7e1085efc..3f2c40d8e6aa 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -606,6 +606,7 @@ tcp_fastopen_blackhole_timeout_sec - INTEGER
This time period will grow exponentially when more blackhole issues
get detected right after Fastopen is re-enabled and will reset to
initial value when the blackhole issue goes away.
+   0 to disable the blackhole detection.
By default, it is set to 1hr.
 
 tcp_syn_retries - INTEGER
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 3c3744e52cd1..6939e69d3c37 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1507,8 +1507,7 @@ int tcp_md5_hash_key(struct tcp_md5sig_pool *hp,
 
 /* From tcp_fastopen.c */
 void tcp_fastopen_cache_get(struct sock *sk, u16 *mss,
-   struct tcp_fastopen_cookie *cookie, int *syn_loss,
-   unsigned long *last_syn_loss);
+   struct tcp_fastopen_cookie *cookie);
 void tcp_fastopen_cache_set(struct sock *sk, u16 mss,
struct tcp_fastopen_cookie *cookie, bool syn_lost,
u16 try_exp);
@@ -1546,7 +1545,7 @@ extern unsigned int sysctl_tcp_fastopen_blackhole_timeout;
 void tcp_fastopen_active_disable(struct sock *sk);
 bool tcp_fastopen_active_should_disable(struct sock *sk);
 void tcp_fastopen_active_disable_ofo_check(struct sock *sk);
-void tcp_fastopen_active_timeout_reset(void);
+void tcp_fastopen_active_detect_blackhole(struct sock *sk, bool expired);
 
 /* Latencies incurred by various limits for a sender. They are
  * chronograph-like stats that are mutually exclusive.
diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
index 78c192ee03a4..018a48477355 100644
--- a/net/ipv4/tcp_fastopen.c
+++ b/net/ipv4/tcp_fastopen.c
@@ -379,18 +379,9 @@ struct sock *tcp_try_fastopen(struct sock *sk, struct 
sk_buff *skb,
 bool tcp_fastopen_cookie_check(struct sock *sk, u16 *mss,
   struct tcp_fastopen_cookie *cookie)
 {
-   unsigned long last_syn_loss = 0;
const struct dst_entry *dst;
-   int syn_loss = 0;
 
-   tcp_fastopen_cache_get(sk, mss, cookie, &syn_loss, &last_syn_loss);
-
-   /* Recurring FO SYN losses: no cookie or data in SYN */
-   if (syn_loss > 1 &&
-   time_before(jiffies, last_syn_loss + (60*HZ << syn_loss))) {
-   cookie->len = -1;
-   return false;
-   }
+   tcp_fastopen_cache_get(sk, mss, cookie);
 
/* Firewall blackhole issue check */
if (tcp_fastopen_active_should_disable(sk)) {
@@ -448,6 +439,8 @@ EXPORT_SYMBOL(tcp_fastopen_defer_connect);
  * following circumstances:
  *   1. client side TFO socket receives out of order FIN
  *   2. client side TFO socket receives out of order RST
+ *   3. client side TFO socket has timed out three times consecutively during
+ *  or after handshake
  * We disable active side TFO globally for 1hr at first. Then if it
  * happens again, we disable it for 2h, then 4h, 8h, ...
  * And we reset the timeout back to 1hr when we see a successful active
@@ -524,3 +517,20 @@ void tcp_fastopen_active_disable_ofo_check(struct sock *sk)
dst_release(dst);
}
 }
+
+void tcp_fastopen_active_detect_blackhole(struct sock *sk, bool expired)
+{
+   u32 timeouts =
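
The pause schedule described in the commit message and in the
tcp_fastopen.c comment (1hr, then 2h, 4h, 8h, ...) can be sketched as a
small helper. `tfo_pause_seconds` and its parameters are hypothetical; in
particular the `max_shift` cap on the doubling is an assumption made here
to keep the shift bounded and is not stated in the patch text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper.
 *
 * base_sec:  initial pause in seconds (1hr by default per the sysctl
 *            documentation; 0 disables blackhole detection)
 * times:     number of consecutive blackhole detections so far
 * max_shift: assumed upper bound on how often the pause is doubled
 */
static uint64_t tfo_pause_seconds(uint32_t base_sec, unsigned int times,
				  unsigned int max_shift)
{
	unsigned int shift;

	assert(max_shift < 32);

	if (times == 0 || base_sec == 0)
		return 0;	/* nothing detected, or detection disabled */

	shift = times - 1;
	if (shift > max_shift)
		shift = max_shift;

	return (uint64_t)base_sec << shift;
}
```

With a base of 3600 seconds, the first detection pauses Fast Open for one
hour, the second for two hours and the fourth for eight hours.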

Re: [PATCH] veth: Optionally pad packets to minimum Ethernet length

2017-12-12 Thread Cong Wang

On Tue, Dec 12, 2017 at 8:13 AM, Ed Swierk  wrote:
> Most physical Ethernet devices pad short packets to the minimum length
> of 64 bytes (including FCS) on transmit. It can be useful to simulate
> this behavior when debugging a problem that results from it (such as
> incorrect L4 checksum calculation).
>
> Padding is unnecessary for most applications so leave it off by
> default. Enable padding only when the otherwise unused IFF_AUTOMEDIA
> flag is set (e.g. by writing 0x5003 to flags in sysfs).

This doesn't make sense: why should the veth hot path be punished for
such an unusual flag, which it doesn't care about? Also, why should we
allow setting this flag via sysfs for veth in the first place?
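
For context on the behaviour under discussion: a 64-byte minimum frame
including the 4-byte FCS means 60 bytes of frame data before the FCS is
appended. The following userspace sketch pads a short frame with zeros up
to that length. `pad_to_min` is a hypothetical helper, not the code from
the proposed veth patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_FCS_LEN	4
#define ETH_MIN_FRAME	64				/* on the wire, FCS included */
#define ETH_ZLEN	(ETH_MIN_FRAME - ETH_FCS_LEN)	/* 60, FCS excluded */

/* Hypothetical helper: zero-pad `buf` (holding `len` bytes, with room
 * for `cap` bytes) up to the minimum length. Returns the new length, or
 * 0 if the buffer is too small to hold a minimum-size frame.
 */
static size_t pad_to_min(uint8_t *buf, size_t len, size_t cap)
{
	assert(len <= cap);

	if (len >= ETH_ZLEN)
		return len;
	if (cap < ETH_ZLEN)
		return 0;

	memset(buf + len, 0, ETH_ZLEN - len);
	return ETH_ZLEN;
}
```

The length test is one comparison per packet, and it would sit on the
transmit path for every packet, which is the cost the reply above objects
to.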

Re: [REGRESSION][4.13.y][4.14.y][v4.15.y] net: reduce skb_warn_bad_offload() noise

2017-12-12 Thread Greg KH

On Tue, Dec 12, 2017 at 09:10:11AM -0500, David Miller wrote:
> From: Willem de Bruijn 
> Date: Mon, 11 Dec 2017 16:56:56 -0500
> 
> > On Mon, Dec 11, 2017 at 4:44 PM, Greg Kroah-Hartman
> >  wrote:
> >> On Mon, Dec 11, 2017 at 04:25:26PM -0500, Willem de Bruijn wrote:
> >>> Note that UFO was removed in 4.14 and that skb_warn_bad_offload
> >>> can happen for various types of packets, so there may be multiple
> >>> independent bug reports. I'm investigating two other non-UFO reports
> >>> just now.
> >>
> >> Meta-comment, now that UFO is gone from mainline, I'm wondering if I
> >> should just delete it from 4.4 and 4.9 as well.  Any objections for
> >> that?  I'd like to make it easy to maintain these kernels for a while,
> >> and having them diverge like this, with all of the issues around UFO,
> >> seems like it will just make life harder for myself if I leave it in.
> >>
> >> Any opinions?
> > 
> > Some of that removal had to be reverted with commit 0c19f846d582
> > ("net: accept UFO datagrams from tuntap and packet") for VM live
> > migration between kernels.
> > 
> > Any backports probably should squash that in at the least. Just today
> > another thread discussed that that patch may not address all open
> > issues still, so it may be premature to backport at this point.
> > http://lkml.kernel.org/r/
> 
> I would probably discourage backporting the UFO removal, at least for
> now.

Ok, thanks for letting me know, I'll ask again in 6 months or so :)

greg k-h

Re: [PATCH 2/4] sctp: Add ip option support

2017-12-12 Thread Paul Moore

On Tue, Dec 12, 2017 at 11:08 AM, Marcelo Ricardo Leitner
 wrote:
> Hi Richard,
>
> On Mon, Nov 27, 2017 at 07:31:21PM +, Richard Haines wrote:
> ...
>> --- a/net/sctp/socket.c
>> +++ b/net/sctp/socket.c
>> @@ -3123,8 +3123,10 @@ static int sctp_setsockopt_maxseg(struct sock *sk, 
>> char __user *optval, unsigned
>>
>>   if (asoc) {
>>   if (val == 0) {
>> + struct sctp_af *af = sp->pf->af;
>>   val = asoc->pathmtu;
>> - val -= sp->pf->af->net_header_len;
>> + val -= af->ip_options_len(asoc->base.sk);
>> + val -= af->net_header_len;
>>   val -= sizeof(struct sctphdr) +
>>   sizeof(struct sctp_data_chunk);
>>   }
>
> Right below here there is a call to sctp_frag_point(). That function
> also needs this tweak.
>
> Yes, we should simplify all these calculations. I have a patch to use
> sctp_frag_point on where it is currently recalculating it on
> sctp_datamsg_from_user(), but probably should include other places as
> well.

FYI: Richard let me know he is occupied with another project at the
moment and likely won't be able to do another respin until next week
at the earliest.

-- 
paul moore
www.paul-moore.com

Re: [PATCH v3 net-next] net: ethernet: ti: cpdma: correct error handling for chan create

2017-12-12 Thread Grygorii Strashko




On 12/12/2017 03:06 PM, Ivan Khoronzhuk wrote:

It's not correct to return NULL when that is actually an error and
function returns errors in any other wrong case. In the same time,
the cpsw driver and davinci emac doesn't check error case while
creating channel and it can miss actual error. Also remove WARNs
replacing them on dev_err msgs.

Signed-off-by: Ivan Khoronzhuk 


Reviewed-by: Grygorii Strashko 


---
  drivers/net/ethernet/ti/cpsw.c  | 12 +---
  drivers/net/ethernet/ti/davinci_cpdma.c |  2 +-
  drivers/net/ethernet/ti/davinci_emac.c  | 11 +--
  3 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index a60a378..3c85a08 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -3065,10 +3065,16 @@ static int cpsw_probe(struct platform_device *pdev)
}
  
  	cpsw->txv[0].ch = cpdma_chan_create(cpsw->dma, 0, cpsw_tx_handler, 0);

+   if (IS_ERR(cpsw->txv[0].ch)) {
+   dev_err(priv->dev, "error initializing tx dma channel\n");
+   ret = PTR_ERR(cpsw->txv[0].ch);
+   goto clean_dma_ret;
+   }
+
cpsw->rxv[0].ch = cpdma_chan_create(cpsw->dma, 0, cpsw_rx_handler, 1);
-   if (WARN_ON(!cpsw->rxv[0].ch || !cpsw->txv[0].ch)) {
-   dev_err(priv->dev, "error initializing dma channels\n");
-   ret = -ENOMEM;
+   if (IS_ERR(cpsw->rxv[0].ch)) {
+   dev_err(priv->dev, "error initializing rx dma channel\n");
+   ret = PTR_ERR(cpsw->rxv[0].ch);
goto clean_dma_ret;
}
  
diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c

index e4d6edf..6f9173f 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -893,7 +893,7 @@ struct cpdma_chan *cpdma_chan_create(struct cpdma_ctlr 
*ctlr, int chan_num,
chan_num = rx_type ? rx_chan_num(chan_num) : tx_chan_num(chan_num);
  
  	if (__chan_linear(chan_num) >= ctlr->num_chan)

-   return NULL;
+   return ERR_PTR(-EINVAL);
  
  	chan = devm_kzalloc(ctlr->dev, sizeof(*chan), GFP_KERNEL);

if (!chan)
diff --git a/drivers/net/ethernet/ti/davinci_emac.c 
b/drivers/net/ethernet/ti/davinci_emac.c
index f58c0c6..abceea8 100644
--- a/drivers/net/ethernet/ti/davinci_emac.c
+++ b/drivers/net/ethernet/ti/davinci_emac.c
@@ -1870,10 +1870,17 @@ static int davinci_emac_probe(struct platform_device 
*pdev)
  
  	priv->txchan = cpdma_chan_create(priv->dma, EMAC_DEF_TX_CH,

 emac_tx_handler, 0);
+   if (IS_ERR(priv->txchan)) {
+   dev_err(&pdev->dev, "error initializing tx dma channel\n");
+   rc = PTR_ERR(priv->txchan);
+   goto no_cpdma_chan;
+   }
+
priv->rxchan = cpdma_chan_create(priv->dma, EMAC_DEF_RX_CH,
 emac_rx_handler, 1);
-   if (WARN_ON(!priv->txchan || !priv->rxchan)) {
-   rc = -ENOMEM;
+   if (IS_ERR(priv->rxchan)) {
+   dev_err(&pdev->dev, "error initializing rx dma channel\n");
+   rc = PTR_ERR(priv->rxchan);
goto no_cpdma_chan;
}
  



--
regards,
-grygorii

[PATCH v2] [iproute] Show 'external' link mode in output

2017-12-12 Thread Phil Dibowitz

Recently `external` support was added to the tunnel drivers, but there is no way
to introspect this from userspace. This adds support for that.

Now `ip -details link` shows it:

```
7: tunl60@NONE:  mtu 1452 qdisc noop state DOWN mode DEFAULT group
default qlen 1
link/tunnel6 :: brd :: promiscuity 0
ip6tnl external any remote :: local :: encaplimit 0 hoplimit 0 tclass 0x00 
flowlabel 0x0 (flowinfo 0x) addrgenmode eui64 numtxqueues 1 
numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
```

Signed-off-by: Phil Dibowitz 
---
 ip/link_ip6tnl.c | 3 +++
 ip/link_iptnl.c  | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/ip/link_ip6tnl.c b/ip/link_ip6tnl.c
index 43287ab..af796c3 100644
--- a/ip/link_ip6tnl.c
+++ b/ip/link_ip6tnl.c
@@ -345,6 +345,9 @@ static void ip6tunnel_print_opt(struct link_util *lu, FILE 
*f, struct rtattr *tb
if (!tb)
return;
 
+   if (tb[IFLA_IPTUN_COLLECT_METADATA])
+   print_bool(PRINT_ANY, "external", "external ", true);
+
if (tb[IFLA_IPTUN_FLAGS])
flags = rta_getattr_u32(tb[IFLA_IPTUN_FLAGS]);
 
diff --git a/ip/link_iptnl.c b/ip/link_iptnl.c
index 4940b8b..2804b8f 100644
--- a/ip/link_iptnl.c
+++ b/ip/link_iptnl.c
@@ -393,6 +393,9 @@ static void iptunnel_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[
if (!tb)
return;
 
+   if (tb[IFLA_IPTUN_COLLECT_METADATA])
+   print_bool(PRINT_ANY, "external", "external ", true);
+
if (tb[IFLA_IPTUN_REMOTE]) {
unsigned int addr = rta_getattr_u32(tb[IFLA_IPTUN_REMOTE]);
 
-- 
2.9.5

Re: [PATCH 2/4] sctp: Add ip option support

2017-12-12 Thread Marcelo Ricardo Leitner

On Tue, Dec 12, 2017 at 04:33:03PM -0500, Paul Moore wrote:
> On Tue, Dec 12, 2017 at 11:08 AM, Marcelo Ricardo Leitner
>  wrote:
> > Hi Richard,
> >
> > On Mon, Nov 27, 2017 at 07:31:21PM +, Richard Haines wrote:
> > ...
> >> --- a/net/sctp/socket.c
> >> +++ b/net/sctp/socket.c
> >> @@ -3123,8 +3123,10 @@ static int sctp_setsockopt_maxseg(struct sock *sk, char __user *optval, unsigned
> >>
> >>   if (asoc) {
> >>   if (val == 0) {
> >> + struct sctp_af *af = sp->pf->af;
> >>   val = asoc->pathmtu;
> >> - val -= sp->pf->af->net_header_len;
> >> + val -= af->ip_options_len(asoc->base.sk);
> >> + val -= af->net_header_len;
> >>   val -= sizeof(struct sctphdr) +
> >>   sizeof(struct sctp_data_chunk);
> >>   }
> >
> > Right below here there is a call to sctp_frag_point(). That function
> > also needs this tweak.
> >
> > Yes, we should simplify all these calculations. I have a patch to use
> > sctp_frag_point on where it is currently recalculating it on
> > sctp_datamsg_from_user(), but probably should include other places as
> > well.
> 
> FYI: Richard let me know he is occupied with another project at the
> moment and likely won't be able to do another respin until next week
> at the earliest.

Okay, thanks. I can do a follow-up patch if it helps.

  Marcelo
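
As an illustration of the arithmetic in the quoted hunk, and of why a single shared helper avoids call sites drifting apart, here is a small user-space sketch. The function name and the two header-size constants are assumptions for the example (12 and 16 bytes are the SCTP common header and DATA chunk header sizes); this is not the kernel's sctp_frag_point().

```c
/* Assumed sizes for illustration: SCTP common header and DATA chunk header. */
enum {
	SCTP_COMMON_HDR_LEN = 12,
	SCTP_DATA_CHUNK_HDR_LEN = 16,
};

/* Hypothetical helper: user data that fits in one packet for a given path MTU. */
static int max_user_data(int pathmtu, int net_header_len, int ip_options_len)
{
	int val = pathmtu;

	val -= ip_options_len;	/* the term the patch adds */
	val -= net_header_len;
	val -= SCTP_COMMON_HDR_LEN + SCTP_DATA_CHUNK_HDR_LEN;

	return val;
}
```

With a 1500-byte path MTU and a 20-byte IPv4 header this gives 1452 bytes without options and 1444 bytes with 8 bytes of IP options.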

[PATCH] Bluetooth: hci_ll: Add optional nvmem BD address source

2017-12-12 Thread David Lechner

This adds an optional nvmem consumer to get a BD address from an external
source. The BD address is then set in the Bluetooth chip after the
firmware has been loaded.

This has been tested working with a TI CC2560A chip (in a LEGO MINDSTORMS
EV3).

Signed-off-by: David Lechner 
---
 drivers/bluetooth/hci_ll.c | 61 ++
 1 file changed, 61 insertions(+)

diff --git a/drivers/bluetooth/hci_ll.c b/drivers/bluetooth/hci_ll.c
index c948e8d..ed042ae 100644
--- a/drivers/bluetooth/hci_ll.c
+++ b/drivers/bluetooth/hci_ll.c
@@ -53,6 +53,7 @@
 #include 
 #include 
 #include 
+#include <linux/nvmem-consumer.h>
 
 #include "hci_uart.h"
 
@@ -90,6 +91,7 @@ struct ll_device {
struct serdev_device *serdev;
struct gpio_desc *enable_gpio;
struct clk *ext_clk;
+   bdaddr_t bdaddr;
 };
 
 struct ll_struct {
@@ -719,6 +721,18 @@ static int ll_setup(struct hci_uart *hu)
if (err)
return err;
 
+	/* Set BD address if one was specified at probe */
+	if (!bacmp(&lldev->bdaddr, BDADDR_NONE)) {
+		/* This means that there was an error getting the BD address
+		 * during probe, so mark the device as having a bad address.
+		 */
+		set_bit(HCI_QUIRK_INVALID_BDADDR, &hu->hdev->quirks);
+	} else if (bacmp(&lldev->bdaddr, BDADDR_ANY)) {
+		err = ll_set_bdaddr(hu->hdev, &lldev->bdaddr);
+		if (err)
+			set_bit(HCI_QUIRK_INVALID_BDADDR, &hu->hdev->quirks);
+	}
+
/* Operational speed if any */
if (hu->oper_speed)
speed = hu->oper_speed;
@@ -749,6 +763,7 @@ static int hci_ti_probe(struct serdev_device *serdev)
 {
struct hci_uart *hu;
struct ll_device *lldev;
+   struct nvmem_cell *bdaddr_cell;
u32 max_speed = 300;
 
 	lldev = devm_kzalloc(&serdev->dev, sizeof(struct ll_device), GFP_KERNEL);
@@ -770,6 +785,52 @@ static int hci_ti_probe(struct serdev_device *serdev)
 	of_property_read_u32(serdev->dev.of_node, "max-speed", &max_speed);
 	hci_uart_set_speeds(hu, 115200, max_speed);
 
+	/* optional BD address from nvram */
+	bdaddr_cell = nvmem_cell_get(&serdev->dev, "bd-address");
+	if (IS_ERR(bdaddr_cell)) {
+		int err = PTR_ERR(bdaddr_cell);
+
+		if (err == -EPROBE_DEFER)
+			return err;
+
+		/* ENOENT means there is no matching nvmem cell and ENOSYS
+		 * means that nvmem is not enabled in the kernel configuration.
+		 */
+		if (err != -ENOENT && err != -ENOSYS) {
+			/* If there was some other error, give userspace a
+			 * chance to fix the problem instead of failing to load
+			 * the driver. Using BDADDR_NONE as a flag that is
+			 * tested later in the setup function.
+			 */
+			dev_warn(&serdev->dev,
+				 "Failed to get \"bd-address\" nvmem cell (%d)\n",
+				 err);
+			bacpy(&lldev->bdaddr, BDADDR_NONE);
+		}
+	} else {
+		bdaddr_t *bdaddr;
+		int len;
+
+		bdaddr = nvmem_cell_read(bdaddr_cell, &len);
+		nvmem_cell_put(bdaddr_cell);
+		if (IS_ERR(bdaddr)) {
+			dev_err(&serdev->dev, "Failed to read nvmem bd-address\n");
+			return PTR_ERR(bdaddr);
+		}
+		if (len != sizeof(bdaddr_t)) {
+			dev_err(&serdev->dev, "Invalid nvmem bd-address length\n");
+			kfree(bdaddr);
+			return -EINVAL;
+		}
+
+		/* As per the device tree bindings, the value from nvmem is
+		 * expected to be MSB first, but in the kernel it is expected
+		 * that bdaddr_t is LSB first.
+		 */
+		baswap(&lldev->bdaddr, bdaddr);
+		kfree(bdaddr);
+	}
+
return hci_uart_register_device(hu, );
 }
 
-- 
2.7.4

[PATCH v3 1/3] Bluetooth: hci_ll: add support for setting public address

2017-12-12 Thread David Lechner

This adds support for setting the public address on Texas Instruments
Bluetooth chips using a vendor-specific command.

This has been tested on a CC2560A chip. The TI wiki also indicates that
this command should work on TI WL17xx/WL18xx Bluetooth chips.

During review, there was some question as to the correctness of the byte
swapping since TI's documentation is not clear on this matter. This can
be tested with the btmgmt utility from bluez. The adapter must be powered
off to change the address. If the baswap() is omitted, the address is reversed.

In case there is an issue in the future, here is the output of btmon during
the command `btmgmt public-addr 00:11:22:33:44:55`:

Bluetooth monitor ver 5.43
= Note: Linux version 4.15.0-rc2-08561-gcb132a1-dirty (armv5tejl)  0.707043
= Note: Bluetooth subsystem version 2.22   0.707091
= New Index: 00:17:E7:BD:1C:8E (Primary,UART,hci0)  [hci0] 0.707106
@ MGMT Open: btmgmt (privileged) version 1.14 {0x0002} 0.707124
@ MGMT Open: bluetoothd (privileged) version 1.14 {0x0001} 0.707137
@ MGMT Open: btmon (privileged) version 1.14  {0x0003} 0.707540
@ MGMT Command: Set Public Address (0x0039) plen 6{0x0002} [hci0] 11.167991
Address: 00:11:22:33:44:55 (CIMSYS Inc)
@ MGMT Event: Command Complete (0x0001) plen 7{0x0002} [hci0] 11.175681
  Set Public Address (0x0039) plen 4
Status: Success (0x00)
Missing options: 0x
@ MGMT Event: Index Removed (0x0005) plen 0   {0x0003} [hci0] 11.175757
@ MGMT Event: Index Removed (0x0005) plen 0   {0x0002} [hci0] 11.175757
@ MGMT Event: Index Removed (0x0005) plen 0   {0x0001} [hci0] 11.175757
= Open Index: 00:17:E7:BD:1C:8E[hci0] 11.176807
< HCI Command: Vendor (0x3f|0x0006) plen 6 [hci0] 11.176975
00 11 22 33 44 55.."3DU
> HCI Event: Command Complete (0x0e) plen 4[hci0] 11.188260
  Vendor (0x3f|0x0006) ncmd 1
Status: Success (0x00)
...
< HCI Command: Read Local Version Info.. (0x04|0x0001) plen 0  [hci0] 11.189859
> HCI Event: Command Complete (0x0e) plen 12   [hci0] 11.190732
  Read Local Version Information (0x04|0x0001) ncmd 1
Status: Success (0x00)
HCI version: Bluetooth 2.1 (0x04) - Revision 0 (0x)
LMP version: Bluetooth 2.1 (0x04) - Subversion 6431 (0x191f)
Manufacturer: Texas Instruments Inc. (13)
< HCI Command: Read BD ADDR (0x04|0x0009) plen 0   [hci0] 11.191027
> HCI Event: Command Complete (0x0e) plen 10   [hci0] 11.192101
  Read BD ADDR (0x04|0x0009) ncmd 1
Status: Success (0x00)
Address: 00:11:22:33:44:55 (CIMSYS Inc)
...

Signed-off-by: David Lechner 
---
 drivers/bluetooth/hci_ll.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/drivers/bluetooth/hci_ll.c b/drivers/bluetooth/hci_ll.c
index efcfbe9..c948e8d 100644
--- a/drivers/bluetooth/hci_ll.c
+++ b/drivers/bluetooth/hci_ll.c
@@ -57,6 +57,7 @@
 #include "hci_uart.h"
 
 /* Vendor-specific HCI commands */
+#define HCI_VS_WRITE_BD_ADDR			0xfc06
 #define HCI_VS_UPDATE_UART_HCI_BAUDRATE		0xff36
 
 /* HCILL commands */
@@ -662,6 +663,24 @@ static int download_firmware(struct ll_device *lldev)
return err;
 }
 
+static int ll_set_bdaddr(struct hci_dev *hdev, const bdaddr_t *bdaddr)
+{
+	bdaddr_t bdaddr_swapped;
+	struct sk_buff *skb;
+
+	/* HCI_VS_WRITE_BD_ADDR (at least on a CC2560A chip) expects the BD
+	 * address to be MSB first, but bdaddr_t has the convention of being
+	 * LSB first.
+	 */
+	baswap(&bdaddr_swapped, bdaddr);
+	skb = __hci_cmd_sync(hdev, HCI_VS_WRITE_BD_ADDR, sizeof(bdaddr_t),
+			     &bdaddr_swapped, HCI_INIT_TIMEOUT);
+	if (!IS_ERR(skb))
+		kfree_skb(skb);
+
+	return PTR_ERR_OR_ZERO(skb);
+}
+
 static int ll_setup(struct hci_uart *hu)
 {
int err, retry = 3;
@@ -674,6 +693,8 @@ static int ll_setup(struct hci_uart *hu)
 
lldev = serdev_device_get_drvdata(serdev);
 
+   hu->hdev->set_bdaddr = ll_set_bdaddr;
+
serdev_device_set_flow_control(serdev, true);
 
do {
-- 
2.7.4
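
To make the byte-order point in the commit message concrete, here is a small user-space sketch of the swap. The bdaddr_t type and baswap() below are local re-implementations for illustration and are only assumed to behave like the kernel helpers of the same name; the address is the one from the btmon trace.

```c
#include <stdint.h>

/* Local stand-in for the Bluetooth address type: six bytes, LSB first. */
typedef struct {
	uint8_t b[6];
} bdaddr_t;

/* Copy src into dst with the byte order reversed. */
static void baswap(bdaddr_t *dst, const bdaddr_t *src)
{
	unsigned int i;

	for (i = 0; i < 6; i++)
		dst->b[i] = src->b[5 - i];
}
```

For 00:11:22:33:44:55 the LSB-first representation is { 0x55, 0x44, 0x33, 0x22, 0x11, 0x00 }; after the swap the bytes read 00 11 22 33 44 55, which matches the payload of the vendor command in the trace.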

[PATCH v3 2/3] dt-bindings: Add optional nvmem BD address bindings to ti,wlink-st

2017-12-12 Thread David Lechner

This adds optional nvmem consumer properties to the ti,wlink-st device tree
bindings to allow specifying the BD address.

Reviewed-by: Rob Herring 
Signed-off-by: David Lechner 
---
 Documentation/devicetree/bindings/net/ti,wilink-st.txt | 5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/devicetree/bindings/net/ti,wilink-st.txt 
b/Documentation/devicetree/bindings/net/ti,wilink-st.txt
index 1649c1f..a45a508 100644
--- a/Documentation/devicetree/bindings/net/ti,wilink-st.txt
+++ b/Documentation/devicetree/bindings/net/ti,wilink-st.txt
@@ -32,6 +32,9 @@ Optional properties:
See ../clocks/clock-bindings.txt for details.
  - clock-names : Must include the following entry:
"ext_clock" (External clock provided to the TI combo chip).
+ - nvmem-cells: phandle to nvmem data cell that contains a 6 byte BD address
+   with the most significant byte first (big-endian).
+ - nvmem-cell-names: "bd-address" (required when nvmem-cells is specified)
 
 Example:
 
@@ -43,5 +46,7 @@ Example:
enable-gpios = < 7 GPIO_ACTIVE_HIGH>;
clocks = <_wl18xx>;
clock-names = "ext_clock";
+   nvmem-cells = <_address>;
+   nvmem-cell-names = "bd-address";
};
 };
-- 
2.7.4

Re: [PATCH] veth: Optionally pad packets to minimum Ethernet length

2017-12-12 Thread Stephen Hemminger

On Tue, 12 Dec 2017 11:00:38 -0800
Ed Swierk  wrote:

> On Tue, Dec 12, 2017 at 10:34 AM, Stephen Hemminger
>  wrote:
> > Why not add to netdevsim rather than cluttering up a normal driver
> > with test support.  We just pulled a bunch of test stuff out of dummy
> > for the same reason.  
> 
> My test setup to trigger an openvswitch conntrack issue
> (https://marc.info/?l=linux-netdev=151309548725627) involves a lot
> of moving parts:
> 
> [netns-a: vetha1] - [vetha0] - [ovsbr0] - [vethb0] - [netns-b: vethb1]
> 
> with nc client and server in netns-a and -b, and tweaks like turning
> off tcp_timestamps to make sure the packets in the TCP stream are
> small enough to reproduce the problem. A simpler, less fragile test
> setup would be valuable, especially if it ends up as an automated
> regression test.
> 
> Could netdevsim be useful for that? Are there any existing tests
> producing TCP traffic that might serve as an example?
> 
> --Ed

Maybe add a netem impairment that does padding?

[PATCH v3 0/3] Bluetooth: hci_ll: Get BD address from NVMEM

2017-12-12 Thread David Lechner

This series adds support for getting the BD address from an NVMEM provider
for "LL" HCI controllers (Texas Instruments).

v3 changes:
* Additional comments on why swapping bytes is needed.
* Fixed comment style and trailing whitespace.
* Rework error handling for nvmem cell code.

v2 changes:
* Fixed typos in dt-bindings
* Use "bd-address" instead of "mac-address"
* Updated dt-bindings to specify the byte order of "bd-address"
* New patch "Bluetooth: hci_ll: add support for setting public address"
* Dropped patch "Bluetooth: hci_ll: add constant for vendor-specific command"
  that is already in bluetooth-next
* Rework error handling
* Use bdaddr_t, bacmp and other bluetooth utils

David Lechner (3):
  Bluetooth: hci_ll: add support for setting public address
  dt-bindings: Add optional nvmem BD address bindings to ti,wlink-st
  Bluetooth: hci_ll: Add optional nvmem BD address source

 .../devicetree/bindings/net/ti,wilink-st.txt   |  5 ++
 drivers/bluetooth/hci_ll.c | 77 ++
 2 files changed, 82 insertions(+)

-- 
2.7.4

Re: [net-next PATCH 14/14] net: sched: pfifo_fast use skb_array

2017-12-12 Thread Cong Wang

On Thu, Dec 7, 2017 at 9:58 AM, John Fastabend  wrote:
> This converts the pfifo_fast qdisc to use the skb_array data structure
> and set the lockless qdisc bit. pfifo_fast is the first qdisc to support
> the lockless bit that can be a child of a qdisc requiring locking. So
> we add logic to clear the lock bit on initialization in these cases when
> the qdisc graft operation occurs.
>
> This also removes the logic used to pick the next band to dequeue from
> and instead just checks a per priority array for packets from top priority
> to lowest. This might need to be a bit more clever but seems to work
> for now.

A very dumb question:

Why do we call it lockless? With skb_array you just shift the per qdisc
spinlock down to each pfifo band, skb_array still uses a spinlock underneath.

What am I missing here?

[PATCH net] bpf: add schedule points to map alloc/free

2017-12-12 Thread Eric Dumazet

From: Eric Dumazet 

While using large percpu maps, htab_map_alloc() can hold
cpu for hundreds of ms.

This patch adds cond_resched() calls to percpu alloc/free
call sites, all running in process context.

Signed-off-by: Eric Dumazet 
---
 kernel/bpf/hashtab.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index e469e05c8e83bc3256378644e3f3c26555651261..3905d4bc5b80d74f0b8f9e2e8f8526a0115ce239 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -114,6 +114,7 @@ static void htab_free_elems(struct bpf_htab *htab)
pptr = htab_elem_get_ptr(get_htab_elem(htab, i),
 htab->map.key_size);
free_percpu(pptr);
+   cond_resched();
}
 free_elems:
bpf_map_area_free(htab->elems);
@@ -159,6 +160,7 @@ static int prealloc_init(struct bpf_htab *htab)
goto free_elems;
htab_elem_set_ptr(get_htab_elem(htab, i), htab->map.key_size,
  pptr);
+   cond_resched();
}
 
 skip_percpu_elems:

Re: [PATCH 2/4] sctp: Add ip option support

2017-12-12 Thread Paul Moore

On Tue, Dec 12, 2017 at 4:56 PM, Marcelo Ricardo Leitner
 wrote:
> On Tue, Dec 12, 2017 at 04:33:03PM -0500, Paul Moore wrote:
>> On Tue, Dec 12, 2017 at 11:08 AM, Marcelo Ricardo Leitner
>>  wrote:
>> > Hi Richard,
>> >
>> > On Mon, Nov 27, 2017 at 07:31:21PM +, Richard Haines wrote:
>> > ...
>> >> --- a/net/sctp/socket.c
>> >> +++ b/net/sctp/socket.c
>> >> @@ -3123,8 +3123,10 @@ static int sctp_setsockopt_maxseg(struct sock *sk, char __user *optval, unsigned
>> >>
>> >>   if (asoc) {
>> >>   if (val == 0) {
>> >> + struct sctp_af *af = sp->pf->af;
>> >>   val = asoc->pathmtu;
>> >> - val -= sp->pf->af->net_header_len;
>> >> + val -= af->ip_options_len(asoc->base.sk);
>> >> + val -= af->net_header_len;
>> >>   val -= sizeof(struct sctphdr) +
>> >>   sizeof(struct sctp_data_chunk);
>> >>   }
>> >
>> > Right below here there is a call to sctp_frag_point(). That function
>> > also needs this tweak.
>> >
>> > Yes, we should simplify all these calculations. I have a patch to use
>> > sctp_frag_point on where it is currently recalculating it on
>> > sctp_datamsg_from_user(), but probably should include other places as
>> > well.
>>
>> FYI: Richard let me know he is occupied with another project at the
>> moment and likely won't be able to do another respin until next week
>> at the earliest.
>
> Okay, thanks. I can do a follow-up patch if it helps.

I'll leave that up to you, I think your comments are pretty
straightforward and should be easy for Richard to incorporate, and
there is a lot to be said for including the fix in the original patch,
but if you would prefer to send a separate patch I think that's fine
too.

-- 
paul moore
www.paul-moore.com

[PATCH iproute2] iplink: validate maximum gso_max_size

2017-12-12 Thread Solio Sarabia

Validate the upper limit for gso_max_size; the valid range is [0, 65536]
inclusive. Fix a minor whitespace issue in the iplink man page.

Signed-off-by: Solio Sarabia 
---
 ip/iplink.c   | 3 ++-
 man/man8/ip-link.8.in | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/ip/iplink.c b/ip/iplink.c
index 6379b16..62bf713 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -853,7 +853,8 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req,
 			unsigned int max_size;
 
 			NEXT_ARG();
-			if (get_unsigned(&max_size, *argv, 0) || max_size > UINT16_MAX)
+			if (get_unsigned(&max_size, *argv, 0) ||
+			    max_size > UINT16_MAX + 1)
 				invarg("Invalid \"gso_max_size\" value\n",
 				       *argv);
 			addattr32(&req->n, sizeof(*req), IFLA_GSO_MAX_SIZE, max_size);
diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index 0db2582..40f09b3 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -36,7 +36,7 @@ ip-link \- network device configuration
 .RB "[ " numrxqueues
 .IR QUEUE_COUNT " ]"
 .br
-.BR "[" gso_max_size
+.BR "[ " gso_max_size
 .IR BYTES " ]"
 .RB "[ " gso_max_segs
 .IR SEGMENTS " ]"
-- 
2.7.4
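
A stand-alone sketch of the bounds check described above, assuming the [0, 65536] range stated in the commit message. parse_gso_max_size() and GSO_MAX_SIZE_LIMIT are hypothetical names for the example; iproute2 itself uses get_unsigned() and invarg() as shown in the diff.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Upper bound from the commit message: UINT16_MAX + 1 == 65536. */
#define GSO_MAX_SIZE_LIMIT ((unsigned long)UINT16_MAX + 1)

/* Hypothetical helper: returns 0 and stores the value, or -1 if invalid. */
static int parse_gso_max_size(const char *arg, unsigned int *out)
{
	char *end;
	unsigned long val;

	if (!arg || !*arg || *arg == '-')
		return -1;

	errno = 0;
	val = strtoul(arg, &end, 0);
	if (errno || *end != '\0' || val > GSO_MAX_SIZE_LIMIT)
		return -1;

	*out = (unsigned int)val;
	return 0;
}
```

The off-by-one matters only at the boundary: 65536 is accepted, 65537 is rejected.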

Re: [PATCH V3 net-next 3/8] net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support

2017-12-12 Thread Philippe Ombredanne

Dear Salil,

On Tue, Dec 12, 2017 at 6:52 PM, Salil Mehta  wrote:
> This patch adds the support of hardware compatibility layer to the
> HNS3 VF Driver. This layer implements various {set|get} operations
> over MAC address for a virtual port, RSS related configuration,
> fetches the link status info from PF, does various VLAN related
> configuration over the virtual port, queries the statistics from
> the hardware etc.
>
> This layer can directly interact with hardware through the
> IMP (Integrated Management Processor) interface or can use mailbox
> to interact with the PF driver.
>
> Signed-off-by: Salil Mehta 
> Signed-off-by: lipeng 
> ---
> Patch V3: Addressed SPDX change requested by Philippe Ombredanne
>   Link: https://lkml.org/lkml/2017/12/8/874
> Patch V2: Addressed some internal comments
> Patch V1: Initial Submit
> ---
>  .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c  | 1490 
> 
>  .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h  |  164 +++
>  2 files changed, 1654 insertions(+)
>  create mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
>  create mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h
>
> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
> new file mode 100644
> index 000..ff55f4c
> --- /dev/null
> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
> @@ -0,0 +1,1490 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * Copyright (c) 2016-2017 Hisilicon Limited.
> + */

This is just me nitpicking and this is entirely up to you but in
such a simple case you could go all the way too:

> +// SPDX-License-Identifier: GPL-2.0+
> +// Copyright (c) 2016-2017 Hisilicon Limited.

In this case this can make the thing look more consistent.
See also Linus's comments about this [1][2][3][4]

[1] https://lkml.org/lkml/2017/11/25/133
[2] https://lkml.org/lkml/2017/11/25/125
[3] https://lkml.org/lkml/2017/11/2/715
[4] https://lkml.org/lkml/2017/11/2/805

-- 
Cordially
Philippe Ombredanne

Re: [PATCH] Fix handling of verdicts after NF_QUEUE

2017-12-12 Thread Pablo Neira Ayuso

On Tue, Dec 12, 2017 at 12:36:35AM +, Banerjee, Debabrata wrote:
> > From: Pablo Neira Ayuso [mailto:pa...@netfilter.org]
> > On Mon, Dec 11, 2017 at 06:30:24PM -0500, Debabrata Banerjee wrote:
> > > + } else {
> > > + /* Implicit handling for NF_STOLEN, as well as any other
> > > +  * non conventional verdicts.
> > > +  */
> > > + ret = 0;
> > 
> > Another possibility (more simple?) would be this:
> > 
> > int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state) {
> > struct nf_hook_entry *entry;
> > unsigned int verdict;
> > -   int ret = 0;
> > +   int ret;
> > 
> > entry = rcu_dereference(state->hook_entries);
> > next_hook:
> > +   ret = 0;
> > 
> > Basically, make sure ret is set to zero when jumping to the next_hook label.
> 
> Many ways to fix it, but I thought including the comment was appropriate.
> Happy to change it if we want simpler instead.

OK, let's take this one.

Please send a patch in git-format-patch format that we can pass to -stable.

Cc netfilter-de...@vger.kernel.org and sta...@vger.kernel.org should
be fine, you can also include gre...@linuxfoundation.org since he
maintains 4.9-stable.

I'll ack this when you send it.

Thanks!
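
For anyone following along, the effect of resetting ret at the label can be seen in a toy user-space model. Everything below is a simplified assumption for illustration: the verdict names, run_hooks() and the reset_ret switch are made up, and this is not the actual nf_hook_slow() code.

```c
/* Toy verdicts: V_REQUEUE models "queue handler asks to continue with the next hook". */
enum verdict { V_ACCEPT, V_DROP, V_STOLEN, V_REQUEUE };

/*
 * Walk the verdict list the way a goto-based hook loop would.
 * With reset_ret == 0 a value left over from an earlier pass can leak
 * into the result; with reset_ret == 1 each pass starts from zero.
 */
static int run_hooks(const enum verdict *v, int n, int reset_ret)
{
	int ret = 0;
	int i = 0;

next_hook:
	if (reset_ret)
		ret = 0;

	if (i >= n || v[i] == V_ACCEPT) {
		ret = 1;	/* caller keeps processing the packet */
	} else if (v[i] == V_DROP) {
		ret = -1;	/* stand-in for a negative errno */
	} else if (v[i] == V_REQUEUE) {
		ret = 1;
		i++;
		goto next_hook;
	}
	/* V_STOLEN has no branch: it relies on ret still being 0. */

	return ret;
}
```

With the sequence { V_REQUEUE, V_STOLEN } the unfixed variant returns 1, telling the caller to continue with a packet it no longer owns, while the variant that resets ret returns 0.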

[PATCH] net: thunderx: add support for rgmii internal delay

2017-12-12 Thread Tim Harvey

The XCV_DLL_CTL is being configured with the assumption that
phy-mode is rgmii-txid (PHY_INTERFACE_MODE_RGMII_TXID) which is not always
the case.

This patch parses the phy-mode property and uses it to configure XCV_DLL_CTL
properly.

Signed-off-by: Tim Harvey 
---
 drivers/net/ethernet/cavium/thunder/thunder_bgx.c | 13 +++---
 drivers/net/ethernet/cavium/thunder/thunder_bgx.h |  2 +-
 drivers/net/ethernet/cavium/thunder/thunder_xcv.c | 31 ++-
 3 files changed, 35 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
index 5e5c4d7..805c02a 100644
--- a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
+++ b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
@@ -55,6 +55,7 @@ struct bgx {
 	struct pci_dev		*pdev;
 	bool			is_dlm;
 	bool			is_rgx;
+	int			phy_mode;
 };
 
 static struct bgx *bgx_vnic[MAX_BGX_THUNDER];
@@ -841,12 +842,12 @@ static void bgx_poll_for_link(struct work_struct *work)
 	queue_delayed_work(lmac->check_link, &lmac->dwork, HZ * 2);
 }
 
-static int phy_interface_mode(u8 lmac_type)
+static int phy_interface_mode(struct bgx *bgx, u8 lmac_type)
 {
if (lmac_type == BGX_MODE_QSGMII)
return PHY_INTERFACE_MODE_QSGMII;
if (lmac_type == BGX_MODE_RGMII)
-   return PHY_INTERFACE_MODE_RGMII;
+   return bgx->phy_mode;
 
return PHY_INTERFACE_MODE_SGMII;
 }
@@ -912,7 +913,8 @@ static int bgx_lmac_enable(struct bgx *bgx, u8 lmacid)
 
 		if (phy_connect_direct(&lmac->netdev, lmac->phydev,
 				       bgx_lmac_handler,
-				       phy_interface_mode(lmac->lmac_type)))
+				       phy_interface_mode(bgx,
+							  lmac->lmac_type)))
 			return -ENODEV;
 
phy_start_aneg(lmac->phydev);
@@ -1287,6 +1289,8 @@ static int bgx_init_of_phy(struct bgx *bgx)
bgx->lmac[lmac].lmacid = lmac;
 
phy_np = of_parse_phandle(node, "phy-handle", 0);
+   if (phy_np)
+   bgx->phy_mode = of_get_phy_mode(phy_np);
/* If there is no phy or defective firmware presents
 * this cortina phy, for which there is no driver
 * support, ignore it.
@@ -1390,7 +1394,6 @@ static int bgx_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
bgx->max_lmac = 1;
bgx->bgx_id = MAX_BGX_PER_CN81XX - 1;
bgx_vnic[bgx->bgx_id] = bgx;
-   xcv_init_hw();
}
 
/* On 81xx all are DLMs and on 83xx there are 3 BGX QLMs and one
@@ -1407,6 +1410,8 @@ static int bgx_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
if (err)
goto err_enable;
 
+   if (bgx->is_rgx)
+   xcv_init_hw(bgx->phy_mode);
bgx_init_hw(bgx);
 
/* Enable all LMACs */
diff --git a/drivers/net/ethernet/cavium/thunder/thunder_bgx.h b/drivers/net/ethernet/cavium/thunder/thunder_bgx.h
index 23acdc5..2bba9d1 100644
--- a/drivers/net/ethernet/cavium/thunder/thunder_bgx.h
+++ b/drivers/net/ethernet/cavium/thunder/thunder_bgx.h
@@ -226,7 +226,7 @@ void bgx_lmac_internal_loopback(int node, int bgx_idx,
 void bgx_lmac_get_pfc(int node, int bgx_idx, int lmacid, void *pause);
 void bgx_lmac_set_pfc(int node, int bgx_idx, int lmacid, void *pause);
 
-void xcv_init_hw(void);
+void xcv_init_hw(int phy_mode);
 void xcv_setup_link(bool link_up, int link_speed);
 
 u64 bgx_get_rx_stats(int node, int bgx_idx, int lmac, int idx);
diff --git a/drivers/net/ethernet/cavium/thunder/thunder_xcv.c b/drivers/net/ethernet/cavium/thunder/thunder_xcv.c
index 578c7f8..7e0c4cb 100644
--- a/drivers/net/ethernet/cavium/thunder/thunder_xcv.c
+++ b/drivers/net/ethernet/cavium/thunder/thunder_xcv.c
@@ -65,7 +65,7 @@ MODULE_LICENSE("GPL v2");
 MODULE_VERSION(DRV_VERSION);
 MODULE_DEVICE_TABLE(pci, xcv_id_table);
 
-void xcv_init_hw(void)
+void xcv_init_hw(int phy_mode)
 {
u64  cfg;
 
@@ -81,12 +81,31 @@ void xcv_init_hw(void)
/* Wait for DLL to lock */
msleep(1);
 
-   /* Configure DLL - enable or bypass
-* TX no bypass, RX bypass
-*/
+   /* enable/bypass DLL providing MAC based internal TX/RX delays */
cfg = readq_relaxed(xcv->reg_base + XCV_DLL_CTL);
-   cfg &= ~0xFF03;
-   cfg |= CLKRX_BYP;
+   cfg &= ~0x00;
+   switch (phy_mode) {
+   /* RX and TX delays are added by the MAC */
+   case PHY_INTERFACE_MODE_RGMII:
+   break;
+   /* internal RX and TX delays provided by the PHY */
+   case PHY_INTERFACE_MODE_RGMII_ID:
+   cfg |= CLKRX_BYP;
+   cfg |= CLKTX_BYP;
+   break;
+

Re: [PATCH v3 0/3] Bluetooth: hci_ll: Get BD address from NVMEM

2017-12-12 Thread Marcel Holtmann

Hi David,

> This series adds support for getting the BD address from an NVMEM provider
> for "LL" HCI controllers (Texas Instruments).
> 
> v3 changes:
> * Additional comments on why swapping bytes is needed.
> * Fixed comment style and trailing whitespace.
> * Rework error handling for nvmem cell code.
> 
> v2 changes:
> * Fixed typos in dt-bindings
> * Use "bd-address" instead of "mac-address"
> * Updated dt-bindings to specify the byte order of "bd-address"
> * New patch "Bluetooth: hci_ll: add support for setting public address"
> * Dropped patch "Bluetooth: hci_ll: add constant for vendor-specific command"
>  that is already in bluetooth-next
> * Rework error handling
> * Use bdaddr_t, bacmp and other bluetooth utils
> 
> David Lechner (3):
>  Bluetooth: hci_ll: add support for setting public address
>  dt-bindings: Add optional nvmem BD address bindings to ti,wlink-st
>  Bluetooth: hci_ll: Add optional nvmem BD address source
> 
> .../devicetree/bindings/net/ti,wilink-st.txt   |  5 ++
> drivers/bluetooth/hci_ll.c | 77 ++
> 2 files changed, 82 insertions(+)

I applied the first 2 patches to the bluetooth-next tree, but the 3rd is
throwing a warning.

  CC      drivers/bluetooth/hci_ll.o
drivers/bluetooth/hci_ll.c: In function ‘hci_ti_probe’:
drivers/bluetooth/hci_ll.c:814:41: error: passing argument 2 of ‘nvmem_cell_read’ from incompatible pointer type [-Werror=incompatible-pointer-types]
   bdaddr = nvmem_cell_read(bdaddr_cell, &len);
                                         ^
In file included from drivers/bluetooth/hci_ll.c:56:0:
./include/linux/nvmem-consumer.h:81:21: note: expected ‘size_t * {aka long unsigned int *}’ but argument is of type ‘int *’
 static inline void *nvmem_cell_read(struct nvmem_cell *cell, size_t *len)
                     ^~~~~~~~~~~~~~~

Regards

Marcel
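
The compiler note above points at the type of the length variable. A minimal user-space sketch of the corrected pattern follows; cell_read() is a hypothetical stand-in that only mirrors the size_t out-parameter shape from the note (it is not the nvmem API), and read_bdaddr() is likewise made up for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BDADDR_LEN 6

/* Hypothetical reader: returns a heap copy of the cell and stores its size in *len. */
static void *cell_read(const uint8_t *cell, size_t cell_len, size_t *len)
{
	void *buf = malloc(cell_len);

	if (!buf)
		return NULL;

	memcpy(buf, cell, cell_len);
	*len = cell_len;
	return buf;
}

/* Returns 0 and fills addr on success, -1 on read failure or wrong length. */
static int read_bdaddr(const uint8_t *cell, size_t cell_len,
		       uint8_t addr[BDADDR_LEN])
{
	size_t len;	/* size_t rather than int, matching the callee's pointer type */
	uint8_t *data = cell_read(cell, cell_len, &len);

	if (!data)
		return -1;

	if (len != BDADDR_LEN) {
		free(data);
		return -1;
	}

	memcpy(addr, data, BDADDR_LEN);
	free(data);
	return 0;
}
```

Declaring the variable as size_t also keeps the later comparison against sizeof() free of a signed/unsigned mismatch.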

thunderx sgmii interface hang

2017-12-12 Thread Tim Harvey

Greetings,

We are experiencing an issue on a CN80XX with an SGMII interface
coupled to a TI DP83867IS phy. We have the same PHY connected to the
RGMII interface on the same board design and everything is working as
expected on that nic both before and after triggering the hang.

The nic appears to work fine (pings, TCP etc) up until a performance
test is attempted.
When an iperf bandwidth test is attempted the nic ends up in a state
where truncated-ip packets are being sent out (per a tcpdump from
another board):

2016-02-11 16:40:23.996660 IP truncated-ip - 1454 bytes missing! (tos
0x0, ttl 64, id 39570, offset 0, flags [DF], proto TCP (6), length
1500, bad cksum 172a (->7033)!)
192.168.1.5.0 > 192.168.168.0.0:  tcp 1480 [bad hdr length 0 - too
short, < 20]

Prior to 'net: thunderx: Fix BGX transmit stall due to underflow'
unplugging the cable and re-plugging would resolve the issue to a
point where it could no longer be created. Prior to this patch a link
status change would disable and re-enable the BGX so perhaps that
helps shed some light on what's going on.

I'm using 4.14.4 with the following patches (although the issue
existed with 4.14.0 as well):
2615c91 net: thunderx: Fix TCP/UDP checksum offload for IPv4 pkts
763d8b3 net: thunderx: Fix TCP/UDP checksum offload for IPv6 pkts
93b2e67 net: thunderx: fix double free error

The issue persists regardless of the DP83867 PHY driver being in the
kernel. Any ideas or details that can help troubleshoot this?

Best Regards,

Tim

Re: [PATCH v9 0/5] Add the ability to do BPF directed error injection

2017-12-12 Thread Darrick J. Wong

On Mon, Dec 11, 2017 at 11:36:45AM -0500, Josef Bacik wrote:
> This is the same as v8, just rebased onto the bpf tree.
> 
> v8->v9:
> - rebased onto the bpf tree.
> 
> v7->v8:
> - removed the _ASM_KPROBE_ERROR_INJECT since it was not needed.
> 
> v6->v7:
> - moved the opt-in macro to bpf.h out of kprobes.h.
> 
> v5->v6:
> - add BPF_ALLOW_ERROR_INJECTION() tagging for functions that will support this
>   feature.  This way only functions that opt-in will be allowed to be
>   overridden.
> - added a btrfs patch to allow error injection for open_ctree() so that the bpf
>   sample actually works.
> 
> v4->v5:
> - disallow kprobe_override programs from being put in the prog map array so we
>   don't tail call into something we didn't check.  This allows us to make the
>   normal path still fast without a bunch of percpu operations.
> 
> v3->v4:
> - fix a build error found by kbuild test bot (I didn't wait long enough
>   apparently.)
> - Added a warning message as per Daniel's suggestion.
> 
> v2->v3:
> - added a ->kprobe_override flag to bpf_prog.
> - added some sanity checks to disallow attaching bpf progs that have
>   ->kprobe_override set that aren't for ftrace kprobes.
> - added the trace_kprobe_ftrace helper to check if the trace_event_call is a
>   ftrace kprobe.
> - renamed bpf_kprobe_state to bpf_kprobe_override, fixed it so we only read this
>   value in the kprobe path, and thus only write to it if we're overriding or
>   clearing the override.
> 
> v1->v2:
> - moved things around to make sure that bpf_override_return could really only be
>   used for an ftrace kprobe.
> - killed the special return values from trace_call_bpf.
> - renamed pc_modified to bpf_kprobe_state so bpf_override_return could tell if
>   it was being called from an ftrace kprobe context.
> - reworked the logic in kprobe_perf_func to take advantage of bpf_kprobe_state.
> - updated the test as per Alexei's review.
> 
> - Original message -
> 
> A lot of our error paths are not well tested because we have no good way of
> injecting errors generically.  Some subsystems (block, memory) have ways to
> inject errors, but they are random so it's hard to get reproducible results.
> 
> With BPF we can add determinism to our error injection.  We can use kprobes and
> other things to verify we are injecting errors at the exact case we are trying
> to test.  This patch gives us the tool to actually do the error injection part.
> It is very simple, we just set the return value of the pt_regs we're given to
> whatever we provide, and then override the PC with a dummy function that simply
> returns.

Heh, this looks cool.  I decided to try it to see what happens, and saw
a bunch of dmesg pasted in below.  Is that supposed to happen?  Or am I
the only fs developer still running with lockdep enabled? :)

It looks like bpf_override_return has some sort of side effect such that
we get the splat, since commenting it out makes the symptom go away.



--D

[ 1847.769183] BTRFS error (device (null)): open_ctree failed
[ 1847.770130] BUG: sleeping function called from invalid context at /storage/home/djwong/cdev/work/linux-xfs/kernel/locking/rwsem.c:69
[ 1847.771976] in_atomic(): 1, irqs_disabled(): 0, pid: 1524, name: mount
[ 1847.773016] 1 lock held by mount/1524:
[ 1847.773530]  #0:  (>s_umount_key#34/1){+.+.}, at: [<653a9bb4>] 
sget_userns+0x302/0x4f0
[ 1847.774731] Preemption disabled at:
[ 1847.774735] [<  (null)>]   (null)
[ 1847.777009] CPU: 2 PID: 1524 Comm: mount Tainted: GW
4.15.0-rc3-xfsx #3
[ 1847.778800] Call Trace:
[ 1847.779047]  dump_stack+0x7c/0xbe
[ 1847.779361]  ___might_sleep+0x1f7/0x260
[ 1847.779720]  down_write+0x29/0xb0
[ 1847.780046]  unregister_shrinker+0x15/0x70
[ 1847.780427]  deactivate_locked_super+0x2e/0x60
[ 1847.780935]  btrfs_mount+0xbb6/0x1000 [btrfs]
[ 1847.781353]  ? __lockdep_init_map+0x5c/0x1d0
[ 1847.781750]  ? mount_fs+0xf/0x80
[ 1847.782065]  ? alloc_vfsmnt+0x1a1/0x230
[ 1847.782429]  mount_fs+0xf/0x80
[ 1847.782733]  vfs_kern_mount+0x62/0x160
[ 1847.783128]  btrfs_mount+0x3d3/0x1000 [btrfs]
[ 1847.783493]  ? __lockdep_init_map+0x5c/0x1d0
[ 1847.783849]  ? __lockdep_init_map+0x5c/0x1d0
[ 1847.784207]  ? mount_fs+0xf/0x80
[ 1847.784502]  mount_fs+0xf/0x80
[ 1847.784835]  vfs_kern_mount+0x62/0x160
[ 1847.785235]  do_mount+0x1b1/0xd50
[ 1847.785594]  ? _copy_from_user+0x5b/0x90
[ 1847.786028]  ? memdup_user+0x4b/0x70
[ 1847.786501]  SyS_mount+0x85/0xd0
[ 1847.786835]  entry_SYSCALL_64_fastpath+0x1f/0x96
[ 1847.787311] RIP: 0033:0x7f6ebecc1b5a
[ 1847.787691] RSP: 002b:7ffc7bd1c958 EFLAGS: 0202 ORIG_RAX: 00a5
[ 1847.788383] RAX: ffda RBX: 7f6ebefba63a RCX: 7f6ebecc1b5a
[ 1847.789106] RDX: 00bfd010 RSI: 00bfa230 RDI: 00bfa210
[ 1847.789807] RBP: 00bfa0f0 R08:  R09: 0014
[ 1847.790511] R10: c0ed R11: 0202 R12: 7f6ebf1ca83c
[

Re: [PATCH net] bpf: add schedule points to map alloc/free

2017-12-12 Thread Alexei Starovoitov

On Tue, Dec 12, 2017 at 02:22:39PM -0800, Eric Dumazet wrote:
> From: Eric Dumazet 
> 
> While using large percpu maps, htab_map_alloc() can hold
> cpu for hundreds of ms.
> 
> This patch adds cond_resched() calls to percpu alloc/free
> call sites, all running in process context.
> 
> Signed-off-by: Eric Dumazet 

Applied, thanks Eric!

[PATCH v2 next-queue 01/10] ixgbe: clean up ipsec defines

2017-12-12 Thread Shannon Nelson

Clean up the ipsec/macsec descriptor bit definitions to match the rest
of the defines and file organization.  Also recognise the bit-definition
overlap in the error mask macro.

Signed-off-by: Shannon Nelson 
---
v2: no changes
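
Not part of the patch: a minimal sketch of why the overlap matters when
decoding a two-bit error field in which the third code is the OR of the
first two.  The DEMO_* names and values are illustrative stand-ins, not
the driver's definitions.

```c
#include <stdint.h>

/* Illustrative values only: a two-bit error field where the third
 * code is the bitwise OR of the first two.
 */
#define DEMO_ERR_INV_PROTOCOL	0x08000000u
#define DEMO_ERR_INV_LENGTH	0x10000000u
#define DEMO_ERR_AUTH_FAILED	0x18000000u
#define DEMO_ERR_FIELD_MASK	0x18000000u

enum demo_ipsec_err {
	DEMO_OK = 0,
	DEMO_BAD_PROTOCOL,
	DEMO_BAD_LENGTH,
	DEMO_BAD_AUTH,
};

/* Compare the whole masked field.  Testing single bits would report
 * an auth failure as a protocol or length error.
 */
static enum demo_ipsec_err demo_decode_ipsec_err(uint32_t status_error)
{
	switch (status_error & DEMO_ERR_FIELD_MASK) {
	case DEMO_ERR_INV_PROTOCOL:
		return DEMO_BAD_PROTOCOL;
	case DEMO_ERR_INV_LENGTH:
		return DEMO_BAD_LENGTH;
	case DEMO_ERR_AUTH_FAILED:
		return DEMO_BAD_AUTH;
	default:
		return DEMO_OK;
	}
}
```

A frame-error mask that ORs the first two codes together therefore also
covers the third one, which is the overlap the description refers to.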

 drivers/net/ethernet/intel/ixgbe/ixgbe_type.h | 20 +++-
 1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
index ffa0ee5..3df0763 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
@@ -2321,11 +2321,6 @@ enum {
 #define IXGBE_TXD_CMD_VLE0x4000 /* Add VLAN tag */
 #define IXGBE_TXD_STAT_DD0x0001 /* Descriptor Done */
 
-#define IXGBE_RXDADV_IPSEC_STATUS_SECP  0x0002
-#define IXGBE_RXDADV_IPSEC_ERROR_INVALID_PROTOCOL   0x0800
-#define IXGBE_RXDADV_IPSEC_ERROR_INVALID_LENGTH 0x1000
-#define IXGBE_RXDADV_IPSEC_ERROR_AUTH_FAILED0x1800
-#define IXGBE_RXDADV_IPSEC_ERROR_BIT_MASK   0x1800
 /* Multiple Transmit Queue Command Register */
 #define IXGBE_MTQC_RT_ENA   0x1 /* DCB Enable */
 #define IXGBE_MTQC_VT_ENA   0x2 /* VMDQ2 Enable */
@@ -2377,6 +2372,9 @@ enum {
 #define IXGBE_RXDADV_ERR_LE 0x0200 /* Length Error */
 #define IXGBE_RXDADV_ERR_PE 0x0800 /* Packet Error */
 #define IXGBE_RXDADV_ERR_OSE0x1000 /* Oversize Error */
+#define IXGBE_RXDADV_ERR_IPSEC_INV_PROTOCOL  0x0800 /* overlap ERR_PE  */
+#define IXGBE_RXDADV_ERR_IPSEC_INV_LENGTH0x1000 /* overlap ERR_OSE */
+#define IXGBE_RXDADV_ERR_IPSEC_AUTH_FAILED   0x1800
 #define IXGBE_RXDADV_ERR_USE0x2000 /* Undersize Error */
 #define IXGBE_RXDADV_ERR_TCPE   0x4000 /* TCP/UDP Checksum Error */
 #define IXGBE_RXDADV_ERR_IPE0x8000 /* IP Checksum Error */
@@ -2398,6 +2396,7 @@ enum {
 #define IXGBE_RXDADV_STAT_FCSTAT_FCPRSP 0x0020 /* 10: Recv. FCP_RSP */
 #define IXGBE_RXDADV_STAT_FCSTAT_DDP0x0030 /* 11: Ctxt w/ DDP */
 #define IXGBE_RXDADV_STAT_TS   0x0001 /* IEEE 1588 Time Stamp */
+#define IXGBE_RXDADV_STAT_SECP  0x0002 /* IPsec/MACsec pkt found */
 
 /* PSRTYPE bit definitions */
 #define IXGBE_PSRTYPE_TCPHDR0x0010
@@ -2464,13 +2463,6 @@ enum {
 #define IXGBE_RXDADV_PKTTYPE_ETQF_MASK  0x0070 /* ETQF has 8 indices */
 #define IXGBE_RXDADV_PKTTYPE_ETQF_SHIFT 4  /* Right-shift 4 bits */
 
-/* Security Processing bit Indication */
-#define IXGBE_RXDADV_LNKSEC_STATUS_SECP 0x0002
-#define IXGBE_RXDADV_LNKSEC_ERROR_NO_SA_MATCH   0x0800
-#define IXGBE_RXDADV_LNKSEC_ERROR_REPLAY_ERROR  0x1000
-#define IXGBE_RXDADV_LNKSEC_ERROR_BIT_MASK  0x1800
-#define IXGBE_RXDADV_LNKSEC_ERROR_BAD_SIG   0x1800
-
 /* Masks to determine if packets should be dropped due to frame errors */
 #define IXGBE_RXD_ERR_FRAME_ERR_MASK ( \
  IXGBE_RXD_ERR_CE | \
@@ -2484,6 +2476,8 @@ enum {
  IXGBE_RXDADV_ERR_LE | \
  IXGBE_RXDADV_ERR_PE | \
  IXGBE_RXDADV_ERR_OSE | \
+ IXGBE_RXDADV_ERR_IPSEC_INV_PROTOCOL | \
+ IXGBE_RXDADV_ERR_IPSEC_INV_LENGTH | \
  IXGBE_RXDADV_ERR_USE)
 
 /* Multicast bit mask */
@@ -2893,6 +2887,7 @@ struct ixgbe_adv_tx_context_desc {
 IXGBE_ADVTXD_POPTS_SHIFT)
 #define IXGBE_ADVTXD_POPTS_TXSM (IXGBE_TXD_POPTS_TXSM << \
 IXGBE_ADVTXD_POPTS_SHIFT)
+#define IXGBE_ADVTXD_POPTS_IPSEC 0x0400 /* IPSec offload request */
 #define IXGBE_ADVTXD_POPTS_ISCO_1ST  0x /* 1st TSO of iSCSI PDU */
 #define IXGBE_ADVTXD_POPTS_ISCO_MDL  0x0800 /* Middle TSO of iSCSI PDU */
 #define IXGBE_ADVTXD_POPTS_ISCO_LAST 0x1000 /* Last TSO of iSCSI PDU */
@@ -2908,7 +2903,6 @@ struct ixgbe_adv_tx_context_desc {
 #define IXGBE_ADVTXD_TUCMD_L4T_SCTP  0x1000  /* L4 Packet TYPE of SCTP */
 #define IXGBE_ADVTXD_TUCMD_L4T_RSV 0x1800 /* RSV L4 Packet TYPE */
 #define IXGBE_ADVTXD_TUCMD_MKRREQ0x2000 /*Req requires Markers and CRC*/
-#define IXGBE_ADVTXD_POPTS_IPSEC  0x0400 /* IPSec offload request */
 #define IXGBE_ADVTXD_TUCMD_IPSEC_TYPE_ESP 0x2000 /* IPSec Type ESP */
 #define IXGBE_ADVTXD_TUCMD_IPSEC_ENCRYPT_EN 0x4000/* ESP Encrypt Enable */
 #define IXGBE_ADVTXT_TUCMD_FCOE  0x8000   /* FCoE Frame Type */
-- 
2.7.4

[PATCH v2 next-queue 02/10] ixgbe: add ipsec register access routines

2017-12-12 Thread Shannon Nelson

Add a few routines to make access to the ipsec registers just a little
easier, and throw in the beginnings of an initialization.

Signed-off-by: Shannon Nelson 
---
v2: Rx table selector becomes an enum with a shift
Combine the clear table loops into one
Name the table index shift value
Use the addr as __be32
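
Not part of the patch: a small sketch of the read-modify-write pattern
the index register helpers use (keep the enable bit, add the table index
and a write strobe).  The DEMO_* bit positions are assumptions made for
illustration and should not be read as the hardware layout.

```c
#include <stdint.h>

/* Assumed layout for illustration: bit 0 is the engine enable, the
 * table index starts at bit 3, and bit 31 is the write strobe.
 */
#define DEMO_IDX_IPS_EN	0x00000001u
#define DEMO_IDX_SHIFT	3
#define DEMO_IDX_WRITE	0x80000000u

/* Keep only the enable bit from the current register value, then add
 * the table index and the write strobe.
 */
static uint32_t demo_build_tx_idx(uint32_t current, uint16_t idx)
{
	uint32_t reg = current & DEMO_IDX_IPS_EN;

	reg |= ((uint32_t)idx << DEMO_IDX_SHIFT) | DEMO_IDX_WRITE;
	return reg;
}
```

Masking first means stale index or strobe bits left in the register from
an earlier access cannot leak into the new command.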

 drivers/net/ethernet/intel/ixgbe/Makefile  |   1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe.h   |   6 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 161 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.h |  52 
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  |   1 +
 5 files changed, 221 insertions(+)
 create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
 create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.h

diff --git a/drivers/net/ethernet/intel/ixgbe/Makefile b/drivers/net/ethernet/intel/ixgbe/Makefile
index 35e6fa6..8319465 100644
--- a/drivers/net/ethernet/intel/ixgbe/Makefile
+++ b/drivers/net/ethernet/intel/ixgbe/Makefile
@@ -42,3 +42,4 @@ ixgbe-$(CONFIG_IXGBE_DCB) +=  ixgbe_dcb.o ixgbe_dcb_82598.o \
 ixgbe-$(CONFIG_IXGBE_HWMON) += ixgbe_sysfs.o
 ixgbe-$(CONFIG_DEBUG_FS) += ixgbe_debugfs.o
 ixgbe-$(CONFIG_FCOE:m=y) += ixgbe_fcoe.o
+ixgbe-$(CONFIG_XFRM_OFFLOAD) += ixgbe_ipsec.o
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index dd55787..1e11462 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -52,6 +52,7 @@
 #ifdef CONFIG_IXGBE_DCA
 #include 
 #endif
+#include "ixgbe_ipsec.h"
 
 #include 
 
@@ -1001,4 +1002,9 @@ void ixgbe_store_key(struct ixgbe_adapter *adapter);
 void ixgbe_store_reta(struct ixgbe_adapter *adapter);
 s32 ixgbe_negotiate_fc(struct ixgbe_hw *hw, u32 adv_reg, u32 lp_reg,
   u32 adv_sym, u32 adv_asm, u32 lp_sym, u32 lp_asm);
+#ifdef CONFIG_XFRM_OFFLOAD
+void ixgbe_init_ipsec_offload(struct ixgbe_adapter *adapter);
+#else
+static inline void ixgbe_init_ipsec_offload(struct ixgbe_adapter *adapter) { };
+#endif /* CONFIG_XFRM_OFFLOAD */
 #endif /* _IXGBE_H_ */
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
new file mode 100644
index 000..4d71517
--- /dev/null
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -0,0 +1,161 @@
+/***
+ *
+ * Intel 10 Gigabit PCI Express Linux driver
+ * Copyright(c) 2017 Oracle and/or its affiliates. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Contact Information:
+ * Linux NICS 
+ * e1000-devel Mailing List 
+ * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
+ *
+ 
**/
+
+#include "ixgbe.h"
+
+/**
+ * ixgbe_ipsec_set_tx_sa - set the Tx SA registers
+ * @hw: hw specific details
+ * @idx: register index to write
+ * @key: key byte array
+ * @salt: salt bytes
+ **/
+static void ixgbe_ipsec_set_tx_sa(struct ixgbe_hw *hw, u16 idx,
+ u32 key[], u32 salt)
+{
+   u32 reg;
+   int i;
+
+   for (i = 0; i < 4; i++)
+   IXGBE_WRITE_REG(hw, IXGBE_IPSTXKEY(i), cpu_to_be32(key[3-i]));
+   IXGBE_WRITE_REG(hw, IXGBE_IPSTXSALT, cpu_to_be32(salt));
+   IXGBE_WRITE_FLUSH(hw);
+
+   reg = IXGBE_READ_REG(hw, IXGBE_IPSTXIDX);
+   reg &= IXGBE_RXTXIDX_IPS_EN;
+   reg |= idx << IXGBE_RXTXIDX_IDX_SHIFT | IXGBE_RXTXIDX_WRITE;
+   IXGBE_WRITE_REG(hw, IXGBE_IPSTXIDX, reg);
+   IXGBE_WRITE_FLUSH(hw);
+}
+
+/**
+ * ixgbe_ipsec_set_rx_item - set an Rx table item
+ * @hw: hw specific details
+ * @idx: register index to write
+ * @tbl: table selector
+ *
+ * Trigger the device to store into a particular Rx table the
+ * data that has already been loaded into the input register
+ **/
+static void ixgbe_ipsec_set_rx_item(struct ixgbe_hw *hw, u16 idx,
+   enum ixgbe_ipsec_tbl_sel tbl)
+{
+   u32 reg;
+
+   reg = IXGBE_READ_REG(hw, IXGBE_IPSRXIDX);
+   reg &= IXGBE_RXTXIDX_IPS_EN;
+   reg |= tbl << IXGBE_RXIDX_TBL_SHIFT |
+

[PATCH v2 next-queue 07/10] ixgbe: process the Rx ipsec offload

2017-12-12 Thread Shannon Nelson

If the chip sees and decrypts an ipsec offload, set up the skb
sp pointer with the related SA info.  Since the chip is rude
enough to keep to itself the table index it used for the
decryption, we have to do our own table lookup, using the
hash for speed.

Signed-off-by: Shannon Nelson 
---
v2: no changes
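
Not part of the patch: a userspace sketch of the lookup idea described
above, where entries are bucketed by SPI and a hit requires SPI,
destination address and protocol to all match.  Every demo_* name is
hypothetical, and the sketch leaves out the RCU protection and the
reference counting that the driver code needs.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_HASH_BITS	4
#define DEMO_BUCKETS	(1u << DEMO_HASH_BITS)

struct demo_sa {
	uint32_t spi;
	uint32_t daddr;
	uint8_t proto;
	struct demo_sa *next;	/* bucket chain */
};

struct demo_sa_table {
	struct demo_sa *bucket[DEMO_BUCKETS];
};

/* Multiplicative hash: keep the top DEMO_HASH_BITS bits of the product. */
static unsigned int demo_hash(uint32_t spi)
{
	return (spi * 0x9E3779B1u) >> (32 - DEMO_HASH_BITS);
}

/* Push the entry onto the front of its bucket chain. */
static void demo_sa_add(struct demo_sa_table *t, struct demo_sa *sa)
{
	unsigned int b = demo_hash(sa->spi);

	sa->next = t->bucket[b];
	t->bucket[b] = sa;
}

/* Walk only the bucket selected by the SPI; return NULL on a miss. */
static struct demo_sa *demo_sa_find(const struct demo_sa_table *t,
				    uint32_t spi, uint32_t daddr,
				    uint8_t proto)
{
	struct demo_sa *sa;

	for (sa = t->bucket[demo_hash(spi)]; sa; sa = sa->next)
		if (sa->spi == spi && sa->daddr == daddr &&
		    sa->proto == proto)
			return sa;
	return NULL;
}
```

Hashing on the SPI alone keeps the key cheap to compute per packet; the
address and protocol comparison resolves any entries that share a bucket.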

 drivers/net/ethernet/intel/ixgbe/ixgbe.h   |  6 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 89 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  |  3 +
 3 files changed, 98 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index af690c2..a094b23 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -1011,9 +1011,15 @@ s32 ixgbe_negotiate_fc(struct ixgbe_hw *hw, u32 adv_reg, u32 lp_reg,
 void ixgbe_init_ipsec_offload(struct ixgbe_adapter *adapter);
 void ixgbe_stop_ipsec_offload(struct ixgbe_adapter *adapter);
 void ixgbe_ipsec_restore(struct ixgbe_adapter *adapter);
+void ixgbe_ipsec_rx(struct ixgbe_ring *rx_ring,
+   union ixgbe_adv_rx_desc *rx_desc,
+   struct sk_buff *skb);
 #else
 static inline void ixgbe_init_ipsec_offload(struct ixgbe_adapter *adapter) { };
 static inline void ixgbe_stop_ipsec_offload(struct ixgbe_adapter *adapter) { };
 static inline void ixgbe_ipsec_restore(struct ixgbe_adapter *adapter) { };
+static inline void ixgbe_ipsec_rx(struct ixgbe_ring *rx_ring,
+ union ixgbe_adv_rx_desc *rx_desc,
+ struct sk_buff *skb) { };
 #endif /* CONFIG_XFRM_OFFLOAD */
 #endif /* _IXGBE_H_ */
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index 049c195..7e421b8 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -374,6 +374,35 @@ static int ixgbe_ipsec_find_empty_idx(struct ixgbe_ipsec *ipsec, bool rxtable)
 }
 
 /**
+ * ixgbe_ipsec_find_rx_state - find the state that matches
+ * @ipsec: pointer to ipsec struct
+ * @daddr: inbound address to match
+ * @proto: protocol to match
+ * @spi: SPI to match
+ *
+ * Returns a pointer to the matching SA state information
+ **/
+static struct xfrm_state *ixgbe_ipsec_find_rx_state(struct ixgbe_ipsec *ipsec,
+   __be32 daddr, u8 proto,
+   __be32 spi)
+{
+   struct rx_sa *rsa;
+   struct xfrm_state *ret = NULL;
+
+   rcu_read_lock();
+   hash_for_each_possible_rcu(ipsec->rx_sa_list, rsa, hlist, spi)
+   if (spi == rsa->xs->id.spi &&
+   daddr == rsa->xs->id.daddr.a4 &&
+   proto == rsa->xs->id.proto) {
+   ret = rsa->xs;
+   xfrm_state_hold(ret);
+   break;
+   }
+   rcu_read_unlock();
+   return ret;
+}
+
+/**
  * ixgbe_ipsec_parse_proto_keys - find the key and salt based on the protocol
  * @xs: pointer to xfrm_state struct
  * @mykey: pointer to key array to populate
@@ -672,6 +701,66 @@ static const struct xfrmdev_ops ixgbe_xfrmdev_ops = {
 };
 
 /**
+ * ixgbe_ipsec_rx - decode ipsec bits from Rx descriptor
+ * @rx_ring: receiving ring
+ * @rx_desc: receive data descriptor
+ * @skb: current data packet
+ *
+ * Determine if there was an ipsec encapsulation noticed, and if so set up
+ * the resulting status for later in the receive stack.
+ **/
+void ixgbe_ipsec_rx(struct ixgbe_ring *rx_ring,
+   union ixgbe_adv_rx_desc *rx_desc,
+   struct sk_buff *skb)
+{
+   struct ixgbe_adapter *adapter = netdev_priv(rx_ring->netdev);
+   u16 pkt_info = le16_to_cpu(rx_desc->wb.lower.lo_dword.hs_rss.pkt_info);
+   u16 ipsec_pkt_types = IXGBE_RXDADV_PKTTYPE_IPSEC_AH |
+   IXGBE_RXDADV_PKTTYPE_IPSEC_ESP;
+   struct ixgbe_ipsec *ipsec = adapter->ipsec;
+   struct xfrm_offload *xo = NULL;
+   struct xfrm_state *xs = NULL;
+   struct iphdr *iph;
+   u8 *c_hdr;
+   __be32 spi;
+   u8 proto;
+
+   /* we can assume no vlan header in the way, b/c the
+* hw won't recognize the IPsec packet and anyway the
+* currently vlan device doesn't support xfrm offload.
+*/
+   /* TODO: not supporting IPv6 yet */
+   iph = (struct iphdr *)(skb->data + ETH_HLEN);
+   c_hdr = (u8 *)iph + iph->ihl * 4;
+   switch (pkt_info & ipsec_pkt_types) {
+   case IXGBE_RXDADV_PKTTYPE_IPSEC_AH:
+   spi = ((struct ip_auth_hdr *)c_hdr)->spi;
+   proto = IPPROTO_AH;
+   break;
+   case IXGBE_RXDADV_PKTTYPE_IPSEC_ESP:
+   spi = ((struct ip_esp_hdr *)c_hdr)->spi;
+   proto = IPPROTO_ESP;
+   break;
+   default:
+   return;
+   }
+
+   xs =

[PATCH v2 next-queue 00/10] ixgbe: Add ipsec offload

2017-12-12 Thread Shannon Nelson

This is an implementation of the ipsec hardware offload feature for
the ixgbe driver and Intel's 10GbE series NICs: x540, x550, 82599.
These patches apply to net-next v4.14 as well as Jeff Kirsher's next-queue
v4.15-rc1-206-ge47375b.

The ixgbe NICs support ipsec offload for 1024 Rx and 1024 Tx Security
Associations (SAs), using up to 128 inbound IP addresses, and using the
rfc4106(gcm(aes)) encryption.  This code does not yet support IPv6,
checksum offload, or TSO in conjunction with the ipsec offload - those
will be added in the future.

This code shows improvements in both packet throughput and CPU utilization.
For example, here are some quick numbers that show the magnitude of the
performance gain on a single run of "iperf -c " with the ipsec
offload on both ends of a point-to-point connection:

9.4 Gbps - normal case
7.6 Gbps - ipsec with offload
343 Mbps - ipsec no offload

To set up a similar test case, you first need to be sure you have a recent
version of iproute2 that supports the ipsec offload tag, probably something
from ip 4.12 or newer would be best.  I have a shell script that builds
up the appropriate commands for me, but here are the resulting commands
for all tcp traffic between 14.0.0.52 and 14.0.0.70:

For the left side (14.0.0.52):
  ip x p add dir out src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp tmpl \
 proto esp src 14.0.0.52 dst 14.0.0.70 spi 0x07 mode transport reqid 0x07
  ip x p add dir in src 14.0.0.70/24 dst 14.0.0.52/24 proto tcp tmpl \
 proto esp dst 14.0.0.52 src 14.0.0.70 spi 0x07 mode transport reqid 0x07
  ip x s add proto esp src 14.0.0.52 dst 14.0.0.70 spi 0x07 mode transport \
 reqid 0x07 replay-window 32 \
 aead 'rfc4106(gcm(aes))' 0x44434241343332312423222114131211f4f3f2f1 128 \
 sel src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp offload dev eth4 dir out
  ip x s add proto esp dst 14.0.0.52 src 14.0.0.70 spi 0x07 mode transport \
 reqid 0x07 replay-window 32 \
 aead 'rfc4106(gcm(aes))' 0x44434241343332312423222114131211f4f3f2f1 128 \
 sel src 14.0.0.70/24 dst 14.0.0.52/24 proto tcp offload dev eth4 dir in
 
For the right side (14.0.0.70):
  ip x p add dir out src 14.0.0.70/24 dst 14.0.0.52/24 proto tcp tmpl \
 proto esp src 14.0.0.70 dst 14.0.0.52 spi 0x07 mode transport reqid 0x07
  ip x p add dir in src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp tmpl \
 proto esp dst 14.0.0.70 src 14.0.0.52 spi 0x07 mode transport reqid 0x07
  ip x s add proto esp src 14.0.0.70 dst 14.0.0.52 spi 0x07 mode transport \
 reqid 0x07 replay-window 32 \
 aead 'rfc4106(gcm(aes))' 0x44434241343332312423222114131211f4f3f2f1 128 \
 sel src 14.0.0.70/24 dst 14.0.0.52/24 proto tcp offload dev eth4 dir out
  ip x s add proto esp dst 14.0.0.70 src 14.0.0.52 spi 0x07 mode transport \
 reqid 0x07 replay-window 32 \
 aead 'rfc4106(gcm(aes))' 0x44434241343332312423222114131211f4f3f2f1 128 \
 sel src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp offload dev eth4 dir in

In both cases, the command "ip x s flush ; ip x p flush" will clean
it all out and remove the offloads.
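
As a side note on the key strings above: for rfc4106(gcm(aes)) with a
128-bit AES key, the 20 bytes of keying material are a 16-byte key
followed by a 4-byte salt (RFC 4106).  The following is a minimal
userspace sketch of that split; the demo_* names are hypothetical and
this is not the driver's parsing code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_AES128_KEY_LEN	16
#define DEMO_GCM_SALT_LEN	4
#define DEMO_KEYMAT_LEN		(DEMO_AES128_KEY_LEN + DEMO_GCM_SALT_LEN)

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int demo_hex_nibble(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Parse a hex keymat string (optional "0x" prefix) into key and salt.
 * Returns 0 on success, -1 on a malformed or wrongly sized string.
 */
static int demo_split_keymat(const char *hex,
			     uint8_t key[DEMO_AES128_KEY_LEN],
			     uint8_t salt[DEMO_GCM_SALT_LEN])
{
	uint8_t buf[DEMO_KEYMAT_LEN];
	size_t i;

	if (hex[0] == '0' && (hex[1] == 'x' || hex[1] == 'X'))
		hex += 2;
	if (strlen(hex) != 2 * sizeof(buf))
		return -1;

	for (i = 0; i < sizeof(buf); i++) {
		int hi = demo_hex_nibble(hex[2 * i]);
		int lo = demo_hex_nibble(hex[2 * i + 1]);

		if (hi < 0 || lo < 0)
			return -1;
		buf[i] = (uint8_t)((hi << 4) | lo);
	}

	/* The salt is the trailing four bytes of the keying material. */
	memcpy(key, buf, DEMO_AES128_KEY_LEN);
	memcpy(salt, buf + DEMO_AES128_KEY_LEN, DEMO_GCM_SALT_LEN);
	return 0;
}
```

Fed the example string from the commands above, the sketch yields a key
starting with 0x44 and a salt of f4 f3 f2 f1.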

Lastly, thanks to Alex Duyck for his early comments.

Please see the individual patches for version update info.

Shannon Nelson (10):
  ixgbe: clean up ipsec defines
  ixgbe: add ipsec register access routines
  ixgbe: add ipsec engine start and stop routines
  ixgbe: add ipsec data structures
  ixgbe: add ipsec offload add and remove SA
  ixgbe: restore offloaded SAs after a reset
  ixgbe: process the Rx ipsec offload
  ixgbe: process the Tx ipsec offload
  ixgbe: ipsec offload stats
  ixgbe: register ipsec offload with the xfrm subsystem

 drivers/net/ethernet/intel/ixgbe/Makefile|   1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe.h |  33 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |   2 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c   | 923 +++
 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.h   |  92 +++
 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c |   4 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c|  39 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_type.h|  22 +-
 8 files changed, 1093 insertions(+), 23 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
 create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.h

-- 
2.7.4

[PATCH v2 next-queue 06/10] ixgbe: restore offloaded SAs after a reset

2017-12-12 Thread Shannon Nelson

On a chip reset most of the table contents are lost, so must be
restored.  This scans the driver's ipsec tables and restores both
the filled and empty table slots to their pre-reset values.

Signed-off-by: Shannon Nelson 
---
v2: during restore, clean the tables before restarting
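
Not part of the patch: a userspace sketch of the restore pattern, where
a driver-side shadow table survives the reset and is replayed into a
cleared hardware table.  The demo_* structures stand in for the real
tables and register writes and are purely hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_SA_COUNT 8

struct demo_shadow_sa {
	bool used;
	uint32_t key[4];
	uint32_t salt;
};

struct demo_hw_sa {
	uint32_t key[4];
	uint32_t salt;
};

struct demo_dev {
	struct demo_shadow_sa shadow[DEMO_SA_COUNT];	/* survives a reset */
	struct demo_hw_sa hw[DEMO_SA_COUNT];		/* lost on a reset */
};

/* Clear the hardware table, then replay every slot that is in use. */
static void demo_restore(struct demo_dev *dev)
{
	int i;

	/* start from a known-clean table so unused slots read as zero */
	memset(dev->hw, 0, sizeof(dev->hw));

	for (i = 0; i < DEMO_SA_COUNT; i++) {
		const struct demo_shadow_sa *s = &dev->shadow[i];

		if (!s->used)
			continue;
		memcpy(dev->hw[i].key, s->key, sizeof(s->key));
		dev->hw[i].salt = s->salt;
	}
}
```

Clearing first is what returns the empty slots to their pre-reset state,
matching the v2 note above.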

 drivers/net/ethernet/intel/ixgbe/ixgbe.h   |  2 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 41 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  |  1 +
 3 files changed, 44 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 8f41508..af690c2 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -1010,8 +1010,10 @@ s32 ixgbe_negotiate_fc(struct ixgbe_hw *hw, u32 adv_reg, u32 lp_reg,
 #ifdef CONFIG_XFRM_OFFLOAD
 void ixgbe_init_ipsec_offload(struct ixgbe_adapter *adapter);
 void ixgbe_stop_ipsec_offload(struct ixgbe_adapter *adapter);
+void ixgbe_ipsec_restore(struct ixgbe_adapter *adapter);
 #else
 static inline void ixgbe_init_ipsec_offload(struct ixgbe_adapter *adapter) { };
 static inline void ixgbe_stop_ipsec_offload(struct ixgbe_adapter *adapter) { };
+static inline void ixgbe_ipsec_restore(struct ixgbe_adapter *adapter) { };
 #endif /* CONFIG_XFRM_OFFLOAD */
 #endif /* _IXGBE_H_ */
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index 72b1d29..049c195 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -299,6 +299,47 @@ static void ixgbe_ipsec_start_engine(struct ixgbe_adapter *adapter)
 }
 
 /**
+ * ixgbe_ipsec_restore - restore the ipsec HW settings after a reset
+ * @adapter: board private structure
+ **/
+void ixgbe_ipsec_restore(struct ixgbe_adapter *adapter)
+{
+   struct ixgbe_ipsec *ipsec = adapter->ipsec;
+   struct ixgbe_hw *hw = &adapter->hw;
+   int i;
+
+   if (!(adapter->flags2 & IXGBE_FLAG2_IPSEC_ENABLED))
+   return;
+
+   /* clean up and restart the engine */
+   ixgbe_ipsec_stop_engine(adapter);
+   ixgbe_ipsec_clear_hw_tables(adapter);
+   ixgbe_ipsec_start_engine(adapter);
+
+   /* reload the IP addrs */
+   for (i = 0; i < IXGBE_IPSEC_MAX_RX_IP_COUNT; i++) {
+   struct rx_ip_sa *ipsa = &ipsec->ip_tbl[i];
+
+   if (ipsa->used)
+   ixgbe_ipsec_set_rx_ip(hw, i, ipsa->ipaddr);
+   }
+
+   /* reload the Rx and Tx keys */
+   for (i = 0; i < IXGBE_IPSEC_MAX_SA_COUNT; i++) {
+   struct rx_sa *rsa = &ipsec->rx_tbl[i];
+   struct tx_sa *tsa = &ipsec->tx_tbl[i];
+
+   if (rsa->used)
+   ixgbe_ipsec_set_rx_sa(hw, i, rsa->xs->id.spi,
+ rsa->key, rsa->salt,
+ rsa->mode, rsa->iptbl_ind);
+
+   if (tsa->used)
+   ixgbe_ipsec_set_tx_sa(hw, i, tsa->key, tsa->salt);
+   }
+}
+
+/**
  * ixgbe_ipsec_find_empty_idx - find the first unused security parameter index
  * @ipsec: pointer to ipsec struct
  * @rxtable: true if we need to look in the Rx table
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 2b3da0c..04e8b26 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -5347,6 +5347,7 @@ static void ixgbe_configure(struct ixgbe_adapter *adapter)
 
ixgbe_set_rx_mode(adapter->netdev);
ixgbe_restore_vlan(adapter);
+   ixgbe_ipsec_restore(adapter);
 
switch (hw->mac.type) {
case ixgbe_mac_82599EB:
-- 
2.7.4

[PATCH v2 next-queue 09/10] ixgbe: ipsec offload stats

2017-12-12 Thread Shannon Nelson

Add a simple statistic to count the ipsec offloads.

Signed-off-by: Shannon Nelson 
---
v2: change per ring counter to adapter rx and tx counters
move tx_ipsec count to the tx clean code
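
Not part of the patch: a sketch of the counting scheme the v2 note
describes, where the Tx clean loop tallies IPsec packets in a local
variable and folds the total into one adapter-wide counter at the end.
The demo_* types and the flag value are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_TX_FLAGS_IPSEC 0x1u	/* hypothetical flag bit */

struct demo_tx_buffer {
	uint32_t tx_flags;
};

struct demo_adapter {
	uint64_t tx_ipsec;
};

/* Count flagged buffers locally, then update the shared counter once. */
static void demo_clean_tx(struct demo_adapter *adapter,
			  const struct demo_tx_buffer *bufs, size_t count)
{
	unsigned int total_ipsec = 0;
	size_t i;

	for (i = 0; i < count; i++)
		if (bufs[i].tx_flags & DEMO_TX_FLAGS_IPSEC)
			total_ipsec++;

	adapter->tx_ipsec += total_ipsec;
}
```

Updating the adapter field once per clean pass keeps the per-packet work
down to a flag test and a local increment.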

 drivers/net/ethernet/intel/ixgbe/ixgbe.h | 2 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 2 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c   | 2 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c| 5 -
 4 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 3d2b7bf..1dfe147 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -629,10 +629,12 @@ struct ixgbe_adapter {
int num_tx_queues;
u16 tx_itr_setting;
u16 tx_work_limit;
+   u64 tx_ipsec;
 
/* Rx fast path data */
int num_rx_queues;
u16 rx_itr_setting;
+   u64 rx_ipsec;
 
/* Port number used to identify VXLAN traffic */
__be16 vxlan_port;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index c3e7a81..bcf011e 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -114,6 +114,8 @@ static const struct ixgbe_stats ixgbe_gstrings_stats[] = {
{"tx_hwtstamp_timeouts", IXGBE_STAT(tx_hwtstamp_timeouts)},
{"tx_hwtstamp_skipped", IXGBE_STAT(tx_hwtstamp_skipped)},
{"rx_hwtstamp_cleared", IXGBE_STAT(rx_hwtstamp_cleared)},
+   {"tx_ipsec", IXGBE_STAT(tx_ipsec)},
+   {"rx_ipsec", IXGBE_STAT(rx_ipsec)},
 #ifdef IXGBE_FCOE
{"fcoe_bad_fccrc", IXGBE_STAT(stats.fccrc)},
{"rx_fcoe_dropped", IXGBE_STAT(stats.fcoerpdc)},
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index 5ed8a4f..ed3a4c8 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -837,6 +837,8 @@ void ixgbe_ipsec_rx(struct ixgbe_ring *rx_ring,
xo = xfrm_offload(skb);
xo->flags = CRYPTO_DONE;
xo->status = CRYPTO_SUCCESS;
+
+   adapter->rx_ipsec++;
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 84fbfb9..814268f 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1173,7 +1173,7 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
struct ixgbe_adapter *adapter = q_vector->adapter;
struct ixgbe_tx_buffer *tx_buffer;
union ixgbe_adv_tx_desc *tx_desc;
-   unsigned int total_bytes = 0, total_packets = 0;
+   unsigned int total_bytes = 0, total_packets = 0, total_ipsec = 0;
unsigned int budget = q_vector->tx.work_limit;
unsigned int i = tx_ring->next_to_clean;
 
@@ -1204,6 +1204,8 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
/* update the statistics for this packet */
total_bytes += tx_buffer->bytecount;
total_packets += tx_buffer->gso_segs;
+   if (tx_buffer->tx_flags & IXGBE_TX_FLAGS_IPSEC)
+   total_ipsec++;
 
/* free the skb */
if (ring_is_xdp(tx_ring))
@@ -1266,6 +1268,7 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
u64_stats_update_end(&tx_ring->syncp);
q_vector->tx.total_bytes += total_bytes;
q_vector->tx.total_packets += total_packets;
+   adapter->tx_ipsec += total_ipsec;
 
if (check_for_tx_hang(tx_ring) && ixgbe_check_tx_hang(tx_ring)) {
/* schedule immediate reset if we believe we hung */
-- 
2.7.4

[PATCH v2 next-queue 10/10] ixgbe: register ipsec offload with the xfrm subsystem

2017-12-12 Thread Shannon Nelson

With all the support code in place we can now link in the ipsec
offload operations and set the ESP feature flag for the XFRM
subsystem to see.

Signed-off-by: Shannon Nelson 
---
v2: added the xdo_dev_state_free callback to make XFRM happy
changed use of NETIF_F_HW_CSUM_BIT to NETIF_F_HW_CSUM
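
Not part of the patch: a sketch of the feature masking that the
features_check hunk below performs, using hypothetical DEMO_F_* bits in
place of the kernel's NETIF_F_* flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define DEMO_F_TSO	(UINT64_C(1) << 0)
#define DEMO_F_HW_CSUM	(UINT64_C(1) << 1)
#define DEMO_F_HW_ESP	(UINT64_C(1) << 2)

/* Drop TSO and checksum offload for packets that carry a security
 * path; every other feature bit passes through untouched.
 */
static uint64_t demo_features_check(uint64_t features, bool has_sec_path)
{
	if (has_sec_path)
		features &= ~(DEMO_F_TSO | DEMO_F_HW_CSUM);

	return features;
}
```

Masking per packet lets ordinary traffic keep its offloads while IPsec
traffic falls back to software segmentation and checksumming.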

 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 17 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  |  4 
 2 files changed, 21 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index ed3a4c8..4949ea9 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -715,10 +715,23 @@ static bool ixgbe_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs)
return true;
 }
 
+/**
+ * ixgbe_ipsec_free - called by xfrm garbage collections
+ * @xs: pointer to transformer state struct
+ *
+ * We don't have any garbage to collect, so we shouldn't bother
+ * implementing this function, but the XFRM code doesn't check for
+ * existence before calling the API callback.
+ **/
+static void ixgbe_ipsec_free(struct xfrm_state *xs)
+{
+}
+
 static const struct xfrmdev_ops ixgbe_xfrmdev_ops = {
.xdo_dev_state_add = ixgbe_ipsec_add_sa,
.xdo_dev_state_delete = ixgbe_ipsec_del_sa,
.xdo_dev_offload_ok = ixgbe_ipsec_offload_ok,
+   .xdo_dev_state_free = ixgbe_ipsec_free,
 };
 
 /**
@@ -877,6 +890,10 @@ void ixgbe_init_ipsec_offload(struct ixgbe_adapter *adapter)
ixgbe_ipsec_stop_engine(adapter);
ixgbe_ipsec_clear_hw_tables(adapter);
 
+   adapter->netdev->xfrmdev_ops = &ixgbe_xfrmdev_ops;
+   adapter->netdev->features |= NETIF_F_HW_ESP;
+   adapter->netdev->hw_enc_features |= NETIF_F_HW_ESP;
+
return;
 
 err2:
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 814268f..2e5d0a2 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -9796,6 +9796,10 @@ ixgbe_features_check(struct sk_buff *skb, struct net_device *dev,
if (skb->encapsulation && !(features & NETIF_F_TSO_MANGLEID))
features &= ~NETIF_F_TSO;
 
+   /* IPsec offload doesn't get along well with others *yet* */
+   if (skb->sp)
+   features &= ~(NETIF_F_TSO | NETIF_F_HW_CSUM);
+
return features;
 }
 
-- 
2.7.4
