[PATCH net-next 0/9] Hisilicon Network Subsystem 3 Ethernet Driver

2017-06-09 Thread Salil Mehta
This patch-set adds support for the HNS3 (Hisilicon Network Subsystem 3)
Ethernet driver for the hip08 family of SoCs and upcoming SoCs.

Hisilicon's new hip08 SoCs have integrated Ethernet based on PCI Express,
so a new driver was needed in place of the earlier HNS driver, which is
already part of the Linux mainline. This new driver is NOT backward
compatible with HNS.

This driver controls the Physical Function (PF). A separate driver for the
Virtual Function (VF) will follow once this base PF driver has been
accepted. The driver is also ongoing development work, and the HNS3
Ethernet driver will be enhanced incrementally with new features.

High Level Architecture:

                  [ Ethtool ]
                     ^  |
                     |  |
 [ Ethernet Client ]  [ RoCE Client ] . . . [ Ethernet Client ]

        [ HNAE3 Framework (Register/unregister) ]

                  [ HNAE Device ]

                  [ HCLGE Layer ]

    [ MDIO ]  [ Scheduler/Shaper ]  [ Debugfs ]

             [ IMP command Interface ]

             HIP08  H A R D W A R E


Current patch-set broadly adds the support of the following PF functionality:
 1. Basic Rx and Tx functionality 
 2. TSO support
 3. Ethtool support
 4. Debugfs support 
 5. HNAE framework and hardware compatibility layer
 6. Scheduler and Shaper support in transmit function
 7. MDIO support

Salil Mehta (9):
  net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC
  net: hns3: Add support of the HNAE3 framework
  net: hns3: Add HNS3 IMP(Integrated Mgmt Proc) Cmd Interface Support
  net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support
  net: hns3: Add support of TX Scheduler & Shaper to HNS3 driver
  net: hns3: Add MDIO support to HNS3 Ethernet driver for hip08 SoC
  net: hns3: Add Ethtool support to HNS3 driver
  net: hns3: Add support of debugfs interface to HNS3 driver
  net: hns3: Add HNS3 driver to kernel build framework & MAINTAINERS

 MAINTAINERS                                        |    8 +
 drivers/net/ethernet/hisilicon/Kconfig             |   24 +
 drivers/net/ethernet/hisilicon/Makefile            |    1 +
 drivers/net/ethernet/hisilicon/hns3/Makefile       |    7 +
 drivers/net/ethernet/hisilicon/hns3/hnae3.c        |  305 ++
 drivers/net/ethernet/hisilicon/hns3/hnae3.h        |  449 +++
 .../net/ethernet/hisilicon/hns3/hns3pf/Makefile    |   11 +
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c |  347 ++
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h |  742
 .../ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c |  188 +
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c    | 4257
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h    |  493 +++
 .../ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c    |  310 ++
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c  | 1018 +
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h  |  108 +
 .../net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c | 2851 +
 .../net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.h |  585 +++
 .../ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c  |  894
 18 files changed, 12598 insertions(+)
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/Makefile
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/hnae3.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/hnae3.h
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3pf/Makefile
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.h
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c

-- 
2.7.4




[PATCH net-next 4/9] net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support

2017-06-09 Thread Salil Mehta
This patch adds support for the Hisilicon Network Subsystem Acceleration
Engine and the common operations used to access it. This layer provides
access to the hardware configuration and hardware statistics. It is also
responsible for triggering the initialization of the PHY layer through
the MDIO layer below it.

Signed-off-by: Daode Huang 
Signed-off-by: lipeng 
Signed-off-by: Salil Mehta 
Signed-off-by: Yisen Zhuang 
---
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c    | 4257
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h    |  493 +++
 2 files changed, 4750 insertions(+)
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
new file mode 100644
index 0000000..6771990
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -0,0 +1,4257 @@
+/*
+ * Copyright (c) 2016-2017 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "hclge_cmd.h"
+#include "hclge_main.h"
+#include "hclge_tm.h"
+#include "hnae3.h"
+
+#define HCLGE_NAME "hclge"
+#define HCLGE_STATS_READ(p, offset) (*((u64 *)((u8 *)(p) + (offset))))
+#define HCLGE_MAC_STATS_FIELD_OFF(f) (offsetof(struct hclge_mac_stats, f))
+#define HCLGE_64BIT_STATS_FIELD_OFF(f) (offsetof(struct hclge_64_bit_stats, f))
+#define HCLGE_32BIT_STATS_FIELD_OFF(f) (offsetof(struct hclge_32_bit_stats, f))
+
+static int hclge_rss_init_hw(struct hclge_dev *hdev);
+static int hclge_set_mta_filter_mode(struct hclge_dev *hdev,
+                                     enum hclge_mta_dmac_sel_type mta_mac_sel,
+                                     bool enable);
+static int hclge_init_vlan_config(struct hclge_dev *hdev);
+
+struct hnae3_ae_algo ae_algo;
+
+static const struct pci_device_id ae_algo_pci_tbl[] = {
+   {PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_GE), 0},
+   {PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_25GE), 0},
+   {PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_25GE_RDMA), 0},
+   {PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_25GE_RDMA_MACSEC), 0},
+   {PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_50GE_RDMA), 0},
+   {PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_50GE_RDMA_MACSEC), 0},
+   {PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_100G_RDMA_MACSEC), 0},
+   /* Required last entry */
+   {0, }
+};
+
+static const struct pci_device_id roce_pci_tbl[] = {
+   {PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_25GE_RDMA), 0},
+   {PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_25GE_RDMA_MACSEC), 0},
+   {PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_50GE_RDMA), 0},
+   {PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_50GE_RDMA_MACSEC), 0},
+   {PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_100G_RDMA_MACSEC), 0},
+   /* Required last entry */
+   {0, }
+};
+
+static const char hns3_nic_test_strs[][ETH_GSTRING_LEN] = {
+   "MacLoopback test",
+   "Serdes Loopback test",
+   "PhyLoopback test"
+};
+
+static const struct hclge_comm_stats_str g_all_64bit_stats_string[] = {
+   {"igu_rx_oversize_pkt",
+   HCLGE_64BIT_STATS_FIELD_OFF(igu_rx_oversize_pkt)},
+   {"igu_rx_undersize_pkt",
+   HCLGE_64BIT_STATS_FIELD_OFF(igu_rx_undersize_pkt)},
+   {"igu_rx_out_all_pkt",
+   HCLGE_64BIT_STATS_FIELD_OFF(igu_rx_out_all_pkt)},
+   {"igu_rx_uni_pkt",
+   HCLGE_64BIT_STATS_FIELD_OFF(igu_rx_uni_pkt)},
+   {"igu_rx_multi_pkt",
+   HCLGE_64BIT_STATS_FIELD_OFF(igu_rx_multi_pkt)},
+   {"igu_rx_broad_pkt",
+   HCLGE_64BIT_STATS_FIELD_OFF(igu_rx_broad_pkt)},
+   {"egu_tx_out_all_pkt",
+   HCLGE_64BIT_STATS_FIELD_OFF(egu_tx_out_all_pkt)},
+   {"egu_tx_uni_pkt",
+   HCLGE_64BIT_STATS_FIELD_OFF(egu_tx_uni_pkt)},
+   {"egu_tx_multi_pkt",
+   HCLGE_64BIT_STATS_FIELD_OFF(egu_tx_multi_pkt)},
+   {"egu_tx_broad_pkt",
+   HCLGE_64BIT_STATS_FIELD_OFF(egu_tx_broad_pkt)},
+   {"ssu_ppp_mac_key_num",
+   HCLGE_64BIT_STATS_FIELD_OFF(ssu_ppp_mac_key_num)},
+   {"ssu_ppp_host_key_num",
+   HCLGE_64BIT_STATS_FIELD_OFF(ssu_ppp_host_key_num)},
+   {"ppp_ssu_mac_rlt_num",
+   HCLGE_64BIT_STATS_FIELD_OFF(ppp_ssu_mac_rlt_num)},
+   {"ppp_ssu_host_rlt_num",
+   HCLGE_64BIT_STATS_FIELD_OFF(ppp_ssu_host_rlt_num)},
+   {"ssu_tx_in_num",
+   

[PATCH net-next 3/9] net: hns3: Add HNS3 IMP(Integrated Mgmt Proc) Cmd Interface Support

2017-06-09 Thread Salil Mehta
This patch adds the support of IMP (Integrated Management Processor)
command interface to the HNS3 driver.

Each PF/VF supports a CQP (Command Queue Pair) ring interface.
Each CQP consists of a send queue (CSQ) and a receive queue (CRQ).
A PF/VF may support various commands, such as those for Flow Table
manipulation, Device management, Packet buffer allocation, Forwarding,
VLAN configuration, Tunneling/Overlays etc.

This patch contains code to initialize the command queue, manage the
command queue descriptors and Rx/Tx protocol with the command processor
in the form of various commands/results and acknowledgements.
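
As a rough illustration of the descriptor bookkeeping described above, the
following user-space sketch computes the free space of a circular command
ring the same way hclge_ring_space() in this patch does. The struct and
function names are hypothetical stand-ins, not the driver's own types, and
the sketch assumes producer/consumer indices that always stay within
[0, desc_num).

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the driver's command ring state. */
struct cmq_ring_sketch {
	int desc_num;      /* total number of descriptors in the ring */
	int next_to_use;   /* producer index: next slot the driver fills */
	int next_to_clean; /* consumer index: next slot to be reclaimed */
};

/*
 * Number of descriptors that can still be queued. One slot is always
 * left unused so that a full ring can be told apart from an empty one.
 */
static int cmq_ring_space(const struct cmq_ring_sketch *ring)
{
	int used;

	assert(ring->desc_num > 0);

	/* Adding desc_num before the modulo keeps the result non-negative
	 * when the producer index has wrapped around.
	 */
	used = (ring->next_to_use - ring->next_to_clean + ring->desc_num) %
	       ring->desc_num;

	return ring->desc_num - used - 1;
}
```

With 8 descriptors, an empty ring reports 7 free slots, because of the
reserved slot.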

Signed-off-by: Daode Huang 
Signed-off-by: lipeng 
Signed-off-by: Salil Mehta 
Signed-off-by: Yisen Zhuang 
---
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c | 347 ++
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h | 742 +
 2 files changed, 1089 insertions(+)
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
new file mode 100644
index 0000000..ec20ec4
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
@@ -0,0 +1,347 @@
+/*
+ * Copyright (c) 2016~2017 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "hclge_cmd.h"
+#include "hnae3.h"
+#include "hclge_main.h"
+
+#define hclge_is_csq(ring) ((ring)->flag & HCLGE_TYPE_CSQ)
+#define hclge_ring_to_dma_dir(ring) (hclge_is_csq(ring) ? \
+   DMA_TO_DEVICE : DMA_FROM_DEVICE)
+#define cmq_ring_to_dev(ring)   (&(ring)->dev->pdev->dev)
+
+static int hclge_ring_space(struct hclge_cmq_ring *ring)
+{
+   int ntu = ring->next_to_use;
+   int ntc = ring->next_to_clean;
+   int used = (ntu - ntc + ring->desc_num) % ring->desc_num;
+
+   return ring->desc_num - used - 1;
+}
+
+static int hclge_alloc_cmd_desc(struct hclge_cmq_ring *ring)
+{
+   int size = ring->desc_num * sizeof(struct hclge_desc);
+
+   ring->desc = kzalloc(size, GFP_KERNEL);
+   if (!ring->desc)
+       return -ENOMEM;
+
+   ring->desc_dma_addr = dma_map_single(cmq_ring_to_dev(ring), ring->desc,
+                                        size, DMA_BIDIRECTIONAL);
+   if (dma_mapping_error(cmq_ring_to_dev(ring), ring->desc_dma_addr)) {
+       ring->desc_dma_addr = 0;
+       kfree(ring->desc);
+       ring->desc = NULL;
+       return -ENOMEM;
+   }
+
+   return 0;
+}
+
+static void hclge_free_cmd_desc(struct hclge_cmq_ring *ring)
+{
+   dma_unmap_single(cmq_ring_to_dev(ring), ring->desc_dma_addr,
+                    ring->desc_num * sizeof(ring->desc[0]),
+                    DMA_BIDIRECTIONAL);
+
+   ring->desc_dma_addr = 0;
+   kfree(ring->desc);
+   ring->desc = NULL;
+}
+
+static int hclge_init_cmd_queue(struct hclge_dev *hdev, int ring_type)
+{
+   struct hclge_hw *hw = &hdev->hw;
+   struct hclge_cmq_ring *ring =
+       (ring_type == HCLGE_TYPE_CSQ) ? &hw->cmq.csq : &hw->cmq.crq;
+   int ret;
+
+   ring->flag = ring_type;
+   ring->dev = hdev;
+
+   ret = hclge_alloc_cmd_desc(ring);
+   if (ret) {
+       dev_err(&hdev->pdev->dev, "descriptor %s alloc error %d\n",
+               (ring_type == HCLGE_TYPE_CSQ) ? "CSQ" : "CRQ", ret);
+       return ret;
+   }
+
+   ring->next_to_clean = 0;
+   ring->next_to_use = 0;
+
+   return 0;
+}
+
+void hclge_cmd_reuse_desc(struct hclge_desc *desc, bool is_read)
+{
+   desc->flag = cpu_to_le16(HCLGE_CMD_FLAG_NO_INTR | HCLGE_CMD_FLAG_IN);
+   if (is_read)
+       desc->flag |= cpu_to_le16(HCLGE_CMD_FLAG_WR);
+   else
+       desc->flag &= cpu_to_le16(~HCLGE_CMD_FLAG_WR);
+}
+
+void hclge_cmd_setup_basic_desc(struct hclge_desc *desc,
+   enum hclge_opcode_type opcode, bool is_read)
+{
+   memset((void *)desc, 0, sizeof(struct hclge_desc));
+   desc->opcode = cpu_to_le16(opcode);
+   desc->flag = cpu_to_le16(HCLGE_CMD_FLAG_NO_INTR | HCLGE_CMD_FLAG_IN);
+
+   if (is_read)
+       desc->flag |= cpu_to_le16(HCLGE_CMD_FLAG_WR);
+   else
+       desc->flag &= cpu_to_le16(~HCLGE_CMD_FLAG_WR);
+}
+
+static void hclge_cmd_config_regs(struct hclge_cmq_ring *ring)
+{
+   dma_addr_t dma = ring->desc_dma_addr;
+   struct hclge_dev *hdev = ring->dev;
+   struct hclge_hw *hw = &hdev->hw;
+
+   if (ring->flag == 

[PATCH net-next 1/9] net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC

2017-06-09 Thread Salil Mehta
This patch adds support for the Hisilicon Network Subsystem 3
Ethernet driver for the hip08 family of SoCs.

This driver includes basic Rx/Tx functionality. It also includes
the client registration code for the HNAE3 (Hisilicon Network
Acceleration Engine 3) framework.

This work provides initial support for the hip08 SoC; features and
enhancements will be added incrementally.

Signed-off-by: Daode Huang 
Signed-off-by: lipeng 
Signed-off-by: Salil Mehta 
Signed-off-by: Yisen Zhuang 
---
 .../net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c | 2851 
 .../net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.h |  585 
 2 files changed, 3436 insertions(+)
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.h

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
new file mode 100644
index 0000000..d0e4f22
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
@@ -0,0 +1,2851 @@
+/*
+ * Copyright (c) 2016~2017 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "hnae3.h"
+#include "hns3_enet.h"
+
+const char hns3_driver_name[] = "hns3";
+static const char hns3_driver_string[] =
+   "Hisilicon Ethernet Network Driver for Hi162x Family";
+static const char hns3_copyright[] = "Copyright (c) 2017 Huawei Corporation.";
+
+/* hns3_pci_tbl - PCI Device ID Table
+ *
+ * Last entry must be all 0s
+ *
+ * { Vendor ID, Device ID, SubVendor ID, SubDevice ID,
+ *   Class, Class Mask, private data (not used) }
+ */
+static const struct pci_device_id hns3_pci_tbl[] = {
+   {PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_GE), 0},
+   {PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_25GE), 0},
+   {PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_25GE_RDMA), 0},
+   {PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_25GE_RDMA_MACSEC), 0},
+   {PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_50GE_RDMA), 0},
+   {PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_50GE_RDMA_MACSEC), 0},
+   {PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_100G_RDMA_MACSEC), 0},
+   /* required last entry */
+   {0, }
+};
+MODULE_DEVICE_TABLE(pci, hns3_pci_tbl);
+
+/* use only for netconsole to poll with the device without interrupt */
+#ifdef CONFIG_NET_POLL_CONTROLLER
+void hns3_nic_poll_controller(struct net_device *ndev)
+{
+   struct hns3_nic_priv *priv = netdev_priv(ndev);
+   struct hnae3_handle *h = priv->ae_handle;
+   unsigned long flag;
+   int i;
+
+   local_irq_save(flag);
+   for (i = 0; i < h->kinfo.num_tqp_vectors; i++)
+       napi_schedule(&h->kinfo.tqp_vectors[i].napi);
+   local_irq_restore(flag);
+}
+#endif
+
+static irqreturn_t hns3_irq_handle(int irq, void *dev)
+{
+   struct hns3_enet_tqp_vector *tqp_vector = dev;
+
+   napi_schedule(&tqp_vector->napi);
+
+   return IRQ_HANDLED;
+}
+
+static int hns3_nic_init_irq(struct hns3_nic_priv *priv)
+{
+   struct pci_dev *pdev = priv->ae_handle->pdev;
+   struct hns3_enet_tqp_vector *tqp_vectors;
+   int txrx_int_idx = 0;
+   int rx_int_idx = 0;
+   int tx_int_idx = 0;
+   int ret;
+   int i;
+
+   for (i = 0; i < priv->vector_num; i++) {
+       tqp_vectors = &priv->tqp_vector[i];
+
+       if (tqp_vectors->irq_init_flag == HNS3_VEVTOR_INITED)
+           continue;
+
+       if (tqp_vectors->tx_group.ring && tqp_vectors->rx_group.ring) {
+           snprintf(tqp_vectors->name, HNAE3_INT_NAME_LEN - 1,
+                    "%s-%s-%d", priv->netdev->name, "TxRx",
+                    txrx_int_idx++);
+       } else if (tqp_vectors->rx_group.ring) {
+           snprintf(tqp_vectors->name, HNAE3_INT_NAME_LEN - 1,
+                    "%s-%s-%d", priv->netdev->name, "Rx",
+                    rx_int_idx++);
+       } else if (tqp_vectors->tx_group.ring) {
+           snprintf(tqp_vectors->name, HNAE3_INT_NAME_LEN - 1,
+                    "%s-%s-%d", priv->netdev->name, "Tx",
+                    tx_int_idx++);
+       } else {
+           /* Skip this unused q_vector */
+           continue;
+       }
+
+       tqp_vectors->name[HNAE3_INT_NAME_LEN - 1] = '\0';
+
+       ret = devm_request_irq(&pdev->dev, tqp_vectors->vector_irq,
+  

[PATCH net-next 2/9] net: hns3: Add support of the HNAE3 framework

2017-06-09 Thread Salil Mehta
This patch adds the HNAE3 (Hisilicon Network Acceleration Engine 3)
framework to the HNS3 driver.

The framework allows clients such as ENET (HNS3 Ethernet Driver), RoCE
and user-space Ethernet drivers (like ODP etc.) to register with HNAE3
devices and their associated operations.
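
The pairing rule between client types and device types can be pictured as a
small lookup table. The sketch below is a user-space illustration only: the
enum and function names are hypothetical, and the rule it encodes (a KNIC
device accepts KNIC and RoCE clients, a UNIC device accepts only UNIC
clients) is taken from hnae3_client_match() in this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirrors of the framework's client and device types. */
enum client_type_sketch {
	CLIENT_KNIC_SKETCH,
	CLIENT_UNIC_SKETCH,
	CLIENT_ROCE_SKETCH,
	CLIENT_TYPE_CNT_SKETCH
};

enum dev_type_sketch {
	DEV_KNIC_SKETCH,
	DEV_UNIC_SKETCH,
	DEV_TYPE_CNT_SKETCH
};

/*
 * Returns true when a client of type @c may attach to a device of type
 * @d. Out-of-range values are treated as programming errors.
 */
static bool client_matches_dev(enum client_type_sketch c,
			       enum dev_type_sketch d)
{
	static const bool match[DEV_TYPE_CNT_SKETCH][CLIENT_TYPE_CNT_SKETCH] = {
		[DEV_KNIC_SKETCH] = {
			[CLIENT_KNIC_SKETCH] = true,
			[CLIENT_ROCE_SKETCH] = true,
		},
		[DEV_UNIC_SKETCH] = {
			[CLIENT_UNIC_SKETCH] = true,
		},
	};

	assert((unsigned int)c < CLIENT_TYPE_CNT_SKETCH);
	assert((unsigned int)d < DEV_TYPE_CNT_SKETCH);

	return match[d][c];
}
```

A table keeps the rule in one place; the patch itself expresses the same
rule with nested switch statements.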

Signed-off-by: Daode Huang 
Signed-off-by: lipeng 
Signed-off-by: Salil Mehta 
Signed-off-by: Yisen Zhuang 
---
 drivers/net/ethernet/hisilicon/hns3/hnae3.c | 305 +++
 drivers/net/ethernet/hisilicon/hns3/hnae3.h | 449 
 2 files changed, 754 insertions(+)
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/hnae3.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/hnae3.h

diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.c b/drivers/net/ethernet/hisilicon/hns3/hnae3.c
new file mode 100644
index 0000000..f133e1d
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.c
@@ -0,0 +1,305 @@
+/*
+ * Copyright (c) 2016-2017 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+
+#include "hnae3.h"
+
+static LIST_HEAD(hnae3_ae_algo_list);
+static LIST_HEAD(hnae3_client_list);
+static LIST_HEAD(hnae3_ae_dev_list);
+
+static DEFINE_SPINLOCK(hnae3_list_ae_algo_lock);
+static DEFINE_SPINLOCK(hnae3_list_client_lock);
+static DEFINE_SPINLOCK(hnae3_list_ae_dev_lock);
+
+static void hnae3_list_add(spinlock_t *lock, struct list_head *node,
+  struct list_head *head)
+{
+   unsigned long flags;
+
+   spin_lock_irqsave(lock, flags);
+   list_add_tail_rcu(node, head);
+   spin_unlock_irqrestore(lock, flags);
+}
+
+static void hnae3_list_del(spinlock_t *lock, struct list_head *node)
+{
+   unsigned long flags;
+
+   spin_lock_irqsave(lock, flags);
+   list_del_rcu(node);
+   spin_unlock_irqrestore(lock, flags);
+}
+
+static bool hnae3_client_match(enum hnae3_client_type client_type,
+  enum hnae3_dev_type dev_type)
+{
+   if (dev_type == HNAE3_DEV_KNIC) {
+       switch (client_type) {
+       case HNAE3_CLIENT_KNIC:
+       case HNAE3_CLIENT_ROCE:
+           return true;
+       default:
+           return false;
+       }
+   } else if (dev_type == HNAE3_DEV_UNIC) {
+       switch (client_type) {
+       case HNAE3_CLIENT_UNIC:
+           return true;
+       default:
+           return false;
+       }
+   } else {
+       return false;
+   }
+}
+
+int hnae3_register_client(struct hnae3_client *client)
+{
+   struct hnae3_client *client_tmp;
+   struct hnae3_ae_dev *ae_dev;
+   int ret;
+
+   /* One system should only have one client for every type */
+   list_for_each_entry(client_tmp, &hnae3_client_list, node) {
+       if (client_tmp->type == client->type)
+           return 0;
+   }
+
+   hnae3_list_add(&hnae3_list_client_lock, &client->node,
+                  &hnae3_client_list);
+
+   /* Check if there are matched ae_dev */
+   list_for_each_entry(ae_dev, &hnae3_ae_dev_list, node) {
+       if (hnae3_client_match(client->type, ae_dev->dev_type) &&
+           hnae_get_bit(ae_dev->flag, HNAE3_DEV_INITED_B)) {
+           if (ae_dev->ops && ae_dev->ops->register_client) {
+               ret = ae_dev->ops->register_client(client,
+                                                  ae_dev);
+               if (ret) {
+                   dev_err(&ae_dev->pdev->dev,
+                           "init ae_dev error.\n");
+                   return ret;
+               }
+           }
+       }
+   }
+
+   return 0;
+}
+EXPORT_SYMBOL(hnae3_register_client);
+
+void hnae3_unregister_client(struct hnae3_client *client)
+{
+   struct hnae3_ae_dev *ae_dev;
+
+   /* Check if there are matched ae_dev */
+   list_for_each_entry(ae_dev, &hnae3_ae_dev_list, node) {
+       if (hnae3_client_match(client->type, ae_dev->dev_type) &&
+           hnae_get_bit(ae_dev->flag, HNAE3_DEV_INITED_B))
+           if (ae_dev->ops && ae_dev->ops->unregister_client)
+               ae_dev->ops->unregister_client(client, ae_dev);
+   }
+   hnae3_list_del(&hnae3_list_client_lock, &client->node);
+}
+EXPORT_SYMBOL(hnae3_unregister_client);
+
+/* hnae_ae_register - register an AE engine to hnae framework
+ * @hdev: the hnae ae engine device
+ * @owner:  the module who 

[PATCH net-next 5/9] net: hns3: Add support of TX Scheduler & Shaper to HNS3 driver

2017-06-09 Thread Salil Mehta
This patch adds support for the Scheduling and Shaping
functionality on the transmit path. It also adds
support for Pause at the MAC level. (Pause at per-priority level
will be added later along with the DCB feature.)

The hardware provides two types of configuration of 6-level
schedulers. The algorithms vary according to the level and type
of scheduler being used. Current patch is used to initialize
the mapping, algorithms(like SP, DWRR etc) and shaper(CIR, PIR etc)
being used.
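
To make the shaper arithmetic concrete, here is a small user-space sketch of
the rate formula quoted in the code comment of this patch,
IR(Mbps) = IR_b * 2^IR_u * 8 / (Tick * 2^IR_s) * 1000. The function name is
hypothetical and the rounding to the nearest integer is an assumption made
for the sketch; it is not the driver's parameter search, only the forward
calculation from shaper parameters to a rate.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Forward evaluation of the shaper formula:
 *
 *   IR(Mbps) = (ir_b * 2^ir_u * 8) / (tick * 2^ir_s) * 1000
 *
 * The result is rounded to the nearest integer (an assumption of this
 * sketch). 64-bit intermediates avoid overflow for large shift values.
 */
static uint64_t shaper_rate_mbps(uint32_t ir_b, uint32_t ir_u,
				 uint32_t ir_s, uint32_t tick)
{
	uint64_t numerator;
	uint64_t denominator;

	assert(tick > 0);
	assert(ir_u < 32 && ir_s < 32);

	numerator = ((uint64_t)ir_b * 8u * 1000u) << ir_u;
	denominator = (uint64_t)tick << ir_s;

	return (numerator + denominator / 2) / denominator;
}
```

For example, with the port-level tick of 6 * 8 used in this patch and
ir_b = 126, ir_u = 0, ir_s = 0, the formula gives 1008000 / 48 = 21000 Mbps.
Each increment of ir_u doubles that rate, and each increment of ir_s halves
it.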

Signed-off-by: Daode Huang 
Signed-off-by: lipeng 
Signed-off-by: Salil Mehta 
Signed-off-by: Yisen Zhuang 
---
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c  | 1018 
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h  |  108 +++
 2 files changed, 1126 insertions(+)
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
new file mode 100644
index 0000000..2b66a0e
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
@@ -0,0 +1,1018 @@
+/*
+ * Copyright (c) 2016~2017 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include 
+
+#include "hclge_cmd.h"
+#include "hclge_main.h"
+#include "hclge_tm.h"
+
+enum hclge_shaper_level {
+   HCLGE_SHAPER_LVL_PRI    = 0,
+   HCLGE_SHAPER_LVL_PG     = 1,
+   HCLGE_SHAPER_LVL_PORT   = 2,
+   HCLGE_SHAPER_LVL_QSET   = 3,
+   HCLGE_SHAPER_LVL_CNT    = 4,
+   HCLGE_SHAPER_LVL_VF     = 0,
+   HCLGE_SHAPER_LVL_PF     = 1,
+};
+
+#define HCLGE_SHAPER_BS_U_DEF  1
+#define HCLGE_SHAPER_BS_S_DEF  4
+
+#define HCLGE_ETHER_MAX_RATE   100000
+
+/* hclge_shaper_para_calc: calculate ir parameter for the shaper
+ * @ir: Rate to be config, its unit is Mbps
+ * @shaper_level: the shaper level. eg: port, pg, priority, queueset
+ * @ir_b: IR_B parameter of IR shaper
+ * @ir_u: IR_U parameter of IR shaper
+ * @ir_s: IR_S parameter of IR shaper
+ *
+ * the formula:
+ *
+ *             IR_b * (2 ^ IR_u) * 8
+ * IR(Mbps) = -------------------------  *  CLOCK(1000Mbps)
+ *             Tick * (2 ^ IR_s)
+ *
+ * @return: 0: calculate successful, negative: fail
+ */
+static int hclge_shaper_para_calc(u32 ir, u8 shaper_level,
+ u8 *ir_b, u8 *ir_u, u8 *ir_s)
+{
+   const u16 tick_array[HCLGE_SHAPER_LVL_CNT] = {
+       6 * 256,    /* Priority level */
+       6 * 32,     /* Priority group level */
+       6 * 8,      /* Port level */
+       6 * 256     /* Qset level */
+   };
+   u8 ir_u_calc = 0, ir_s_calc = 0;
+   u32 ir_calc;
+   u32 tick;
+
+   /* Calc tick */
+   if (shaper_level >= HCLGE_SHAPER_LVL_CNT)
+       return -EINVAL;
+
+   tick = tick_array[shaper_level];
+
+   /**
+    * Calc the speed if ir_b = 126, ir_u = 0 and ir_s = 0
+    * the formula is changed to:
+    *             126 * 1 * 8
+    * ir_calc = ---------------- * 1000
+    *             tick * 1
+    */
+   ir_calc = (1008000 + (tick >> 1) - 1) / tick;
+
+   if (ir_calc == ir) {
+       *ir_b = 126;
+       *ir_u = 0;
+       *ir_s = 0;
+
+       return 0;
+   } else if (ir_calc > ir) {
+       /* Increasing the denominator to select ir_s value */
+       while (ir_calc > ir) {
+           ir_s_calc++;
+           ir_calc = 1008000 / (tick * (1 << ir_s_calc));
+       }
+
+       if (ir_calc == ir)
+           *ir_b = 126;
+       else
+           *ir_b = (ir * tick * (1 << ir_s_calc) + 4000) / 8000;
+   } else {
+       /* Increasing the numerator to select ir_u value */
+       u32 numerator;
+
+       while (ir_calc < ir) {
+           ir_u_calc++;
+           numerator = 1008000 * (1 << ir_u_calc);
+           ir_calc = (numerator + (tick >> 1)) / tick;
+       }
+
+       if (ir_calc == ir) {
+           *ir_b = 126;
+       } else {
+           u32 denominator = (8000 * (1 << --ir_u_calc));
+           *ir_b = (ir * tick + (denominator >> 1)) / denominator;
+       }
+   }
+
+   *ir_u = ir_u_calc;
+   *ir_s = ir_s_calc;
+
+   return 0;
+}
+
+static int hclge_mac_pause_en_cfg(struct hclge_dev *hdev, bool tx, bool rx)
+{
+   struct hclge_desc desc;
+
+   hclge_cmd_setup_basic_desc(, 

[PATCH net-next 7/9] net: hns3: Add Ethtool support to HNS3 driver

2017-06-09 Thread Salil Mehta
This patch adds the support of the Ethtool interface to
the HNS3 Ethernet driver. Various commands to read the
statistics, configure the offloading, loopback selftest etc.
are supported.
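
The statistics commands in this patch are driven by tables that pair a
string with the size and byte offset of a structure member. The sketch below
shows that pattern in user space under simplifying assumptions: the struct,
macro and function names are hypothetical, only three 64-bit counters are
modelled, and the copy loop is an illustration rather than the driver's
ethtool callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a block of 64-bit link counters. */
struct link_stats_sketch {
	uint64_t rx_packets;
	uint64_t tx_packets;
	uint64_t rx_errors;
};

/* One table entry: display name plus where the counter lives. */
struct stat_desc_sketch {
	const char *name;
	size_t size;
	size_t offset;
};

#define STAT_SKETCH(_name, _member)                                  \
	{ _name,                                                     \
	  sizeof(((struct link_stats_sketch *)0)->_member),          \
	  offsetof(struct link_stats_sketch, _member) }

static const struct stat_desc_sketch stat_table[] = {
	STAT_SKETCH("rx_packets", rx_packets),
	STAT_SKETCH("tx_packets", tx_packets),
	STAT_SKETCH("rx_errors", rx_errors),
};

#define STAT_TABLE_COUNT (sizeof(stat_table) / sizeof(stat_table[0]))

/* Copy every counter named in the table into @out; returns the count. */
static size_t stats_collect(const struct link_stats_sketch *stats,
			    uint64_t *out, size_t out_len)
{
	size_t i;

	for (i = 0; i < STAT_TABLE_COUNT && i < out_len; i++) {
		/* This sketch only handles 64-bit counters. */
		assert(stat_table[i].size == sizeof(uint64_t));
		memcpy(&out[i],
		       (const unsigned char *)stats + stat_table[i].offset,
		       sizeof(uint64_t));
	}

	return i;
}
```

Because names and offsets live in the same table, the string list and the
value list reported to user space stay in the same order.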

Signed-off-by: Daode Huang 
Signed-off-by: lipeng 
Signed-off-by: Salil Mehta 
Signed-off-by: Yisen Zhuang 
---
 .../ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c  | 894 +
 1 file changed, 894 insertions(+)
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c
new file mode 100644
index 0000000..83fde08
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c
@@ -0,0 +1,894 @@
+/*
+ * Copyright (c) 2016~2017 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include 
+#include "hns3_enet.h"
+
+struct hns3_stats {
+   char stats_string[ETH_GSTRING_LEN];
+   int stats_size;
+   int stats_offset;
+};
+
+/* netdev related stats */
+#define HNS3_NETDEV_STAT(_string, _member) \
+   { _string,  \
+ FIELD_SIZEOF(struct rtnl_link_stats64, _member),  \
+ offsetof(struct rtnl_link_stats64, _member),  \
+   }
+
+static const struct hns3_stats hns3_netdev_stats[] = {
+   /* misc. Rx/Tx statistics */
+   HNS3_NETDEV_STAT("rx_packets", rx_packets),
+   HNS3_NETDEV_STAT("tx_packets", tx_packets),
+   HNS3_NETDEV_STAT("rx_bytes", rx_bytes),
+   HNS3_NETDEV_STAT("tx_bytes", tx_bytes),
+   HNS3_NETDEV_STAT("rx_errors", rx_errors),
+   HNS3_NETDEV_STAT("tx_errors", tx_errors),
+   HNS3_NETDEV_STAT("rx_dropped", rx_dropped),
+   HNS3_NETDEV_STAT("tx_dropped", tx_dropped),
+   HNS3_NETDEV_STAT("multicast", multicast),
+   HNS3_NETDEV_STAT("collisions", collisions),
+
+   /* detailed Rx errors */
+   HNS3_NETDEV_STAT("rx_length_errors", rx_length_errors),
+   HNS3_NETDEV_STAT("rx_over_errors", rx_over_errors),
+   HNS3_NETDEV_STAT("rx_crc_errors", rx_crc_errors),
+   HNS3_NETDEV_STAT("rx_frame_errors", rx_frame_errors),
+   HNS3_NETDEV_STAT("rx_fifo_errors", rx_fifo_errors),
+   HNS3_NETDEV_STAT("rx_missed_errors", rx_missed_errors),
+
+   /* detailed Tx errors */
+   HNS3_NETDEV_STAT("tx_aborted_errors", tx_aborted_errors),
+   HNS3_NETDEV_STAT("tx_carrier_errors", tx_carrier_errors),
+   HNS3_NETDEV_STAT("tx_fifo_errors", tx_fifo_errors),
+   HNS3_NETDEV_STAT("tx_heartbeat_errors", tx_heartbeat_errors),
+   HNS3_NETDEV_STAT("tx_window_errors", tx_window_errors),
+
+   /* for cslip etc */
+   HNS3_NETDEV_STAT("rx_compressed", rx_compressed),
+   HNS3_NETDEV_STAT("tx_compressed", tx_compressed),
+};
+
+#define HNS3_NETDEV_STATS_COUNT ARRAY_SIZE(hns3_netdev_stats)
+
+/* tqp related stats */
+#define HNS3_TQP_STAT(_string, _member)\
+   { _string,  \
+     FIELD_SIZEOF(struct ring_stats, _member), \
+     offsetof(struct hns3_enet_ring, stats) +  \
+     offsetof(struct ring_stats, _member), \
+   }
+
+static const struct hns3_stats hns3_txq_stats[] = {
+   /* Tx per-queue statistics */
+   HNS3_TQP_STAT("tx_io_err_cnt", io_err_cnt),
+   HNS3_TQP_STAT("tx_sw_err_cnt", sw_err_cnt),
+   HNS3_TQP_STAT("tx_seg_pkt_cnt", seg_pkt_cnt),
+   HNS3_TQP_STAT("tx_pkts", tx_pkts),
+   HNS3_TQP_STAT("tx_bytes", tx_bytes),
+   HNS3_TQP_STAT("tx_err_cnt", tx_err_cnt),
+   HNS3_TQP_STAT("tx_restart_queue", restart_queue),
+   HNS3_TQP_STAT("tx_busy", tx_busy),
+};
+
+#define HNS3_TXQ_STATS_COUNT ARRAY_SIZE(hns3_txq_stats)
+
+static const struct hns3_stats hns3_rxq_stats[] = {
+   /* Rx per-queue statistics */
+   HNS3_TQP_STAT("rx_io_err_cnt", io_err_cnt),
+   HNS3_TQP_STAT("rx_sw_err_cnt", sw_err_cnt),
+   HNS3_TQP_STAT("rx_seg_pkt_cnt", seg_pkt_cnt),
+   HNS3_TQP_STAT("rx_pkts", rx_pkts),
+   HNS3_TQP_STAT("rx_bytes", rx_bytes),
+   HNS3_TQP_STAT("rx_err_cnt", rx_err_cnt),
+   HNS3_TQP_STAT("rx_reuse_pg_cnt", reuse_pg_cnt),
+   HNS3_TQP_STAT("rx_err_pkt_len", err_pkt_len),
+   HNS3_TQP_STAT("rx_non_vld_descs", non_vld_descs),
+   HNS3_TQP_STAT("rx_err_bd_num", err_bd_num),
+   HNS3_TQP_STAT("rx_l2_err", l2_err),
+   HNS3_TQP_STAT("rx_l3l4_csum_err", l3l4_csum_err),
+};
+
+#define HNS3_RXQ_STATS_COUNT ARRAY_SIZE(hns3_rxq_stats)
+
+#define HNS3_TQP_STATS_COUNT (HNS3_TXQ_STATS_COUNT + HNS3_RXQ_STATS_COUNT)
+
+struct hns3_link_mode_mapping {
+   u32 hns3_link_mode;
+   

[PATCH net-next 6/9] net: hns3: Add MDIO support to HNS3 Ethernet driver for hip08 SoC

2017-06-09 Thread Salil Mehta
This patch adds MDIO bus interface support to the HNS3 driver.
The code provides various interfaces to start and stop the PHY layer
and to read and write the MDIO bus or PHY.
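
The MDIO read and write paths in this patch first split the register number
passed by the MDIO core into a Clause 45 flag, a device address and a 16-bit
register address. The user-space sketch below illustrates that decoding. The
names are hypothetical, and SKETCH_MII_ADDR_C45 is an assumption that
mirrors the kernel's MII_ADDR_C45 flag (bit 30) as used at the time of this
series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: same bit position as the kernel's MII_ADDR_C45 flag. */
#define SKETCH_MII_ADDR_C45 (1u << 30)

/* Decoded view of an MDIO register number. */
struct mdio_addr_sketch {
	bool is_c45;   /* true for a Clause 45 access */
	uint8_t devad; /* 5-bit device address, meaningful for Clause 45 */
	uint16_t reg;  /* 16-bit register address */
};

/* Build a Clause 45 register number from device and register address. */
static uint32_t mdio_c45_regnum(uint8_t devad, uint16_t reg)
{
	assert(devad <= 0x1f);

	return SKETCH_MII_ADDR_C45 | ((uint32_t)devad << 16) | reg;
}

/* Split a register number the same way the write path in the patch does. */
static struct mdio_addr_sketch mdio_decode_regnum(uint32_t regnum)
{
	struct mdio_addr_sketch addr;

	addr.is_c45 = (regnum & SKETCH_MII_ADDR_C45) != 0;
	addr.devad = (uint8_t)((regnum >> 16) & 0x1f);
	addr.reg = (uint16_t)(regnum & 0xffff);

	return addr;
}
```

A plain Clause 22 register number has the flag clear, so it decodes with
is_c45 false and the register address in the low bits.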

Signed-off-by: Daode Huang 
Signed-off-by: lipeng 
Signed-off-by: Salil Mehta 
Signed-off-by: Yisen Zhuang 
---
 .../ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c    | 310 +
 1 file changed, 310 insertions(+)
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
new file mode 100644
index 0000000..c6812d2
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
@@ -0,0 +1,310 @@
+/*
+ * Copyright (c) 2016~2017 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include 
+#include 
+
+#include "hclge_cmd.h"
+#include "hclge_main.h"
+
+enum hclge_mdio_c22_op_seq {
+   HCLGE_MDIO_C22_WRITE = 1,
+   HCLGE_MDIO_C22_READ = 2
+};
+
+enum hclge_mdio_c45_op_seq {
+   HCLGE_MDIO_C45_WRITE_ADDR = 0,
+   HCLGE_MDIO_C45_WRITE_DATA,
+   HCLGE_MDIO_C45_READ_INCREMENT,
+   HCLGE_MDIO_C45_READ
+};
+
+#define HCLGE_MDIO_CTRL_START_BIT   BIT(0)
+#define HCLGE_MDIO_CTRL_ST_MSK  GENMASK(2, 1)
+#define HCLGE_MDIO_CTRL_ST_LSH  1
+#define HCLGE_MDIO_IS_C22(c22)  (((c22) << HCLGE_MDIO_CTRL_ST_LSH) & \
+   HCLGE_MDIO_CTRL_ST_MSK)
+
+#define HCLGE_MDIO_CTRL_OP_MSK  GENMASK(4, 3)
+#define HCLGE_MDIO_CTRL_OP_LSH  3
+#define HCLGE_MDIO_CTRL_OP(access) \
+   (((access) << HCLGE_MDIO_CTRL_OP_LSH) & HCLGE_MDIO_CTRL_OP_MSK)
+#define HCLGE_MDIO_CTRL_PRTAD_MSK   GENMASK(4, 0)
+#define HCLGE_MDIO_CTRL_DEVAD_MSK   GENMASK(4, 0)
+
+#define HCLGE_MDIO_STA_VAL(val) ((val) & BIT(0))
+
+struct hclge_mdio_cfg_cmd {
+   u8 ctrl_bit;
+   u8 prtad;   /* The external port address */
+   u8 devad;   /* The external device address */
+   u8 rsvd;
+   __le16 addr_c45;    /* Only valid for c45 */
+   __le16 data_wr;
+   __le16 data_rd;
+   __le16 sta;
+};
+
+static int hclge_mdio_write(struct mii_bus *bus, int phy_id, int regnum,
+   u16 data)
+{
+   struct hclge_dev *hdev = (struct hclge_dev *)bus->priv;
+   struct hclge_mdio_cfg_cmd *mdio_cmd;
+   enum hclge_cmd_status status;
+   struct hclge_desc desc;
+   u8 is_c45, devad;
+   u16 reg;
+
+   if (!bus)
+   return -EINVAL;
+
+   is_c45 = !!(regnum & MII_ADDR_C45);
+   devad = ((regnum >> 16) & 0x1f);
+   reg = (u16)(regnum & 0xffff);
+
+   hclge_cmd_setup_basic_desc(&desc, HCLGE_OPC_MDIO_CONFIG, false);
+
+   mdio_cmd = (struct hclge_mdio_cfg_cmd *)desc.data;
+
+   if (!is_c45) {
+   /* C22 write reg and data */
+   mdio_cmd->ctrl_bit = HCLGE_MDIO_IS_C22(!is_c45);
+   mdio_cmd->ctrl_bit |= HCLGE_MDIO_CTRL_OP(HCLGE_MDIO_C22_WRITE);
+   mdio_cmd->ctrl_bit |= HCLGE_MDIO_CTRL_START_BIT;
+   mdio_cmd->data_wr = cpu_to_le16(data);
+   mdio_cmd->devad = devad & HCLGE_MDIO_CTRL_DEVAD_MSK;
+   mdio_cmd->prtad = phy_id & HCLGE_MDIO_CTRL_PRTAD_MSK;
+   } else {
+   /* Set phy addr */
+   mdio_cmd->ctrl_bit |= HCLGE_MDIO_CTRL_START_BIT;
+   mdio_cmd->addr_c45 = cpu_to_le16(reg);
+   mdio_cmd->data_wr = cpu_to_le16(data);
+   mdio_cmd->devad = devad & HCLGE_MDIO_CTRL_DEVAD_MSK;
+   mdio_cmd->prtad = phy_id & HCLGE_MDIO_CTRL_PRTAD_MSK;
+   }
+
+   status = hclge_cmd_send(&hdev->hw, &desc, 1);
+   if (status) {
+   dev_err(&hdev->pdev->dev,
+   "mdio write fail when sending cmd, status is %d.\n",
+   status);
+   return -EIO;
+   }
+
+   return 0;
+}
+
+static int hclge_mdio_read(struct mii_bus *bus, int phy_id, int regnum)
+{
+   struct hclge_dev *hdev = (struct hclge_dev *)bus->priv;
+   struct hclge_mdio_cfg_cmd *mdio_cmd;
+   enum hclge_cmd_status status;
+   struct hclge_desc desc;
+   u8 is_c45, devad;
+   u16 reg;
+
+   if (!bus)
+   return -EINVAL;
+
+   is_c45 = !!(regnum & MII_ADDR_C45);
+   devad = ((regnum >> 16) & GENMASK(4, 0));
+   reg = (u16)(regnum & GENMASK(15, 0));
+
+   hclge_cmd_setup_basic_desc(&desc, HCLGE_OPC_MDIO_CONFIG, true);
+
+   mdio_cmd = (struct hclge_mdio_cfg_cmd *)desc.data;
+
+   dev_dbg(&bus->dev, "phy id=%d, is_c45=%d, devad=%d, reg=%#x!\n",
+   phy_id, is_c45, devad, reg);
+
+   if (!is_c45) {
+   /* C22 read reg */
+   

[PATCH net-next 9/9] net: hns3: Add HNS3 driver to kernel build framework & MAINTAINERS

2017-06-09 Thread Salil Mehta
This patch updates the MAINTAINERS file with the HNS3 Ethernet driver
maintainers' names and other details. It also introduces the new
Makefiles required to build the HNS3 Ethernet driver and updates
the existing Kconfig file in the hisilicon folder.

Signed-off-by: Salil Mehta 
---
 MAINTAINERS|  8 
 drivers/net/ethernet/hisilicon/Kconfig | 24 ++
 drivers/net/ethernet/hisilicon/Makefile|  1 +
 drivers/net/ethernet/hisilicon/hns3/Makefile   |  7 +++
 .../net/ethernet/hisilicon/hns3/hns3pf/Makefile| 11 ++
 5 files changed, 51 insertions(+)
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/Makefile
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3pf/Makefile

diff --git a/MAINTAINERS b/MAINTAINERS
index 8b8249b..cda0e80 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6070,6 +6070,14 @@ S:   Maintained
 F: drivers/net/ethernet/hisilicon/
 F: Documentation/devicetree/bindings/net/hisilicon*.txt
 
+HISILICON NETWORK SUBSYSTEM 3 DRIVER (HNS3)
+M: Yisen Zhuang 
+M: Salil Mehta 
+L: netdev@vger.kernel.org
+W: http://www.hisilicon.com
+S: Maintained
+F: drivers/net/ethernet/hisilicon/hns3/
+
 HISILICON ROCE DRIVER
 M: Lijun Ou 
 M: Wei Hu(Xavier) 
diff --git a/drivers/net/ethernet/hisilicon/Kconfig b/drivers/net/ethernet/hisilicon/Kconfig
index d11287e..2c48fce 100644
--- a/drivers/net/ethernet/hisilicon/Kconfig
+++ b/drivers/net/ethernet/hisilicon/Kconfig
@@ -76,4 +76,28 @@ config HNS_ENET
  This selects the general ethernet driver for HNS.  This module make
  use of any HNS AE driver, such as HNS_DSAF
 
+config HNS3
+   tristate "Hisilicon Network Subsystem Support HNS3 (Framework)"
+   ---help---
+ This selects the framework support for Hisilicon Network Subsystem 3.
+ This layer facilitates clients like ENET, RoCE and user-space ethernet
+ drivers (like ODP) to register with HNAE devices and their associated
+ operations.
+
+config HNS3_HCLGE
+   tristate "Hisilicon HNS3 HCLGE Acceleration Engine & Compatibility Layer Support"
+   select HNS3
+   ---help---
+ This selects the HNS3_HCLGE network acceleration engine & its hardware
+ compatibility layer. The engine would be used in Hisilicon hip08 family of
+ SoCs and further upcoming SoCs.
+
+config HNS3_ENET
+   tristate "Hisilicon HNS3 Ethernet Device Support"
+   select HNS3
+   ---help---
+ This selects the Ethernet Driver for Hisilicon Network Subsystem 3 for hip08
+ family of SoCs. This module depends upon HNAE3 driver to access the HNAE3
+ devices and their associated operations.
+
 endif # NET_VENDOR_HISILICON
diff --git a/drivers/net/ethernet/hisilicon/Makefile b/drivers/net/ethernet/hisilicon/Makefile
index 8661695..3828c43 100644
--- a/drivers/net/ethernet/hisilicon/Makefile
+++ b/drivers/net/ethernet/hisilicon/Makefile
@@ -6,4 +6,5 @@ obj-$(CONFIG_HIX5HD2_GMAC) += hix5hd2_gmac.o
 obj-$(CONFIG_HIP04_ETH) += hip04_eth.o
 obj-$(CONFIG_HNS_MDIO) += hns_mdio.o
 obj-$(CONFIG_HNS) += hns/
+obj-$(CONFIG_HNS3) += hns3/
 obj-$(CONFIG_HISI_FEMAC) += hisi_femac.o
diff --git a/drivers/net/ethernet/hisilicon/hns3/Makefile b/drivers/net/ethernet/hisilicon/hns3/Makefile
new file mode 100644
index 0000000..5e53735
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hns3/Makefile
@@ -0,0 +1,7 @@
+#
+# Makefile for the HISILICON network device drivers.
+#
+
+obj-$(CONFIG_HNS3) += hns3pf/
+
+obj-$(CONFIG_HNS3) +=hnae3.o
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/Makefile b/drivers/net/ethernet/hisilicon/hns3/hns3pf/Makefile
new file mode 100644
index 0000000..8c3fd38
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/Makefile
@@ -0,0 +1,11 @@
+#
+# Makefile for the HISILICON network device drivers.
+#
+
+ccflags-y := -Idrivers/net/ethernet/hisilicon/hns3
+
+obj-$(CONFIG_HNS3_HCLGE) += hclge.o
+hclge-objs =hclge_main.o hclge_cmd.o hclge_mdio.o hclge_debugfs.o hclge_tm.o
+
+obj-$(CONFIG_HNS3_ENET) += hns3.o
+hns3-objs = hns3_enet.o hns3_ethtool.o
-- 
2.7.4




[PATCH net-next 8/9] net: hns3: Add support of debugfs interface to HNS3 driver

2017-06-09 Thread Salil Mehta
This adds debugfs interface support to the driver for
debugging purposes.
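
The write handler in this patch matches the written text against a small table of command names and calls the matching handler with the rest of the line. A minimal userspace sketch of that table-driven dispatch follows; every demo_* name is hypothetical and the handlers only return marker values, so this shows the matching logic and nothing about the real command interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_cmd {
	const char *name;              /* command prefix to match */
	int (*fn)(const char *args);   /* handler, gets the text after the name */
};

/* Placeholder handlers: they only report which command was selected. */
static int demo_send(const char *args) { (void)args; return 1; }
static int demo_help(const char *args) { (void)args; return 2; }

static const struct demo_cmd demo_cmds[] = {
	{ "send cmd", demo_send },
	{ "help",     demo_help },
};

/* Returns the handler's result, or -1 if no command name prefixes the line. */
int demo_dispatch(const char *line)
{
	size_t i;

	assert(line);
	for (i = 0; i < sizeof(demo_cmds) / sizeof(demo_cmds[0]); i++) {
		size_t len = strlen(demo_cmds[i].name);

		if (strncmp(line, demo_cmds[i].name, len) == 0)
			return demo_cmds[i].fn(line + len);
	}
	return -1;
}
```

Because the comparison is a prefix match, any input that merely starts with a command name (for example "helpme") selects that command; whether that is acceptable is a design choice for a debug-only interface.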

Signed-off-by: Daode Huang 
Signed-off-by: lipeng 
Signed-off-by: Salil Mehta 
Signed-off-by: Yisen Zhuang 
---
 .../ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c | 188 +
 1 file changed, 188 insertions(+)
 create mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c
new file mode 100644
index 0000000..8ef5a41
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c
@@ -0,0 +1,188 @@
+/*
+ * Copyright (c) 2016~2017 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "hclge_cmd.h"
+#include "hclge_main.h"
+#include "hnae3.h"
+
+static struct dentry *hclge_dbgfs_root;
+static int hclge_dbg_usage(struct hclge_dev *hdev, char *data);
+#define HCLGE_DBG_READ_LEN 256
+
+struct hclge_support_cmd {
+   char *name;
+   int len;
+   int (*fn)(struct hclge_dev *hdev, char *data);
+   char *param;
+};
+
+static int hclge_dbg_send(struct hclge_dev *hdev, char *buf)
+{
+   struct hclge_desc desc;
+   enum hclge_cmd_status status;
+   int cnt;
+
+   cnt = sscanf(buf, "%hi %hi %i %i %i %i %i %i",
+&desc.opcode, &desc.flag,
+&desc.data[0], &desc.data[1], &desc.data[2],
+&desc.data[3], &desc.data[4], &desc.data[5]);
+   if (cnt != 8) {
+   dev_info(&hdev->pdev->dev,
+"send cmd: bad command parameter, cnt=%d\n", cnt);
+   return -EINVAL;
+   }
+
+   status = hclge_cmd_send(&hdev->hw, &desc, 1);
+   if (status) {
+   dev_info(&hdev->pdev->dev,
+"send command fail Opcode:%x, Status:%d\n",
+desc.opcode, status);
+   }
+   dev_info(&hdev->pdev->dev, "get response:\n");
+   dev_info(&hdev->pdev->dev, "opcode:%04x\tflag:%04x\tretval:%04x\t\n",
+desc.opcode, desc.flag, desc.retval);
+   dev_info(&hdev->pdev->dev, "data[0~2]:%08x\t%08x\t%08x\n",
+desc.data[0], desc.data[1], desc.data[2]);
+   dev_info(&hdev->pdev->dev, "data[3-5]:%08x\t%08x\t%08x\n",
+desc.data[3], desc.data[4], desc.data[5]);
+   return 0;
+}
+
+const struct  hclge_support_cmd  support_cmd[] = {
+   {"send cmd", 8, hclge_dbg_send,
+   "opcode flag data0 data1 data2 data3 data4 data5"},
+   {"help", 4, hclge_dbg_usage, "no option"},
+};
+
+static int hclge_dbg_usage(struct hclge_dev *hdev, char *data)
+{
+   int i;
+
+   pr_info("supported cmd list:\n");
+   for (i = 0; i < ARRAY_SIZE(support_cmd); i++)
+   pr_info("%s: %s\n", support_cmd[i].name, support_cmd[i].param);
+
+   return 0;
+}
+
+static ssize_t hclge_dbg_cmd_read(struct file *filp, char __user *buffer,
+ size_t count, loff_t *ppos)
+{
+   int uncopy_bytes;
+   char *buf;
+   int len;
+
+   if (*ppos != 0)
+   return 0;
+   if (count < HCLGE_DBG_READ_LEN)
+   return -ENOSPC;
+   buf = kzalloc(HCLGE_DBG_READ_LEN, GFP_KERNEL);
+   if (!buf)
+   return -ENOSPC;
+
+   len = snprintf(buf, HCLGE_DBG_READ_LEN, "%s\n",
+  "Please echo help to cmd to get help information");
+   uncopy_bytes = copy_to_user(buffer, buf, len);
+   kfree(buf);
+
+   if (uncopy_bytes)
+   return -EFAULT;
+
+   *ppos = len;
+   return len;
+}
+
+static ssize_t hclge_dbg_cmd_write(struct file *filp, const char __user *buffer,
+  size_t count, loff_t *ppos)
+{
+   struct hclge_dev *hdev = filp->private_data;
+   char *cmd_buf, *cmd_buf_tmp;
+   int uncopied_bytes;
+   int i;
+
+   if (*ppos != 0)
+   return 0;
+   cmd_buf = kzalloc(count + 1, GFP_KERNEL);
+   if (!cmd_buf)
+   return count;
+   uncopied_bytes = copy_from_user(cmd_buf, buffer, count);
+   if (uncopied_bytes) {
+   kfree(cmd_buf);
+   return -EFAULT;
+   }
+   cmd_buf[count] = '\0';
+
+   cmd_buf_tmp = strchr(cmd_buf, '\n');
+   if (cmd_buf_tmp) {
+   *cmd_buf_tmp = '\0';
+   count = cmd_buf_tmp - cmd_buf + 1;
+   }
+
+   for (i = 0; i < ARRAY_SIZE(support_cmd); i++) {
+   if (strncmp(cmd_buf, support_cmd[i].name,
+   support_cmd[i].len) == 0) {
+   support_cmd[i].fn(hdev, &cmd_buf[support_cmd[i].len]);
+   

[PATCH 0/6] Constant Time Memory Comparisons Are Important

2017-06-09 Thread Jason A. Donenfeld
Whenever you're comparing two MACs, it's important to do this using
crypto_memneq instead of memcmp. With memcmp, you leak timing information,
which could then be used to iteratively forge a MAC. This is far too basic
of a mistake for us to have so pervasively in the year 2017, so let's begin
cleaning this stuff up. The following 6 locations were found with some
simple regex greps, but I'm sure more lurk below the surface. If you
maintain some code or know somebody who maintains some code that deals
with MACs, tell them to double check which comparison function they're
using.
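
To make the timing argument concrete: memcmp may return at the first differing byte, so its run time can reveal how many leading bytes of a guess were correct. A constant-time comparison instead touches every byte and only accumulates the differences. The function below is a userspace sketch of that idea under the hypothetical name demo_ct_memneq; it is not the kernel's crypto_memneq implementation, and the volatile qualifiers are only a best-effort hint against compiler shortcuts.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns 0 if the two buffers are equal and 1 otherwise.
 * Every byte is examined regardless of where the first mismatch is,
 * so the loop length depends only on len, not on the data.
 */
int demo_ct_memneq(const void *a, const void *b, size_t len)
{
	const volatile unsigned char *pa = a;
	const volatile unsigned char *pb = b;
	unsigned char diff = 0;
	size_t i;

	assert(a && b);
	for (i = 0; i < len; i++)
		diff |= (unsigned char)(pa[i] ^ pb[i]);

	return diff != 0;
}
```

In kernel code the right tool remains crypto_memneq, as the patches in this series use; the sketch only shows why its result carries no ordering information and why no early exit is taken.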

Jason A. Donenfeld (6):
  sunrpc: use constant time memory comparison for mac
  net/ipv6: use constant time memory comparison for mac
  ccree: use constant time memory comparison for macs and tags
  security/keys: use constant time memory comparison for macs
  bluetooth/smp: use constant time memory comparison for secret values
  mac80211/wpa: use constant time memory comparison for MACs

Cc: Anna Schumaker 
Cc: David Howells 
Cc: David Safford 
Cc: "David S. Miller" 
Cc: Gilad Ben-Yossef 
Cc: Greg Kroah-Hartman 
Cc: Gustavo Padovan 
Cc: "J. Bruce Fields" 
Cc: Jeff Layton 
Cc: Johan Hedberg 
Cc: Johannes Berg 
Cc: Marcel Holtmann 
Cc: Mimi Zohar 
Cc: Trond Myklebust 
Cc: keyri...@vger.kernel.org
Cc: linux-blueto...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-wirel...@vger.kernel.org
Cc: netdev@vger.kernel.org

 drivers/staging/ccree/ssi_fips_ll.c   | 17 ---
 net/bluetooth/smp.c   | 39 ++-
 net/ipv6/seg6_hmac.c  |  3 ++-
 net/mac80211/wpa.c|  9 
 net/sunrpc/auth_gss/gss_krb5_crypto.c |  3 ++-
 security/keys/trusted.c   |  7 ---
 6 files changed, 42 insertions(+), 36 deletions(-)

-- 
2.13.1


[PATCH 2/6] net/ipv6: use constant time memory comparison for mac

2017-06-09 Thread Jason A. Donenfeld
Otherwise, we enable a MAC forgery via timing attack.

Signed-off-by: Jason A. Donenfeld 
Cc: "David S. Miller" 
Cc: netdev@vger.kernel.org
Cc: sta...@vger.kernel.org
---
 net/ipv6/seg6_hmac.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/seg6_hmac.c b/net/ipv6/seg6_hmac.c
index f950cb53d5e3..54213c83b44e 100644
--- a/net/ipv6/seg6_hmac.c
+++ b/net/ipv6/seg6_hmac.c
@@ -38,6 +38,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -274,7 +275,7 @@ bool seg6_hmac_validate_skb(struct sk_buff *skb)
if (seg6_hmac_compute(hinfo, srh, &ipv6_hdr(skb)->saddr, hmac_output))
return false;
 
-   if (memcmp(hmac_output, tlv->hmac, SEG6_HMAC_FIELD_LEN) != 0)
+   if (crypto_memneq(hmac_output, tlv->hmac, SEG6_HMAC_FIELD_LEN))
return false;
 
return true;
-- 
2.13.1



[PATCH v2 2/2] tcp: md5: extend the tcp_md5sig struct to specify a key address prefix

2017-06-09 Thread Ivan Delalande
Add a flag field and address prefix length at the end of the tcp_md5sig
structure so users can configure an address prefix length along with a
key. Make sure shorter option values are still accepted in
tcp_v4_parse_md5_keys and tcp_v6_parse_md5_keys to maintain backward
compatibility.
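
The compatibility rule can be sketched in userspace C as follows: accept any option buffer at least as long as the legacy layout, copy no more than the caller supplied, and consult the new trailing fields only when the buffer is long enough to contain them. The struct and all demo_*/DEMO_* names are hypothetical stand-ins that mirror the field order in this patch; the default of 32 corresponds to the IPv4 case.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAXKEYLEN   80
#define DEMO_FLAG_PREFIX 1

/* Hypothetical stand-in for struct tcp_md5sig, same field order as the patch. */
struct demo_md5sig {
	uint8_t  addr[128];
	uint16_t pad1;
	uint16_t keylen;
	uint32_t pad2;
	uint8_t  key[DEMO_MAXKEYLEN];
	uint8_t  flags;      /* new trailing field */
	uint8_t  prefixlen;  /* new trailing field */
};

/* The original struct stopped right after the key array. */
#define DEMO_LEGACY_LEN (offsetof(struct demo_md5sig, key) + DEMO_MAXKEYLEN)

/* Returns the effective IPv4 prefix length, or -EINVAL on bad input. */
int demo_parse_prefixlen(const void *optval, size_t optlen)
{
	struct demo_md5sig cmd;
	size_t n = optlen < sizeof(cmd) ? optlen : sizeof(cmd);

	if (optlen < DEMO_LEGACY_LEN)
		return -EINVAL;

	memset(&cmd, 0, sizeof(cmd));
	memcpy(&cmd, optval, n);

	/* Only trust the new fields when the caller actually supplied them. */
	if (optlen >= sizeof(cmd) && (cmd.flags & DEMO_FLAG_PREFIX)) {
		if (cmd.prefixlen > 32)
			return -EINVAL;
		return cmd.prefixlen;
	}
	return 32;
}

/* Convenience wrapper: builds an option buffer and parses optlen bytes of it. */
int demo_try(size_t optlen, uint8_t flags, uint8_t prefixlen)
{
	struct demo_md5sig cmd;

	assert(optlen <= sizeof(cmd));
	memset(&cmd, 0, sizeof(cmd));
	cmd.flags = flags;
	cmd.prefixlen = prefixlen;
	return demo_parse_prefixlen(&cmd, optlen);
}
```

A caller built against the old header passes the legacy length and always gets the full-length prefix, even if stray bytes happen to follow its buffer; a new caller passes the whole struct and may set the flag.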

Signed-off-by: Bob Gilligan 
Signed-off-by: Eric Mowat 
Signed-off-by: Ivan Delalande 
---
 include/uapi/linux/tcp.h |  8 
 net/ipv4/tcp_ipv4.c  | 15 +++
 net/ipv6/tcp_ipv6.c  | 24 +---
 3 files changed, 36 insertions(+), 11 deletions(-)

diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 38a2b07afdff..440a8d983e4b 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -233,6 +233,12 @@ enum {
 
 /* for TCP_MD5SIG socket option */
 #define TCP_MD5SIG_MAXKEYLEN   80
+/* original struct stopped at tcpm_key and must still be considered valid */
+#define TCP_MD5SIG_LEGACY_LEN  (offsetof(struct tcp_md5sig, tcpm_key) + \
+TCP_MD5SIG_MAXKEYLEN)
+
+/* tcp_md5sig flags */
+#define TCP_MD5SIG_FLAG_PREFIX 1   /* address prefix length */
 
 struct tcp_md5sig {
struct __kernel_sockaddr_storage tcpm_addr; /* address associated */
@@ -240,6 +246,8 @@ struct tcp_md5sig {
__u16   tcpm_keylen;/* key length */
__u32   __tcpm_pad2;/* zero */
__u8tcpm_key[TCP_MD5SIG_MAXKEYLEN]; /* key (binary) */
+   __u8tcpm_flags; /* flags */
+   __u8tcpm_prefixlen; /* address prefix */
 };
 
 #endif /* _UAPI_LINUX_TCP_H */
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 51ca3bd5a8a3..96a56224b913 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1069,25 +1069,32 @@ static int tcp_v4_parse_md5_keys(struct sock *sk, char __user *optval,
 {
struct tcp_md5sig cmd;
struct sockaddr_in *sin = (struct sockaddr_in *)&cmd.tcpm_addr;
+   u8 prefixlen = 32;
 
-   if (optlen < sizeof(cmd))
+   if (optlen < TCP_MD5SIG_LEGACY_LEN)
return -EINVAL;
 
-   if (copy_from_user(&cmd, optval, sizeof(cmd)))
+   if (copy_from_user(&cmd, optval, min_t(size_t, sizeof(cmd), optlen)))
return -EFAULT;
 
if (sin->sin_family != AF_INET)
return -EINVAL;
 
+   if (optlen >= sizeof(cmd) && cmd.tcpm_flags & TCP_MD5SIG_FLAG_PREFIX) {
+   prefixlen = cmd.tcpm_prefixlen;
+   if (prefixlen > 32)
+   return -EINVAL;
+   }
+
if (!cmd.tcpm_keylen)
return tcp_md5_do_del(sk, (union tcp_md5_addr *)&sin->sin_addr.s_addr,
- AF_INET, 32);
+ AF_INET, prefixlen);
 
if (cmd.tcpm_keylen > TCP_MD5SIG_MAXKEYLEN)
return -EINVAL;
 
return tcp_md5_do_add(sk, (union tcp_md5_addr *)&sin->sin_addr.s_addr,
- AF_INET, 32, cmd.tcpm_key, cmd.tcpm_keylen,
+ AF_INET, prefixlen, cmd.tcpm_key, cmd.tcpm_keylen,
  GFP_KERNEL);
 }
 
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 5cf19dab60aa..aff909e19b3d 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -519,22 +519,32 @@ static int tcp_v6_parse_md5_keys(struct sock *sk, char __user *optval,
 {
struct tcp_md5sig cmd;
struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)&cmd.tcpm_addr;
+   u8 prefixlen;
 
-   if (optlen < sizeof(cmd))
+   if (optlen < TCP_MD5SIG_LEGACY_LEN)
return -EINVAL;
 
-   if (copy_from_user(&cmd, optval, sizeof(cmd)))
+   if (copy_from_user(&cmd, optval, min_t(size_t, sizeof(cmd), optlen)))
return -EFAULT;
 
if (sin6->sin6_family != AF_INET6)
return -EINVAL;
 
+   if (optlen >= sizeof(cmd) && cmd.tcpm_flags & TCP_MD5SIG_FLAG_PREFIX) {
+   prefixlen = cmd.tcpm_prefixlen;
+   if (prefixlen > 128 || (ipv6_addr_v4mapped(&sin6->sin6_addr) &&
+   prefixlen > 32))
+   return -EINVAL;
+   } else {
+   prefixlen = ipv6_addr_v4mapped(&sin6->sin6_addr) ? 32 : 128;
+   }
+
if (!cmd.tcpm_keylen) {
if (ipv6_addr_v4mapped(&sin6->sin6_addr))
return tcp_md5_do_del(sk, (union tcp_md5_addr *)&sin6->sin6_addr.s6_addr32[3],
- AF_INET, 32);
+ AF_INET, prefixlen);
return tcp_md5_do_del(sk, (union tcp_md5_addr *)&sin6->sin6_addr,
- AF_INET6, 128);
+ AF_INET6, prefixlen);
}
 
if (cmd.tcpm_keylen > TCP_MD5SIG_MAXKEYLEN)
@@ -542,12 +552,12 @@ static int 

[PATCH v2 1/2] tcp: md5: add an address prefix for key lookup

2017-06-09 Thread Ivan Delalande
This allows a key used for TCP MD5 signatures to apply to a whole
range of addresses, specified with a prefix length, instead of only a
single address as is currently the case.
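
With prefixes, more than one key can cover a given peer, so the lookup has to pick the most specific match. The following userspace sketch shows that longest-prefix selection for IPv4 only. The demo_* names are hypothetical, addresses are kept in host byte order for readability (the kernel code works on network-order values via inet_make_mask), and keys live in a plain array rather than an RCU-protected list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_key {
	uint32_t addr;       /* IPv4 address, host byte order for simplicity */
	uint8_t  prefixlen;  /* 0..32 */
};

/* Build a netmask with the top prefixlen bits set. */
static uint32_t demo_make_mask(uint8_t prefixlen)
{
	assert(prefixlen <= 32);
	return prefixlen ? (uint32_t)(UINT32_MAX << (32 - prefixlen)) : 0;
}

/* Returns the index of the most specific key covering addr, or -1 if none. */
int demo_lookup(const struct demo_key *keys, size_t nkeys, uint32_t addr)
{
	int best = -1;
	size_t i;

	for (i = 0; i < nkeys; i++) {
		uint32_t mask = demo_make_mask(keys[i].prefixlen);

		if ((keys[i].addr & mask) != (addr & mask))
			continue;
		if (best < 0 || keys[i].prefixlen > keys[best].prefixlen)
			best = (int)i;
	}
	return best;
}

/* Small fixed table for experimenting: 10.0.0.0/8 and 10.1.0.0/16. */
int demo_lookup_sample(uint32_t addr)
{
	static const struct demo_key keys[] = {
		{ 0x0a000000u, 8 },
		{ 0x0a010000u, 16 },
	};

	return demo_lookup(keys, sizeof(keys) / sizeof(keys[0]), addr);
}
```

Here 10.1.2.3 is covered by both entries and resolves to the /16 key, while 10.2.0.1 falls back to the /8 key. Adding or deleting a key needs an exact match on address and prefix length instead, which is why the patch adds a separate exact lookup.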

Signed-off-by: Bob Gilligan 
Signed-off-by: Eric Mowat 
Signed-off-by: Ivan Delalande 
---
 include/net/tcp.h   |  6 +++--
 net/ipv4/tcp_ipv4.c | 68 ++---
 net/ipv6/tcp_ipv6.c | 12 ++
 3 files changed, 70 insertions(+), 16 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 38a7427ae902..2b68023ab095 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1395,6 +1395,7 @@ struct tcp_md5sig_key {
u8  keylen;
u8  family; /* AF_INET or AF_INET6 */
union tcp_md5_addr  addr;
+   u8  prefixlen;
u8  key[TCP_MD5SIG_MAXKEYLEN];
struct rcu_head rcu;
 };
@@ -1438,9 +1439,10 @@ struct tcp_md5sig_pool {
 int tcp_v4_md5_hash_skb(char *md5_hash, const struct tcp_md5sig_key *key,
const struct sock *sk, const struct sk_buff *skb);
 int tcp_md5_do_add(struct sock *sk, const union tcp_md5_addr *addr,
-  int family, const u8 *newkey, u8 newkeylen, gfp_t gfp);
+  int family, u8 prefixlen, const u8 *newkey, u8 newkeylen,
+  gfp_t gfp);
 int tcp_md5_do_del(struct sock *sk, const union tcp_md5_addr *addr,
-  int family);
+  int family, u8 prefixlen);
 struct tcp_md5sig_key *tcp_v4_md5_lookup(const struct sock *sk,
 const struct sock *addr_sk);
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 5ab2aac5ca19..51ca3bd5a8a3 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -80,6 +80,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -906,6 +907,9 @@ struct tcp_md5sig_key *tcp_md5_do_lookup(const struct sock *sk,
struct tcp_md5sig_key *key;
unsigned int size = sizeof(struct in_addr);
const struct tcp_md5sig_info *md5sig;
+   __be32 mask;
+   struct tcp_md5sig_key *best_match = NULL;
+   bool match;
 
/* caller either holds rcu_read_lock() or socket lock */
md5sig = rcu_dereference_check(tp->md5sig_info,
@@ -919,12 +923,55 @@ struct tcp_md5sig_key *tcp_md5_do_lookup(const struct sock *sk,
hlist_for_each_entry_rcu(key, &md5sig->head, node) {
if (key->family != family)
continue;
-   if (!memcmp(&key->addr, addr, size))
+
+   if (family == AF_INET) {
+   mask = inet_make_mask(key->prefixlen);
+   match = (key->addr.a4.s_addr & mask) ==
+   (addr->a4.s_addr & mask);
+#if IS_ENABLED(CONFIG_IPV6)
+   } else if (family == AF_INET6) {
+   match = ipv6_prefix_equal(&key->addr.a6, &addr->a6,
+ key->prefixlen);
+#endif
+   } else {
+   match = false;
+   }
+
+   if (match && (!best_match ||
+ key->prefixlen > best_match->prefixlen))
+   best_match = key;
+   }
+   return best_match;
+}
+EXPORT_SYMBOL(tcp_md5_do_lookup);
+
+struct tcp_md5sig_key *tcp_md5_do_lookup_exact(const struct sock *sk,
+  const union tcp_md5_addr *addr,
+  int family, u8 prefixlen)
+{
+   const struct tcp_sock *tp = tcp_sk(sk);
+   struct tcp_md5sig_key *key;
+   unsigned int size = sizeof(struct in_addr);
+   const struct tcp_md5sig_info *md5sig;
+
+   /* caller either holds rcu_read_lock() or socket lock */
+   md5sig = rcu_dereference_check(tp->md5sig_info,
+  lockdep_sock_is_held(sk));
+   if (!md5sig)
+   return NULL;
+#if IS_ENABLED(CONFIG_IPV6)
+   if (family == AF_INET6)
+   size = sizeof(struct in6_addr);
+#endif
+   hlist_for_each_entry_rcu(key, &md5sig->head, node) {
+   if (key->family != family)
+   continue;
+   if (!memcmp(&key->addr, addr, size) &&
+   key->prefixlen == prefixlen)
return key;
}
return NULL;
 }
-EXPORT_SYMBOL(tcp_md5_do_lookup);
 
 struct tcp_md5sig_key *tcp_v4_md5_lookup(const struct sock *sk,
 const struct sock *addr_sk)
@@ -938,14 +985,15 @@ EXPORT_SYMBOL(tcp_v4_md5_lookup);
 
 /* This can be called on a newly created socket, from other files */
 int tcp_md5_do_add(struct sock *sk, const union tcp_md5_addr *addr,
-  int family, const u8 *newkey, u8 newkeylen, gfp_t gfp)
+  int family, u8 prefixlen, const u8 *newkey, u8 

[PATCH iproute/master 1/3] iptunnel: document mode parameter for sit tunnels

2017-06-09 Thread Krister Johansen
Original-Author: Simon Horman 
Signed-off-by: Krister Johansen 
---
 man/man8/ip-link.8.in | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index 5d73538..3cc2f5d 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -660,7 +660,9 @@ the following additional arguments are supported:
 ] [
 .RB [ no ] encap-csum
 ] [
-.RB [ no ] encap-remcsum
+.I " [no]encap-remcsum "
+] [
+.I " mode " { ip6ip | ipip | any } "
 ]
 
 .in +8
@@ -697,6 +699,12 @@ encapsulation.
 - specifies if Remote Checksum Offload is enabled. This is only
 applicable for Generic UDP Encapsulation.
 
+.sp
+.BI mode " { ip6ip | ipip | any } "
+- specifies mode in which device should run. "ip6ip" indicates
+IPv6-Over-IPv4, "ipip" indicates "IPv4-Over-IPv4", "any" indicates either
+IPv6 or IPv4 Over IPv4. Only supported for SIT where the default is "ip6ip".
+
 .in -8
 
 .TP
-- 
2.7.4



[PATCH iproute/master 3/3] iptunnel: add support for mpls/ip to ipip tunnels

2017-06-09 Thread Krister Johansen
Original-Author: Simon Horman 
Signed-off-by: Krister Johansen 
---
 ip/link_iptnl.c   | 21 -
 man/man8/ip-link.8.in |  5 +++--
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/ip/link_iptnl.c b/ip/link_iptnl.c
index cf3a9ef..d24e737 100644
--- a/ip/link_iptnl.c
+++ b/ip/link_iptnl.c
@@ -50,6 +50,8 @@ static void print_usage(FILE *f, int sit)
if (sit) {
fprintf(f, "  [ mode { ip6ip | ipip | mplsip | any } ]\n");
fprintf(f, "  [ isatap ]\n");
+   } else {
+   fprintf(f, "  [ mode { ipip | mplsip | any } ]\n");
}
fprintf(f, "[ external ]\n");
fprintf(f, "[ fwmark MARK ]\n");
@@ -251,6 +253,21 @@ get_failed:
proto = 0;
else
invarg("Cannot guess tunnel mode.", *argv);
+   } else if (strcmp(lu->id, "ipip") == 0 &&
+  strcmp(*argv, "mode") == 0) {
+   NEXT_ARG();
+   if (strcmp(*argv, "ipv4/ipv4") == 0 ||
+strcmp(*argv, "ipip") == 0 ||
+strcmp(*argv, "ip4ip4") == 0)
+   proto = IPPROTO_IPIP;
+   else if (strcmp(*argv, "mpls/ipv4") == 0 ||
+  strcmp(*argv, "mplsip") == 0)
+   proto = IPPROTO_MPLS;
+   else if (strcmp(*argv, "any/ipv4") == 0 ||
+strcmp(*argv, "any") == 0)
+   proto = 0;
+   else
+   invarg("Cannot guess tunnel mode.", *argv);
} else if (strcmp(*argv, "noencap") == 0) {
encaptype = TUNNEL_ENCAP_NONE;
} else if (strcmp(*argv, "encap") == 0) {
@@ -343,9 +360,11 @@ get_failed:
addattr16(n, 1024, IFLA_IPTUN_ENCAP_SPORT, htons(encapsport));
addattr16(n, 1024, IFLA_IPTUN_ENCAP_DPORT, htons(encapdport));
 
+   if (strcmp(lu->id, "ipip") == 0 || strcmp(lu->id, "sit") == 0)
+   addattr8(n, 1024, IFLA_IPTUN_PROTO, proto);
+
if (strcmp(lu->id, "sit") == 0) {
addattr16(n, 1024, IFLA_IPTUN_FLAGS, iflags);
-   addattr8(n, 1024, IFLA_IPTUN_PROTO, proto);
if (ip6rdprefixlen) {
addattr_l(n, 1024, IFLA_IPTUN_6RD_PREFIX,
  &ip6rdprefix, sizeof(ip6rdprefix));
diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index 994b539..a782712 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -703,8 +703,9 @@ applicable for Generic UDP Encapsulation.
 .BI mode " { ip6ip | ipip | mplsip | any } "
 - specifies mode in which device should run. "ip6ip" indicates
 IPv6-Over-IPv4, "ipip" indicates "IPv4-Over-IPv4", "mplsip" indicates
-MPLS-Over-IPv4, "any" indicates IPv6, IPv4 or MPLS Over IPv4. Only
-supported for SIT where the default is "ip6ip".
+MPLS-Over-IPv4, "any" indicates IPv6, IPv4 or MPLS Over IPv4. Supported for
+SIT where the default is "ip6ip" and IPIP where the default is "ipip".
+IPv6-Over-IPv4 is not supported for IPIP.
 
 .in -8
 
-- 
2.7.4
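
For readers skimming the diff above, the mode parsing added for ipip tunnels boils down to a string-to-protocol mapping. The standalone sketch below reproduces that mapping. The function name demo_ipip_mode_to_proto and the DEMO_* constants are hypothetical; the value 137 for MPLS is the one used by the fallback define in patch 2/3, and returning -1 stands in for the invarg() error path.

```c
#include <assert.h>
#include <string.h>

#define DEMO_IPPROTO_IPIP 4    /* IPv4-in-IPv4 */
#define DEMO_IPPROTO_MPLS 137  /* value used by the fallback define in patch 2/3 */

/*
 * Map an ipip-tunnel mode string to the encapsulated protocol number.
 * Returns 0 for "any" and -1 for an unrecognised mode.
 */
int demo_ipip_mode_to_proto(const char *mode)
{
	assert(mode);

	if (strcmp(mode, "ipv4/ipv4") == 0 ||
	    strcmp(mode, "ipip") == 0 ||
	    strcmp(mode, "ip4ip4") == 0)
		return DEMO_IPPROTO_IPIP;

	if (strcmp(mode, "mpls/ipv4") == 0 ||
	    strcmp(mode, "mplsip") == 0)
		return DEMO_IPPROTO_MPLS;

	if (strcmp(mode, "any/ipv4") == 0 ||
	    strcmp(mode, "any") == 0)
		return 0;

	/* "ip6ip" is deliberately not accepted for ipip tunnels. */
	return -1;
}
```

The selected value is what ends up in the IFLA_IPTUN_PROTO attribute, which this patch now emits for both ipip and sit links.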



[PATCH] net: fec: Add a fec_enet_clear_ethtool_stats() stub for CONFIG_M5272

2017-06-09 Thread Fabio Estevam
From: Fabio Estevam 

Commit 2b30842b23b9 ("net: fec: Clear and enable MIB counters on imx51")
introduced fec_enet_clear_ethtool_stats(), but did not add a stub
for the CONFIG_M5272=y case, causing a build failure for
m5272c3_defconfig.

Add the missing empty stub to fix the build failure.
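
The general pattern is sketched below in standalone C: a function that only makes sense on some configurations gets a real body in one preprocessor branch and an empty static inline stub in the other, so callers compile unchanged either way. DEMO_NO_MIB and the demo_* names are hypothetical placeholders for CONFIG_M5272 and the fec helpers; this is an illustration of the idiom, not the driver code.

```c
#include <assert.h>
#include <stddef.h>

struct demo_dev {
	unsigned long mib[4];  /* stand-in for hardware MIB counters */
};

#if !defined(DEMO_NO_MIB)
/* Full implementation, used when the hardware has MIB counters. */
static void demo_clear_stats(struct demo_dev *dev)
{
	size_t i;

	for (i = 0; i < sizeof(dev->mib) / sizeof(dev->mib[0]); i++)
		dev->mib[i] = 0;
}
#else
/* Empty stub so that callers still build when the feature is absent. */
static inline void demo_clear_stats(struct demo_dev *dev)
{
	(void)dev;
}
#endif

/* The caller is identical in both configurations. */
int demo_open(struct demo_dev *dev)
{
	assert(dev);
	demo_clear_stats(dev);
	return 0;
}

/* Helper for experimenting: returns the first counter after demo_open(). */
unsigned long demo_first_counter_after_open(void)
{
	struct demo_dev dev = { { 7, 7, 7, 7 } };

	demo_open(&dev);
	return dev.mib[0];
}
```

If only one branch defines the function, any configuration that takes the other branch fails to build at the call site, which is the failure reported for m5272c3_defconfig.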

Reported-by: Paul Gortmaker 
Signed-off-by: Fabio Estevam 
---
 drivers/net/ethernet/freescale/fec_main.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 297fd19..a6e323f 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -2379,6 +2379,10 @@ static void fec_enet_clear_ethtool_stats(struct net_device *dev)
 static inline void fec_enet_update_ethtool_stats(struct net_device *dev)
 {
 }
+
+static inline void fec_enet_clear_ethtool_stats(struct net_device *dev)
+{
+}
 #endif /* !defined(CONFIG_M5272) */
 
 /* ITR clock source is enet system clock (clk_ahb).
-- 
2.7.4



[PATCH iproute/master 0/3] lost mpls ip tunnel patches

2017-06-09 Thread Krister Johansen
Hi Stephen,
I'm a bit unsure of the decorum in this particular situation. Kernel
support for mpls/ip tunnels was integrated back in July of 2016.  At the
time, the author of that feature sent out an RFC patch for the iproute
support but never followed up on subsequent code review comments.

The kernel support got merged, but the iproute support never made it in.
I wanted to run some tests with these features.  In the process, I
tracked down the author's original patches, merged them into current
iproute, and attempted to address the comments from code reviewers.

I've attached an 'Original-Author' label to each commit, and have CC'd
him and the code reviewer on this patch.  If any part of this is
improper, please let me know and I'll respin accordingly.  Mostly, I
wanted to close the loop here so the mpls in ip tunnel support is usable
through iproute.

The original threads for the 2016 patch are here:

http://marc.info/?l=linux-netdev&m=146782946216005&w=2
http://marc.info/?l=linux-netdev&m=146782941615977&w=2
http://marc.info/?l=linux-netdev&m=146782947016007&w=2
http://marc.info/?l=linux-netdev&m=146782942915988&w=2

Thanks,

-K

Krister Johansen (3):
  iptunnel: document mode parameter for sit tunnels
  iptunnel: add support for mpls/ip to sit tunnels
  iptunnel: add support for mpls/ip to ipip tunnels

 include/utils.h   |  3 +++
 ip/link_iptnl.c   | 30 ++
 ip/tunnel.c   |  3 +++
 man/man8/ip-link.8.in | 12 +++-
 4 files changed, 43 insertions(+), 5 deletions(-)

-- 
2.7.4



[PATCH iproute/master 2/3] iptunnel: add support for mpls/ip to sit tunnels

2017-06-09 Thread Krister Johansen
Original-Author: Simon Horman 
Signed-off-by: Krister Johansen 
---
 include/utils.h   | 3 +++
 ip/link_iptnl.c   | 9 ++---
 ip/tunnel.c   | 3 +++
 man/man8/ip-link.8.in | 9 +
 4 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/include/utils.h b/include/utils.h
index bfbc9e6..60ffde4 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -87,6 +87,9 @@ struct ipx_addr {
 #ifndef AF_MPLS
 # define AF_MPLS 28
 #endif
+#ifndef IPPROTO_MPLS
+#define IPPROTO_MPLS   137
+#endif
 
 __u32 get_addr32(const char *name);
 int get_addr_1(inet_prefix *dst, const char *arg, int family);
diff --git a/ip/link_iptnl.c b/ip/link_iptnl.c
index 2f74d9b..cf3a9ef 100644
--- a/ip/link_iptnl.c
+++ b/ip/link_iptnl.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include "rt_names.h"
@@ -47,9 +48,8 @@ static void print_usage(FILE *f, int sit)
type
);
if (sit) {
-   fprintf(f,
-   "[ mode { ip6ip | ipip | any } ]\n"
-   "[ isatap ]\n");
+   fprintf(f, "  [ mode { ip6ip | ipip | mplsip | any } ]\n");
+   fprintf(f, "  [ isatap ]\n");
}
fprintf(f, "[ external ]\n");
fprintf(f, "[ fwmark MARK ]\n");
@@ -243,6 +243,9 @@ get_failed:
 strcmp(*argv, "ipip") == 0 ||
 strcmp(*argv, "ip4ip4") == 0)
proto = IPPROTO_IPIP;
+   else if (strcmp(*argv, "mpls/ipv4") == 0 ||
+  strcmp(*argv, "mplsip") == 0)
+   proto = IPPROTO_MPLS;
else if (strcmp(*argv, "any/ipv4") == 0 ||
 strcmp(*argv, "any") == 0)
proto = 0;
diff --git a/ip/tunnel.c b/ip/tunnel.c
index 7956d71..d359eb9 100644
--- a/ip/tunnel.c
+++ b/ip/tunnel.c
@@ -54,6 +54,9 @@ const char *tnl_strproto(__u8 proto)
case IPPROTO_ESP:
strcpy(buf, "esp");
break;
+   case IPPROTO_MPLS:
+   strcpy(buf, "mpls");
+   break;
case 0:
strcpy(buf, "any");
break;
diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index 3cc2f5d..994b539 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -662,7 +662,7 @@ the following additional arguments are supported:
 ] [
 .I " [no]encap-remcsum "
 ] [
-.I " mode " { ip6ip | ipip | any } "
+.I " mode " { ip6ip | ipip | mplsip | any } "
 ]
 
 .in +8
@@ -700,10 +700,11 @@ encapsulation.
 applicable for Generic UDP Encapsulation.
 
 .sp
-.BI mode " { ip6ip | ipip | any } "
+.BI mode " { ip6ip | ipip | mplsip | any } "
 - specifies mode in which device should run. "ip6ip" indicates
-IPv6-Over-IPv4, "ipip" indicates "IPv4-Over-IPv4", "any" indicates either
-IPv6 or IPv4 Over IPv4. Only supported for SIT where the default is "ip6ip".
+IPv6-Over-IPv4, "ipip" indicates "IPv4-Over-IPv4", "mplsip" indicates
+MPLS-Over-IPv4, "any" indicates IPv6, IPv4 or MPLS Over IPv4. Only
+supported for SIT where the default is "ip6ip".
 
 .in -8
 
-- 
2.7.4



Re: [PATCH net-next] net: fec: Clear and enable MIB counters on imx51

2017-06-09 Thread Fabio Estevam
Hi Paul,

On Fri, Jun 9, 2017 at 10:01 PM, Paul Gortmaker
 wrote:

> Seems to break one of the automated linux-next builds:
>
> http://kisskb.ellerman.id.au/kisskb/buildresult/13057702/
>
> A mindless automated bisect reports:
>
> 2b30842b23b9e6796c7bd5f0916fd2ebf6b7d633 is the first bad commit
> commit 2b30842b23b9e6796c7bd5f0916fd2ebf6b7d633
> Author: Andrew Lunn 
> Date:   Wed Jun 7 03:57:09 2017 +0200
>
> net: fec: Clear and enable MIB counters on imx51

This should fix it:

--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -2379,6 +2379,10 @@ static void fec_enet_clear_ethtool_stats(struct
net_device *dev)
 static inline void fec_enet_update_ethtool_stats(struct net_device *dev)
 {
 }
+
+static inline void fec_enet_clear_ethtool_stats(struct net_device *dev)
+{
+}
 #endif /* !defined(CONFIG_M5272) */

 /* ITR clock source is enet system clock (clk_ahb).

Will test it and submit a formal patch in case it works.

Thanks


Re: [PATCH net-next] net: fec: Clear and enable MIB counters on imx51

2017-06-09 Thread Paul Gortmaker
On Wed, Jun 7, 2017 at 10:07 AM, David Miller  wrote:
> From: Andrew Lunn 
> Date: Wed,  7 Jun 2017 03:57:09 +0200
>
>> Both the IMX51 and IMX53 datasheet indicates that the MIB counters
>> should be cleared during setup. Otherwise random numbers are returned
>> via ethtool -S.  Add a quirk and a function to do this.

Seems to break one of the automated linux-next builds:

http://kisskb.ellerman.id.au/kisskb/buildresult/13057702/

A mindless automated bisect reports:

2b30842b23b9e6796c7bd5f0916fd2ebf6b7d633 is the first bad commit
commit 2b30842b23b9e6796c7bd5f0916fd2ebf6b7d633
Author: Andrew Lunn 
Date:   Wed Jun 7 03:57:09 2017 +0200

net: fec: Clear and enable MIB counters on imx51

Paul.
--

>>
>> Tested on an IMX51.
>>
>> Signed-off-by: Andrew Lunn 
>
> Applied, thanks.


[net:master 31/33] net//core/dev.c:8252:2: warning: 'remsd' is used uninitialized in this function

2017-06-09 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master
head:   f6d4c7133257bb2d6f66723d11b19f1c49cdf2f7
commit: 773fc8f6e8d63ec9d840588e161cbb73a01cfc45 [31/33] net: rps: send out pending IPI's on CPU hotplug
config: blackfin-allyesconfig (attached as .config)
compiler: bfin-uclinux-gcc (GCC) 6.2.0
reproduce:
wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout 773fc8f6e8d63ec9d840588e161cbb73a01cfc45
# save the attached .config to linux build tree
make.cross ARCH=blackfin 

All warnings (new ones prefixed by >>):

   net//core/dev.c: In function 'dev_cpu_dead':
>> net//core/dev.c:8252:2: warning: 'remsd' is used uninitialized in this 
>> function [-Wuninitialized]
 net_rps_send_ipi(remsd);
 ^~~

vim +/remsd +8252 net//core/dev.c

  8236  
  8237  list_del_init(&napi->poll_list);
  8238  if (napi->poll == process_backlog)
  8239  napi->state = 0;
  8240  else
  8241  napi_schedule(sd, napi);
  8242  }
  8243  
  8244  raise_softirq_irqoff(NET_TX_SOFTIRQ);
  8245  local_irq_enable();
  8246  
  8247  #ifdef CONFIG_RPS
  8248  remsd = oldsd->rps_ipi_list;
  8249  oldsd->rps_ipi_list = NULL;
  8250  #endif
  8251  /* send out pending IPI's on offline CPU */
> 8252  net_rps_send_ipi(remsd);
  8253  
  8254  /* Process offline CPU's input_pkt_queue */
  8255  while ((skb = __skb_dequeue(&oldsd->process_queue))) {
  8256  netif_rx_ni(skb);
  8257  input_queue_head_incr(oldsd);
  8258  }
  8259  while ((skb = skb_dequeue(&oldsd->input_pkt_queue))) {
  8260  netif_rx_ni(skb);
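
The pattern behind this warning can be shown in a small user-space sketch. The variable is assigned only inside a conditionally compiled block but is used unconditionally afterwards. All names below (struct ipi_node, struct cpu_data, SKETCH_RPS, send_ipis, cpu_dead) are hypothetical stand-ins, not the kernel's definitions, and initializing the pointer to NULL is just one possible way to silence the warning.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the per-CPU data; not the kernel structures. */
struct ipi_node {
	struct ipi_node *next;
};

struct cpu_data {
	struct ipi_node *rps_ipi_list;
};

/* Walk the list and return how many entries would have been signalled. */
static int send_ipis(struct ipi_node *remsd)
{
	int sent = 0;

	while (remsd) {
		sent++;
		remsd = remsd->next;
	}
	return sent;
}

static int cpu_dead(struct cpu_data *oldsd)
{
	/* Initialized here so the call below has a defined argument even
	 * when the SKETCH_RPS block is compiled out.
	 */
	struct ipi_node *remsd = NULL;

	assert(oldsd != NULL);
#ifdef SKETCH_RPS
	remsd = oldsd->rps_ipi_list;
	oldsd->rps_ipi_list = NULL;
#endif
	return send_ipis(remsd);
}
```

Without the initializer, a build that leaves SKETCH_RPS undefined would pass an indeterminate pointer to send_ipis(), which is the situation the compiler reports above.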

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH] l2tp: cast l2tp traffic counter to unsigned

2017-06-09 Thread Eric Dumazet
On Fri, 2017-06-09 at 15:16 -0700, Stephen Hemminger wrote:
> On Fri,  9 Jun 2017 16:29:47 +0200
> Dominik Heidler  wrote:
> 
> > This fixes a counter problem on 32bit systems:
> > When the rx_bytes counter reached 2 GiB, it jumped to (2^64 Bytes - 2GiB)
> > Bytes.
> > 
> > rtnl_link_stats64 has __u64 type and atomic_long_read returns
> > atomic_long_t which is signed. Due to the conversion
> > we get an incorrect value on 32bit systems if the MSB of
> > the atomic_long_t value is set.
> > 
> > CC: Tom Parkin 
> > Fixes: 7b7c0719cd7a ("l2tp: avoid deadlock in l2tp stats update")
> > Signed-off-by: Dominik Heidler 
> > ---
> >  net/l2tp/l2tp_eth.c | 13 +++--
> >  1 file changed, 7 insertions(+), 6 deletions(-)
> > 
> > diff --git a/net/l2tp/l2tp_eth.c b/net/l2tp/l2tp_eth.c
> > index 8b21af7321b9..668a75e002e9 100644
> > --- a/net/l2tp/l2tp_eth.c
> > +++ b/net/l2tp/l2tp_eth.c
> > @@ -114,12 +114,13 @@ static void l2tp_eth_get_stats64(struct net_device 
> > *dev,
> >  {
> > struct l2tp_eth *priv = netdev_priv(dev);
> >  
> > -   stats->tx_bytes   = atomic_long_read(&priv->tx_bytes);
> > -   stats->tx_packets = atomic_long_read(&priv->tx_packets);
> > -   stats->tx_dropped = atomic_long_read(&priv->tx_dropped);
> > -   stats->rx_bytes   = atomic_long_read(&priv->rx_bytes);
> > -   stats->rx_packets = atomic_long_read(&priv->rx_packets);
> > -   stats->rx_errors  = atomic_long_read(&priv->rx_errors);
> > +   stats->tx_bytes   = (unsigned long) atomic_long_read(&priv->tx_bytes);
> > +   stats->tx_packets = (unsigned long) atomic_long_read(&priv->tx_packets);
> > +   stats->tx_dropped = (unsigned long) atomic_long_read(&priv->tx_dropped);
> > +   stats->rx_bytes   = (unsigned long) atomic_long_read(&priv->rx_bytes);
> > +   stats->rx_packets = (unsigned long) atomic_long_read(&priv->rx_packets);
> > +   stats->rx_errors  = (unsigned long) atomic_long_read(&priv->rx_errors);
> > +
> >  }
> >  
> >  static const struct net_device_ops l2tp_eth_netdev_ops = {
> 
> This is not the right way to fix this.
> 
> 1. shouldn't be using atomic's for network counters, look at other network 
> devices.
> 
> 2. should be using u64_stats_fetch  api to handle 64 bit counters.

But they do not want 64bit counters, and not per cpu counters for a
driver handling few packets per second.

Just use native size of "unsigned long".
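
For illustration, the sign extension being discussed can be reproduced in user space. This is only a sketch, not kernel code: int32_t stands in for a 32-bit "long" (an assumption, so the result is the same on any host), and the two helper names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* int32_t models a 32-bit kernel 'long' (assumption for illustration). */

/* Direct widening of the signed value: the sign bit is extended. */
static uint64_t widen_signed(int32_t counter)
{
	return (uint64_t)counter;
}

/* Cast to the unsigned type of the same width first: zero-extended. */
static uint64_t widen_unsigned(int32_t counter)
{
	return (uint64_t)(uint32_t)counter;
}
```

With a counter whose top bit is set (the 2 GiB case from the commit message), widen_signed() yields 2^64 - 2^31 while widen_unsigned() yields 2^31. For small values both helpers agree.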

We use the same atomic_long_t for (struct netdev)->rx_dropped,
tx_dropped & rx_nohandler

So I guess same fix is needed in dev_get_stats()

diff --git a/net/core/dev.c b/net/core/dev.c
index 54bb8d99d26afcc1a9c5a56f1e8c2d1f6e06db98..1a66a8761f9a579c9bf6b6ab5b1415770adcf76b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7785,9 +7785,9 @@ struct rtnl_link_stats64 *dev_get_stats(struct net_device *dev,
} else {
netdev_stats_to_stats64(storage, &dev->stats);
}
-   storage->rx_dropped += atomic_long_read(&dev->rx_dropped);
-   storage->tx_dropped += atomic_long_read(&dev->tx_dropped);
-   storage->rx_nohandler += atomic_long_read(&dev->rx_nohandler);
+   storage->rx_dropped += (unsigned long)atomic_long_read(&dev->rx_dropped);
+   storage->tx_dropped += (unsigned long)atomic_long_read(&dev->tx_dropped);
+   storage->rx_nohandler += (unsigned long)atomic_long_read(&dev->rx_nohandler);
return storage;
 }
 EXPORT_SYMBOL(dev_get_stats);




Re: [Regression, 4.12-rc1] Address family not supported by protocol

2017-06-09 Thread Randy Dunlap
[adding netdev]

Hi Paul,
Did you get anywhere with this?

The only difference that I see in the kernel config files is

4.12-rc1 says:
# CONFIG_NET_SCH_DEFAULT is not set

and 4.11 does not have that kconfig option.


On 05/15/17 05:53, Paul Menzel wrote:
> Dear Linux folks,
> 
> 
> When building Linux 4.12-rc1 with the configuration from Linux 4.11, then 
> many user space programs show the error `Address family not supported by 
> protocol`.
> 
> Please find the configuration, and the Linux kernel messages attached.
> 
> [5.383630] systemd[1]: Failed to insert module 'autofs4': No such file or 
> directory
> [5.383808] systemd[1]: Failed to insert module 'unix': No such file or 
> directory
> [5.600711] systemd[1]: systemd 231 running in system mode. (+PAM -AUDIT 
> -SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP -LIBCRYPTSETUP +GCRYPT +GNUTLS 
> -ACL +XZ -LZ4 -SECCOMP +BLKID +ELFUTILS +KMOD +IDN)
> [5.601028] systemd[1]: Detected architecture x86-64.
> [5.623700] systemd[1]: Set hostname to .
> [5.631835] systemd[1]: Failed to read AF_UNIX datagram queue length, 
> ignoring: No such file or directory
> [6.404973] systemd[1]: Failed to allocate notification socket: Address 
> family not supported by protocol
> [6.405441] systemd[1]: Failed to allocate cgroups agent socket: Address 
> family not supported by protocol
> [6.405817] systemd[1]: Failed to allocate private socket: Address family 
> not supported by protocol
> [6.406218] systemd[1]: socket() failed: Address family not supported by 
> protocol
> [6.406379] systemd[1]: Failed to fully start up daemon: Address family 
> not supported by protocol
> [6.515759] systemd[1]: Failed to listen on udev Control Socket.
> [8.597362] systemd-udevd[283]: error getting socket: Address family not 
> supported by protocol
> Any help, where to report this issue to, is welcome.
> 
> 
> Kind regards,
> 
> Paul


-- 
~Randy


Re: [PATCH net-next] ipv6: Initial skb->dev and skb->protocol in ip6_output

2017-06-09 Thread Chenbo Feng



On 06/09/2017 12:39 PM, David Miller wrote:

From: Chenbo Feng 
Date: Fri, 9 Jun 2017 12:13:57 -0700



On 06/09/2017 12:08 PM, David Miller wrote:

From: Chenbo Feng 
Date: Fri,  9 Jun 2017 12:06:07 -0700


From: Chenbo Feng 

Move the initialization of skb->dev and skb->protocol from
ip6_finish_output2 to ip6_output. This can make the skb->dev and
skb->protocol information available to the CGROUP eBPF filter.

Signed-off-by: Chenbo Feng 
Acked-by: Eric Dumazet 

Applied, thanks.

This makes ipv6 consistent with ipv4.

I am surprised this wasn't noticed, for example, in netfilter.
.


Hi David,

This patch is still being worked on since it may have a problem with the
ip_fragment() call. Did you apply it already? Should I send a revert
patch to you then?

A revert is necessary or a relative fixup.

Thank you.


Hi David,

The revert is uploaded here: http://patchwork.ozlabs.org/patch/774136/

Thanks and sorry for the trouble caused

Chenbo Feng


Re: [for-next 4/6] net/mlx5: FPGA, Add basic support for Innova

2017-06-09 Thread Doug Ledford
On Wed, 2017-06-07 at 13:21 -0600, Jason Gunthorpe wrote:
> On Wed, Jun 07, 2017 at 10:13:43PM +0300, Saeed Mahameed wrote:
> > 
> > No !!
> > I am just showing you that the ib_core eventually will end up
> > calling
> > mlx5_core to create a QP.
> > so mlx5_core can create the QP it self since it is the one
> > eventually
> > creating QPs.
> > we just call mlx5_core_create_qp directly.
> 
> Which is building a RDMA ULP inside a driver without using the core
> code :(

Aren't the transmit/receive queues of the Ethernet netdevice on
mlx4/mlx5 hardware QPs too?  Those bypass the RDMA subsystem entirely.
 Just because something uses a QP on hardware that does *everything*
via QPs doesn't necessarily mean it must go through the RDMA subsystem.

Now, the fact that the content of the packets is basically a RoCE
packet does make things a bit fuzzier, but if their packets are
specially crafted RoCE packets that aren't really intended to be fully
RoCE spec compliant (maybe they don't support all the options as normal
RoCE QPs), then I can see hiding them from the larger RoCE portion of
the RDMA stack.

> > 
> > > 
> > > This keep getting more ugly :(
> > > 
> > > What about security? What if user space sends some raw packets to
> > > the
> > > FPGA - can it reprogram the ISPEC settings or worse?
> > > 
> > 
> > No such thing. This QP is only for internal driver/HW
> > communications,
> > as it is faster than the existing command interface.
> > it is not meant to be exposed for any raw user space usages at all,
> > without proper standard API adapter of course.
> 
> I'm not asking about the QP, I'm asking what happens after the NIC
> part. You use ROCE packets to control the FPGA. What prevents
> userspace from forcibly constructing roce packets and sending them to
> the FPGA. How does the FPGA know for certain the packet came from the
> kernel QP and not someplace else.

This is a valid concern.

> This is especially true for mlx nics as there are many raw packet
> bypass mechanisms available to userspace.

Right.  The question becomes: Does the firmware filter outgoing raw ETH
QPs such that a nefarious user could not send a crafted RoCE packet
that the bump on the wire would intercept and accept?

-- 
Doug Ledford 
    GPG KeyID: B826A3330E572FDD
   
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD



Re: [PATCH net-next 0/8] Bug fixes in ena ethernet driver

2017-06-09 Thread Florian Fainelli
On 06/09/2017 03:13 PM, neta...@amazon.com wrote:
> From: Netanel Belgazal 
> 
> This patchset contains fixes for the bugs that were discovered so far.

If these are all fixes you should submit them against the "net" tree.
net-next is for features [1].

Since these are fixes, you may also want to provide a Fixes: 12-digit
commit ("commit subject") [2] such that David can queue these patches
for stable trees and this can be retrofitted into kernel distributions.

[1]:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/netdev-FAQ.txt#n25

[2]:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst#n183

> 
> Netanel Belgazal (8):
>   net: ena: fix rare uncompleted admin command false alarm
>   net: ena: fix bug that might cause hang after consecutive open/close
> interface.
>   net: ena: add missing return when ena_com_get_io_handlers() fails
>   net: ena: fix race condition between submit and completion admin
> command
>   net: ena: add missing unmap bars on device removal
>   net: ena: fix theoretical Rx hang on low memory systems
>   net: ena: disable admin msix while working in polling mode
>   net: ena: bug fix in lost tx packets detection mechanism
> 
>  drivers/net/ethernet/amazon/ena/ena_com.c |  35 +++--
>  drivers/net/ethernet/amazon/ena/ena_ethtool.c |   2 +-
>  drivers/net/ethernet/amazon/ena/ena_netdev.c  | 179 
> +++---
>  drivers/net/ethernet/amazon/ena/ena_netdev.h  |  16 ++-
>  4 files changed, 168 insertions(+), 64 deletions(-)
> 


-- 
Florian


Re: [PATCH] l2tp: cast l2tp traffic counter to unsigned

2017-06-09 Thread Stephen Hemminger
On Fri,  9 Jun 2017 16:29:47 +0200
Dominik Heidler  wrote:

> This fixes a counter problem on 32bit systems:
> When the rx_bytes counter reached 2 GiB, it jumped to (2^64 Bytes - 2GiB)
> Bytes.
> 
> rtnl_link_stats64 has __u64 type and atomic_long_read returns
> atomic_long_t which is signed. Due to the conversion
> we get an incorrect value on 32bit systems if the MSB of
> the atomic_long_t value is set.
> 
> CC: Tom Parkin 
> Fixes: 7b7c0719cd7a ("l2tp: avoid deadlock in l2tp stats update")
> Signed-off-by: Dominik Heidler 
> ---
>  net/l2tp/l2tp_eth.c | 13 +++--
>  1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/net/l2tp/l2tp_eth.c b/net/l2tp/l2tp_eth.c
> index 8b21af7321b9..668a75e002e9 100644
> --- a/net/l2tp/l2tp_eth.c
> +++ b/net/l2tp/l2tp_eth.c
> @@ -114,12 +114,13 @@ static void l2tp_eth_get_stats64(struct net_device *dev,
>  {
>   struct l2tp_eth *priv = netdev_priv(dev);
>  
> - stats->tx_bytes   = atomic_long_read(&priv->tx_bytes);
> - stats->tx_packets = atomic_long_read(&priv->tx_packets);
> - stats->tx_dropped = atomic_long_read(&priv->tx_dropped);
> - stats->rx_bytes   = atomic_long_read(&priv->rx_bytes);
> - stats->rx_packets = atomic_long_read(&priv->rx_packets);
> - stats->rx_errors  = atomic_long_read(&priv->rx_errors);
> + stats->tx_bytes   = (unsigned long) atomic_long_read(&priv->tx_bytes);
> + stats->tx_packets = (unsigned long) atomic_long_read(&priv->tx_packets);
> + stats->tx_dropped = (unsigned long) atomic_long_read(&priv->tx_dropped);
> + stats->rx_bytes   = (unsigned long) atomic_long_read(&priv->rx_bytes);
> + stats->rx_packets = (unsigned long) atomic_long_read(&priv->rx_packets);
> + stats->rx_errors  = (unsigned long) atomic_long_read(&priv->rx_errors);
> +
>  }
>  
>  static const struct net_device_ops l2tp_eth_netdev_ops = {

This is not the right way to fix this.

1. shouldn't be using atomic's for network counters, look at other network 
devices.

2. should be using u64_stats_fetch  api to handle 64 bit counters.


[PATCH net-next 4/8] net: ena: fix race condition between submit and completion admin command

2017-06-09 Thread netanel
From: Netanel Belgazal 

Bug:
"Completion context is occupied" error printout will be noticed in
dmesg.
This error will cause the admin command to fail, which will lead to
an ena_probe() failure or a watchdog reset (depends on which admin
command failed).

Root cause:
__ena_com_submit_admin_cmd() is the function that submits new entries to
the admin queue.
The function has a check that makes sure the queue is not full and that
the function does not overwrite any outstanding command.
It uses the head and tail indexes for this check.
The head is increased by ena_com_handle_admin_completion(), which runs
from interrupt context, and the tail index is increased by the submit
function (the function runs under ->q_lock, so there is no risk
of concurrent increments).
Each command is associated with a completion context. This context is
allocated before the call to __ena_com_submit_admin_cmd() and freed by
ena_com_wait_and_process_admin_cq_interrupts(), right after the command
has completed.

This can lead to a state where the head was increased, the check passed,
but the completion context is still in use.

Solution:
Use the atomic variable ->outstanding_cmds instead of using the head and
the tail indexes.
This variable is safe to use since it is incremented by get_comp_ctx() in
__ena_com_submit_admin_cmd() and decremented by comp_ctxt_release().
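
The difference between the two fullness checks can be sketched in user space with C11 atomics. This is an illustration under assumptions, not the driver code: struct admin_queue_sketch, Q_DEPTH and every helper name are hypothetical, and the submit path is assumed to run under a lock, as described above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define Q_DEPTH 4 /* illustrative queue depth */

struct admin_queue_sketch {
	unsigned int head;           /* advanced by the completion handler */
	unsigned int tail;           /* advanced by the submitter */
	atomic_int outstanding_cmds; /* contexts handed out, not yet released */
};

static void queue_init(struct admin_queue_sketch *q)
{
	q->head = 0;
	q->tail = 0;
	atomic_init(&q->outstanding_cmds, 0);
}

/* Old check: looks only at the ring indexes. */
static bool full_by_indexes(struct admin_queue_sketch *q)
{
	return q->tail - q->head >= Q_DEPTH;
}

/* New check: counts contexts still owned by a waiter. */
static bool full_by_outstanding(struct admin_queue_sketch *q)
{
	return atomic_load(&q->outstanding_cmds) >= Q_DEPTH;
}

/* Submit path; the caller is assumed to hold the queue lock. */
static void submit(struct admin_queue_sketch *q)
{
	assert(!full_by_outstanding(q));
	q->tail++;
	atomic_fetch_add(&q->outstanding_cmds, 1);
}

/* Interrupt path: the device reported a completion. */
static void hw_completion(struct admin_queue_sketch *q)
{
	q->head++;
}

/* Waiter path: the completion context goes back to the pool. */
static void release_ctx(struct admin_queue_sketch *q)
{
	atomic_fetch_sub(&q->outstanding_cmds, 1);
}
```

After Q_DEPTH submissions and one hw_completion(), the index check reports free space although no context has been released yet. The counter check keeps reporting "full" until release_ctx() runs.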

Signed-off-by: Netanel Belgazal 
---
 drivers/net/ethernet/amazon/ena/ena_com.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c 
b/drivers/net/ethernet/amazon/ena/ena_com.c
index e1c2fab..ea60b9e 100644
--- a/drivers/net/ethernet/amazon/ena/ena_com.c
+++ b/drivers/net/ethernet/amazon/ena/ena_com.c
@@ -232,11 +232,9 @@ static struct ena_comp_ctx 
*__ena_com_submit_admin_cmd(struct ena_com_admin_queu
tail_masked = admin_queue->sq.tail & queue_size_mask;
 
/* In case of queue FULL */
-   cnt = admin_queue->sq.tail - admin_queue->sq.head;
+   cnt = atomic_read(&admin_queue->outstanding_cmds);
if (cnt >= admin_queue->q_depth) {
-   pr_debug("admin queue is FULL (tail %d head %d depth: %d)\n",
-admin_queue->sq.tail, admin_queue->sq.head,
-admin_queue->q_depth);
+   pr_debug("admin queue is full.\n");
admin_queue->stats.out_of_space++;
return ERR_PTR(-ENOSPC);
}
-- 
2.7.4



[PATCH net-next 6/8] net: ena: fix theoretical Rx hang on low memory systems

2017-06-09 Thread netanel
From: Netanel Belgazal 

In the rare case where the device runs out of free Rx buffer
descriptors (for example under kernel memory pressure) and the
napi handler continuously fails to refill them, the device Rx queue
eventually has no free Rx buffers left for incoming packets.
This leads to a deadlock:
* The device won't send interrupts since all the new
Rx packets will be dropped.
* The napi handler won't try to allocate new Rx descriptors
since allocation is part of NAPI, which is not being invoked any more.

The fix involves detecting this scenario and rescheduling NAPI
(to refill buffers) by the keepalive/watchdog task.
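
The detection logic can be sketched outside the driver as follows. This is a simplified model under assumptions: struct rx_ring_sketch and check_ring() are hypothetical names, free_slots plays the role of the queue's empty space, and the threshold of 2 is the one used in this patch.

```c
#include <assert.h>
#include <stdbool.h>

#define EMPTY_RX_REFILL 2 /* consecutive detections before rescheduling */

struct rx_ring_sketch {
	int ring_size;
	int free_slots;     /* descriptors the driver has not posted yet */
	int empty_rx_queue; /* consecutive passes that saw an empty ring */
};

/* One watchdog pass over a ring.
 * Returns true when the refill (NAPI) should be rescheduled.
 */
static bool check_ring(struct rx_ring_sketch *r)
{
	assert(r->ring_size > 0);

	/* The device has buffers to work with: reset the streak. */
	if (r->free_slots != r->ring_size - 1) {
		r->empty_rx_queue = 0;
		return false;
	}

	/* No buffers posted: act only after enough consecutive hits. */
	if (++r->empty_rx_queue < EMPTY_RX_REFILL)
		return false;

	r->empty_rx_queue = 0;
	return true;
}
```

Requiring two consecutive detections avoids rescheduling on a ring that is only momentarily empty between two timer ticks.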

Signed-off-by: Netanel Belgazal 
---
 drivers/net/ethernet/amazon/ena/ena_ethtool.c |  1 +
 drivers/net/ethernet/amazon/ena/ena_netdev.c  | 55 +++
 drivers/net/ethernet/amazon/ena/ena_netdev.h  |  2 +
 3 files changed, 58 insertions(+)

diff --git a/drivers/net/ethernet/amazon/ena/ena_ethtool.c 
b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
index 67b2338f..533b2fb 100644
--- a/drivers/net/ethernet/amazon/ena/ena_ethtool.c
+++ b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
@@ -94,6 +94,7 @@ static const struct ena_stats ena_stats_rx_strings[] = {
ENA_STAT_RX_ENTRY(dma_mapping_err),
ENA_STAT_RX_ENTRY(bad_desc_num),
ENA_STAT_RX_ENTRY(rx_copybreak_pkt),
+   ENA_STAT_RX_ENTRY(empty_rx_ring),
 };
 
 static const struct ena_stats ena_stats_ena_com_strings[] = {
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c 
b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 4e9fbdd..3c366bf 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -190,6 +190,7 @@ static void ena_init_io_rings(struct ena_adapter *adapter)
rxr->sgl_size = adapter->max_rx_sgl_size;
rxr->smoothed_interval =
ena_com_get_nonadaptive_moderation_interval_rx(ena_dev);
+   rxr->empty_rx_queue = 0;
}
 }
 
@@ -2619,6 +2620,58 @@ static void check_for_missing_tx_completions(struct 
ena_adapter *adapter)
adapter->last_monitored_tx_qid = i % adapter->num_queues;
 }
 
+/* trigger napi schedule after 2 consecutive detections */
+#define EMPTY_RX_REFILL 2
+/* For the rare case where the device runs out of Rx descriptors and the
+ * napi handler failed to refill new Rx descriptors (due to a lack of memory
+ * for example).
+ * This case will lead to a deadlock:
+ * The device won't send interrupts since all the new Rx packets will be 
dropped
+ * The napi handler won't allocate new Rx descriptors so the device will be
+ * able to send new packets.
+ *
+ * This scenario can happen when the kernel's vm.min_free_kbytes is too small.
+ * It is recommended to have at least 512MB, with a minimum of 128MB for
+ * constrained environment).
+ *
+ * When such a situation is detected - Reschedule napi
+ */
+static void check_for_empty_rx_ring(struct ena_adapter *adapter)
+{
+   struct ena_ring *rx_ring;
+   int i, refill_required;
+
+   if (!test_bit(ENA_FLAG_DEV_UP, &adapter->flags))
+   return;
+
+   if (test_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags))
+   return;
+
+   for (i = 0; i < adapter->num_queues; i++) {
+   rx_ring = &adapter->rx_ring[i];
+
+   refill_required =
+   ena_com_sq_empty_space(rx_ring->ena_com_io_sq);
+   if (unlikely(refill_required == (rx_ring->ring_size - 1))) {
+   rx_ring->empty_rx_queue++;
+
+   if (rx_ring->empty_rx_queue >= EMPTY_RX_REFILL) {
+   u64_stats_update_begin(&rx_ring->syncp);
+   rx_ring->rx_stats.empty_rx_ring++;
+   u64_stats_update_end(&rx_ring->syncp);
+
+   netif_err(adapter, drv, adapter->netdev,
+ "trigger refill for ring %d\n", i);
+
+   napi_schedule(rx_ring->napi);
+   rx_ring->empty_rx_queue = 0;
+   }
+   } else {
+   rx_ring->empty_rx_queue = 0;
+   }
+   }
+}
+
 /* Check for keep alive expiration */
 static void check_for_missing_keep_alive(struct ena_adapter *adapter)
 {
@@ -2673,6 +2726,8 @@ static void ena_timer_service(unsigned long data)
 
check_for_missing_tx_completions(adapter);
 
+   check_for_empty_rx_ring(adapter);
+
if (debug_area)
ena_dump_stats_to_buf(adapter, debug_area);
 
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h 
b/drivers/net/ethernet/amazon/ena/ena_netdev.h
index 0e22bce..8828f1d 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.h
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h
@@ -184,6 +184,7 @@ struct ena_stats_rx {
u64 dma_mapping_err;
u64 bad_desc_num;
u64 

[PATCH net-next 8/8] net: ena: bug fix in lost tx packets detection mechanism

2017-06-09 Thread netanel
From: Netanel Belgazal 

check_for_missing_tx_completions() is called from a timer
task and looks for lost tx packets.
The old implementation accumulated all the lost tx packets
and did not check whether those packets were retrieved at a later stage.
This led to a situation where the driver reset
the device for no reason.
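
The idea of counting per scan instead of accumulating can be sketched in user space. This is a simplified model, not the driver code: struct tx_buf_sketch, scan_ring() and the threshold value are hypothetical, and time is a plain counter instead of jiffies.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TIMEOUTED 3 /* illustrative threshold */

struct tx_buf_sketch {
	unsigned long sent_at; /* 0 means the slot is idle or completed */
};

/* Scan one ring and count the buffers pending longer than 'timeout'.
 * The count starts from zero on every scan, so buffers that completed
 * since the previous scan no longer contribute.
 * Returns -1 when the count exceeds the threshold (a reset would be
 * triggered), 0 otherwise.
 */
static int scan_ring(const struct tx_buf_sketch *bufs, size_t n,
		     unsigned long now, unsigned long timeout)
{
	unsigned int missed = 0;

	assert(bufs != NULL);
	for (size_t i = 0; i < n; i++) {
		if (bufs[i].sent_at && now - bufs[i].sent_at > timeout) {
			if (++missed > MAX_TIMEOUTED)
				return -1;
		}
	}
	return 0;
}
```

With an accumulating counter, four late buffers would keep counting toward a reset even after they complete. Here a later scan sees only the buffers that are still pending.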

Signed-off-by: Netanel Belgazal 
---
 drivers/net/ethernet/amazon/ena/ena_ethtool.c |  1 -
 drivers/net/ethernet/amazon/ena/ena_netdev.c  | 66 +++
 drivers/net/ethernet/amazon/ena/ena_netdev.h  | 14 +-
 3 files changed, 50 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_ethtool.c 
b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
index 533b2fb..3ee55e2 100644
--- a/drivers/net/ethernet/amazon/ena/ena_ethtool.c
+++ b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
@@ -80,7 +80,6 @@ static const struct ena_stats ena_stats_tx_strings[] = {
ENA_STAT_TX_ENTRY(tx_poll),
ENA_STAT_TX_ENTRY(doorbells),
ENA_STAT_TX_ENTRY(prepare_ctx_err),
-   ENA_STAT_TX_ENTRY(missing_tx_comp),
ENA_STAT_TX_ENTRY(bad_req_id),
 };
 
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c 
b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 3c366bf..4f16ed3 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -1995,6 +1995,7 @@ static netdev_tx_t ena_start_xmit(struct sk_buff *skb, 
struct net_device *dev)
 
tx_info->tx_descs = nb_hw_desc;
tx_info->last_jiffies = jiffies;
+   tx_info->print_once = 0;
 
tx_ring->next_to_use = ENA_TX_RING_IDX_NEXT(next_to_use,
tx_ring->ring_size);
@@ -2564,13 +2565,44 @@ static void ena_fw_reset_device(struct work_struct 
*work)
"Reset attempt failed. Can not reset the device\n");
 }
 
-static void check_for_missing_tx_completions(struct ena_adapter *adapter)
+static int check_missing_comp_in_queue(struct ena_adapter *adapter,
+  struct ena_ring *tx_ring)
 {
struct ena_tx_buffer *tx_buf;
unsigned long last_jiffies;
+   u32 missed_tx = 0;
+   int i;
+
+   for (i = 0; i < tx_ring->ring_size; i++) {
+   tx_buf = &tx_ring->tx_buffer_info[i];
+   last_jiffies = tx_buf->last_jiffies;
+   if (unlikely(last_jiffies &&
+time_is_before_jiffies(last_jiffies + 
TX_TIMEOUT))) {
+   if (!tx_buf->print_once)
+   netif_notice(adapter, tx_err, adapter->netdev,
+"Found a Tx that wasn't completed 
on time, qid %d, index %d.\n",
+tx_ring->qid, i);
+
+   tx_buf->print_once = 1;
+   missed_tx++;
+
+   if (unlikely(missed_tx > MAX_NUM_OF_TIMEOUTED_PACKETS)) 
{
+   netif_err(adapter, tx_err, adapter->netdev,
+ "The number of lost tx completions is 
above the threshold (%d > %d). Reset the device\n",
+ missed_tx, 
MAX_NUM_OF_TIMEOUTED_PACKETS);
+   set_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags);
+   return -EIO;
+   }
+   }
+   }
+
+   return 0;
+}
+
+static void check_for_missing_tx_completions(struct ena_adapter *adapter)
+{
struct ena_ring *tx_ring;
-   int i, j, budget;
-   u32 missed_tx;
+   int i, budget, rc;
 
/* Make sure the driver doesn't turn the device in other process */
smp_rmb();
@@ -2586,31 +2618,9 @@ static void check_for_missing_tx_completions(struct 
ena_adapter *adapter)
for (i = adapter->last_monitored_tx_qid; i < adapter->num_queues; i++) {
tx_ring = &adapter->tx_ring[i];
 
-   for (j = 0; j < tx_ring->ring_size; j++) {
-   tx_buf = &tx_ring->tx_buffer_info[j];
-   last_jiffies = tx_buf->last_jiffies;
-   if (unlikely(last_jiffies && 
time_is_before_jiffies(last_jiffies + TX_TIMEOUT))) {
-   netif_notice(adapter, tx_err, adapter->netdev,
-"Found a Tx that wasn't completed 
on time, qid %d, index %d.\n",
-tx_ring->qid, j);
-
-   u64_stats_update_begin(&tx_ring->syncp);
-   missed_tx = tx_ring->tx_stats.missing_tx_comp++;
-   u64_stats_update_end(&tx_ring->syncp);
-
-   /* Clear last jiffies so the lost buffer won't
-* be counted twice.
-*/
-   tx_buf->last_jiffies = 0;
-
-   if 

[PATCH net-next 7/8] net: ena: disable admin msix while working in polling mode

2017-06-09 Thread netanel
From: Netanel Belgazal 

Signed-off-by: Netanel Belgazal 
---
 drivers/net/ethernet/amazon/ena/ena_com.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c 
b/drivers/net/ethernet/amazon/ena/ena_com.c
index ea60b9e..f5b237e 100644
--- a/drivers/net/ethernet/amazon/ena/ena_com.c
+++ b/drivers/net/ethernet/amazon/ena/ena_com.c
@@ -61,6 +61,8 @@
 
 #define ENA_MMIO_READ_TIMEOUT 0x
 
+#define ENA_REGS_ADMIN_INTR_MASK 1
+
 /*/
 /*/
 /*/
@@ -1454,6 +1456,12 @@ void ena_com_admin_destroy(struct ena_com_dev *ena_dev)
 
 void ena_com_set_admin_polling_mode(struct ena_com_dev *ena_dev, bool polling)
 {
+   u32 mask_value = 0;
+
+   if (polling)
+   mask_value = ENA_REGS_ADMIN_INTR_MASK;
+
+   writel(mask_value, ena_dev->reg_bar + ENA_REGS_INTR_MASK_OFF);
ena_dev->admin_queue.polling = polling;
 }
 
-- 
2.7.4



[PATCH net-next 5/8] net: ena: add missing unmap bars on device removal

2017-06-09 Thread netanel
From: Netanel Belgazal 

This patch also changes the mapping functions to devm_ functions.

Signed-off-by: Netanel Belgazal 
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c 
b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 1e71e89..4e9fbdd 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -2853,6 +2853,11 @@ static void ena_release_bars(struct ena_com_dev 
*ena_dev, struct pci_dev *pdev)
 {
int release_bars;
 
+   if (ena_dev->mem_bar)
+   devm_iounmap(&pdev->dev, ena_dev->mem_bar);
+
+   devm_iounmap(&pdev->dev, ena_dev->reg_bar);
+
release_bars = pci_select_bars(pdev, IORESOURCE_MEM) & ENA_BAR_MASK;
pci_release_selected_regions(pdev, release_bars);
 }
@@ -2940,8 +2945,9 @@ static int ena_probe(struct pci_dev *pdev, const struct 
pci_device_id *ent)
goto err_free_ena_dev;
}
 
-   ena_dev->reg_bar = ioremap(pci_resource_start(pdev, ENA_REG_BAR),
-  pci_resource_len(pdev, ENA_REG_BAR));
+   ena_dev->reg_bar = devm_ioremap(&pdev->dev,
+   pci_resource_start(pdev, ENA_REG_BAR),
+   pci_resource_len(pdev, ENA_REG_BAR));
if (!ena_dev->reg_bar) {
dev_err(&pdev->dev, "failed to remap regs bar\n");
rc = -EFAULT;
@@ -2961,8 +2967,9 @@ static int ena_probe(struct pci_dev *pdev, const struct 
pci_device_id *ent)
ena_set_push_mode(pdev, ena_dev, _feat_ctx);
 
if (ena_dev->tx_mem_queue_type == ENA_ADMIN_PLACEMENT_POLICY_DEV) {
-   ena_dev->mem_bar = ioremap_wc(pci_resource_start(pdev, 
ENA_MEM_BAR),
- pci_resource_len(pdev, 
ENA_MEM_BAR));
+   ena_dev->mem_bar = devm_ioremap_wc(&pdev->dev,
+  pci_resource_start(pdev, 
ENA_MEM_BAR),
+  pci_resource_len(pdev, 
ENA_MEM_BAR));
if (!ena_dev->mem_bar) {
rc = -EFAULT;
goto err_device_destroy;
-- 
2.7.4



[PATCH net-next 3/8] net: ena: add missing return when ena_com_get_io_handlers() fails

2017-06-09 Thread netanel
From: Netanel Belgazal 

Signed-off-by: Netanel Belgazal 
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c 
b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 0e3c60c7..1e71e89 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -1543,6 +1543,7 @@ static int ena_create_io_tx_queue(struct ena_adapter 
*adapter, int qid)
  "Failed to get TX queue handlers. TX queue num %d rc: 
%d\n",
  qid, rc);
ena_com_destroy_io_queue(ena_dev, ena_qid);
+   return rc;
}
 
ena_com_update_numa_node(tx_ring->ena_com_io_cq, ctx.numa_node);
@@ -1607,6 +1608,7 @@ static int ena_create_io_rx_queue(struct ena_adapter 
*adapter, int qid)
  "Failed to get RX queue handlers. RX queue num %d rc: 
%d\n",
  qid, rc);
ena_com_destroy_io_queue(ena_dev, ena_qid);
+   return rc;
}
 
ena_com_update_numa_node(rx_ring->ena_com_io_cq, ctx.numa_node);
-- 
2.7.4



[PATCH net-next 0/8] Bug fixes in ena ethernet driver

2017-06-09 Thread netanel
From: Netanel Belgazal 

This patchset contains fixes for the bugs that were discovered so far.

Netanel Belgazal (8):
  net: ena: fix rare uncompleted admin command false alarm
  net: ena: fix bug that might cause hang after consecutive open/close
interface.
  net: ena: add missing return when ena_com_get_io_handlers() fails
  net: ena: fix race condition between submit and completion admin
command
  net: ena: add missing unmap bars on device removal
  net: ena: fix theoretical Rx hang on low memory systems
  net: ena: disable admin msix while working in polling mode
  net: ena: bug fix in lost tx packets detection mechanism

 drivers/net/ethernet/amazon/ena/ena_com.c |  35 +++--
 drivers/net/ethernet/amazon/ena/ena_ethtool.c |   2 +-
 drivers/net/ethernet/amazon/ena/ena_netdev.c  | 179 +++---
 drivers/net/ethernet/amazon/ena/ena_netdev.h  |  16 ++-
 4 files changed, 168 insertions(+), 64 deletions(-)

-- 
2.7.4



Re: [PATCH net-next 0/8] Bug fixes in ena ethernet driver

2017-06-09 Thread Belgazal, Netanel
At the last minute I changed the subject of patch #6 from "stuck" to "hang"
and forgot to remove the old version.
Sorry for that.
Resubmitted.

From: David Miller 
Sent: Friday, June 9, 2017 10:33 PM
To: Belgazal, Netanel
Cc: netdev@vger.kernel.org; Woodhouse, David; Machulsky, Zorik; Matushevsky, 
Alexander; BSHARA, Said; Wilson, Matt; Liguori, Anthony; Bshara, Nafea; 
Schmeilin, Evgeny
Subject: Re: [PATCH net-next 0/8] Bug fixes in ena ethernet driver

From: 
Date: Fri, 9 Jun 2017 09:55:16 +0300

> This patchset contains fixes for the bugs that were discovered so far.

You submitted patch #6 twice, once with the word "stuck" in the subject
line, once with the word "hang" in the subject line.

Please sort this out and resubmit, thanks.



[PATCH net-next 1/8] net: ena: fix rare uncompleted admin command false alarm

2017-06-09 Thread netanel
From: Netanel Belgazal 

The current flow to detect admin completion is:
    while (command_not_completed) {
        if (timeout)
            error

        check_for_completion()
        sleep()
    }
So if the sleep takes longer than the timeout
(for example, because the thread/workqueue was not scheduled due to a
higher-priority task or a prolonged VMexit), the driver can report a
stall even though the completion is present.

The fix changes the order of this function to first check for
completion and only after that check if the timeout expired.
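
The reordering can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `struct admin_model`, `model_check_for_completion()`, `poll_timeout_first()` and `poll_check_first()` are hypothetical names that do not exist in the ena driver, time is a plain millisecond counter, and an oversleeping sleep() is modelled by passing a large `sleep_ms`.

```c
#include <stdbool.h>

/* Hypothetical model, not driver code: all names here are illustrative. */
enum poll_result { POLL_DONE, POLL_TIMEOUT };

struct admin_model {
	unsigned long now;      /* simulated clock, in ms */
	unsigned long done_at;  /* time at which the device posts the completion */
	bool status_done;       /* what the driver has observed so far */
};

/* Stand-in for check_for_completion(): updates the observed status. */
void model_check_for_completion(struct admin_model *m)
{
	if (m->now >= m->done_at)
		m->status_done = true;
}

/* Old order: test the timeout first, then look for the completion. */
enum poll_result poll_timeout_first(struct admin_model *m,
				    unsigned long timeout_ms,
				    unsigned long sleep_ms)
{
	unsigned long deadline = m->now + timeout_ms;

	while (!m->status_done) {
		if (m->now > deadline)
			return POLL_TIMEOUT; /* can fire although the completion is present */
		model_check_for_completion(m);
		m->now += sleep_ms;          /* sleep(), possibly oversleeping */
	}
	return POLL_DONE;
}

/* New order: look for the completion first, then test the timeout. */
enum poll_result poll_check_first(struct admin_model *m,
				  unsigned long timeout_ms,
				  unsigned long sleep_ms)
{
	unsigned long deadline = m->now + timeout_ms;

	for (;;) {
		model_check_for_completion(m);
		if (m->status_done)
			return POLL_DONE;
		if (m->now > deadline)
			return POLL_TIMEOUT;
		m->now += sleep_ms;
	}
}
```

In this model, with a completion posted at 50 ms, a 100 ms timeout and a single sleep that lasts 500 ms, poll_timeout_first() reports a timeout while poll_check_first() reports the completion.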

Signed-off-by: Netanel Belgazal 
---
 drivers/net/ethernet/amazon/ena/ena_com.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c 
b/drivers/net/ethernet/amazon/ena/ena_com.c
index 08d11ce..e1c2fab 100644
--- a/drivers/net/ethernet/amazon/ena/ena_com.c
+++ b/drivers/net/ethernet/amazon/ena/ena_com.c
@@ -508,15 +508,20 @@ static int ena_com_comp_status_to_errno(u8 comp_status)
 static int ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx 
*comp_ctx,
 struct ena_com_admin_queue 
*admin_queue)
 {
-   unsigned long flags;
-   u32 start_time;
+   unsigned long flags, timeout;
int ret;
 
-   start_time = ((u32)jiffies_to_usecs(jiffies));
+   timeout = jiffies + ADMIN_CMD_TIMEOUT_US;
+
+   while (1) {
+   spin_lock_irqsave(&admin_queue->q_lock, flags);
+   ena_com_handle_admin_completion(admin_queue);
+   spin_unlock_irqrestore(&admin_queue->q_lock, flags);
 
-   while (comp_ctx->status == ENA_CMD_SUBMITTED) {
-   if ((((u32)jiffies_to_usecs(jiffies)) - start_time) >
-   ADMIN_CMD_TIMEOUT_US) {
+   if (comp_ctx->status != ENA_CMD_SUBMITTED)
+   break;
+
+   if (time_is_before_jiffies(timeout)) {
pr_err("Wait for completion (polling) timeout\n");
/* ENA didn't have any completion */
spin_lock_irqsave(&admin_queue->q_lock, flags);
@@ -528,10 +533,6 @@ static int 
ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx *comp_c
goto err;
}
 
-   spin_lock_irqsave(&admin_queue->q_lock, flags);
-   ena_com_handle_admin_completion(admin_queue);
-   spin_unlock_irqrestore(&admin_queue->q_lock, flags);
-
msleep(100);
}
 
-- 
2.7.4



[PATCH net-next 2/8] net: ena: fix bug that might cause hang after consecutive open/close interface.

2017-06-09 Thread netanel
From: Netanel Belgazal 

Fix a bug where the driver does not unmask the IO interrupts
in ndo_open():
occasionally, the MSI-X interrupt (for one or more IO queues)
can be masked when ndo_close() is called.
If that is followed by ndo_open(),
the MSI-X interrupt is still masked, so the driver
receives no interrupts.

Signed-off-by: Netanel Belgazal 
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 41 ++--
 1 file changed, 26 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c 
b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 7c1214d..0e3c60c7 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -1078,6 +1078,26 @@ inline void ena_adjust_intr_moderation(struct ena_ring 
*rx_ring,
rx_ring->per_napi_bytes = 0;
 }
 
+static inline void ena_unmask_interrupt(struct ena_ring *tx_ring,
+   struct ena_ring *rx_ring)
+{
+   struct ena_eth_io_intr_reg intr_reg;
+
+   /* Update intr register: rx intr delay,
+* tx intr delay and interrupt unmask
+*/
+   ena_com_update_intr_reg(&intr_reg,
+   rx_ring->smoothed_interval,
+   tx_ring->smoothed_interval,
+   true);
+
+   /* It is a shared MSI-X.
+* Tx and Rx CQ have pointer to it.
+* So we use one of them to reach the intr reg
+*/
+   ena_com_unmask_intr(rx_ring->ena_com_io_cq, &intr_reg);
+}
+
 static inline void ena_update_ring_numa_node(struct ena_ring *tx_ring,
 struct ena_ring *rx_ring)
 {
@@ -1108,7 +1128,6 @@ static int ena_io_poll(struct napi_struct *napi, int 
budget)
 {
struct ena_napi *ena_napi = container_of(napi, struct ena_napi, napi);
struct ena_ring *tx_ring, *rx_ring;
-   struct ena_eth_io_intr_reg intr_reg;
 
u32 tx_work_done;
u32 rx_work_done;
@@ -1149,22 +1168,9 @@ static int ena_io_poll(struct napi_struct *napi, int 
budget)
if 
(ena_com_get_adaptive_moderation_enabled(rx_ring->ena_dev))
ena_adjust_intr_moderation(rx_ring, tx_ring);
 
-   /* Update intr register: rx intr delay,
-* tx intr delay and interrupt unmask
-*/
-   ena_com_update_intr_reg(&intr_reg,
-   rx_ring->smoothed_interval,
-   tx_ring->smoothed_interval,
-   true);
-
-   /* It is a shared MSI-X.
-* Tx and Rx CQ have pointer to it.
-* So we use one of them to reach the intr reg
-*/
-   ena_com_unmask_intr(rx_ring->ena_com_io_cq, &intr_reg);
+   ena_unmask_interrupt(tx_ring, rx_ring);
}
 
-
ena_update_ring_numa_node(tx_ring, rx_ring);
 
ret = rx_work_done;
@@ -1485,6 +1491,11 @@ static int ena_up_complete(struct ena_adapter *adapter)
 
ena_napi_enable_all(adapter);
 
+   /* Enable completion queues interrupt */
+   for (i = 0; i < adapter->num_queues; i++)
+   ena_unmask_interrupt(&adapter->tx_ring[i],
+   &adapter->rx_ring[i]);
+
/* schedule napi in case we had pending packets
 * from the last time we disable napi
 */
-- 
2.7.4



[PATCH] net: aquantia: atlantic: remove declaration of hw_atl_utils_hw_set_power

2017-06-09 Thread Philippe Reynes
This function is not defined, so no need to declare it.

As I don't have the hardware, I'd be very pleased if
someone may test this patch.

Signed-off-by: Philippe Reynes 
---
 .../aquantia/atlantic/hw_atl/hw_atl_utils.h|3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.h 
b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.h
index b8e3d88..a66aee5 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.h
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.h
@@ -193,9 +193,6 @@ int hw_atl_utils_hw_get_regs(struct aq_hw_s *self,
 struct aq_hw_caps_s *aq_hw_caps,
 u32 *regs_buff);
 
-int hw_atl_utils_hw_get_settings(struct aq_hw_s *self,
-struct ethtool_cmd *cmd);
-
 int hw_atl_utils_hw_set_power(struct aq_hw_s *self,
  unsigned int power_state);
 
-- 
1.7.4.4



Re: [PATCH net] Fix an intermittent pr_emerg warning about lo becoming free.

2017-06-09 Thread Cong Wang
On Fri, Jun 9, 2017 at 11:43 AM, Krister Johansen
 wrote:
> On Fri, Jun 09, 2017 at 11:18:44AM -0700, Cong Wang wrote:
>> On Thu, Jun 8, 2017 at 1:12 PM, Krister Johansen
>>  wrote:
>> > The way this works is that if there's still a reference on the dst entry
>> > at the time we try to free it, it gets placed in the gc list by
>> > __dst_free and the dst_destroy() call is invoked by the gc task once the
>> > refcount is 0.  If the gc task processes a 10th or less of its entries
>> > on a single pass, it increases the amount of time it waits between gc
>> > intervals.
>> >
>> > Looking at the gc_task intervals, they started at 663ms when we invoked
>> > __dst_free().  After that, they increased to 1663, 3136, 5567, 8191,
>> > 10751, and 14848.  The release that set the refcnt to 0 on our dst entry
>> > occurred after the gc_task was enqueued for 14 second interval so we had
>> > to wait longer than the warning time in wait_allrefs in order for the
>> > dst entry to get free'd and the hold on 'lo' to be released.
>> >
>>
>> I am glad to see you don't have a dst leak here.
>>
>> But from my experience of a similar bug (refcnt wait on lo), this goes
>> infinitely rather than just 14sec, so it looked more like a real leak than
>> just a gc delay. So in your case, this annoying warning eventually
>> disappears, right?
>
> That's correct.  The problem occurs intermittently, and the warnings are
> less frequent than the interval in netdev_wait_allrefs().  At least when
> I observed it, it tended to coincide with our controlplane canary
> issuing an API call that led to a network namespace teardown on the
> dataplane.

Great! Then the bug I saw is different from this one and it is probably
a dst leak.

Thanks.


Re: [PATCH net] ipv4: igmp: fix a use after free

2017-06-09 Thread Cong Wang
On Fri, Jun 9, 2017 at 11:05 AM, Xin Long  wrote:
> On Sat, Jun 10, 2017 at 1:01 AM, Cong Wang  wrote:
>> This is what I thought in my first response, until I realized
>> it is not pure RCU, otherwise pmc->lock should not be taken
>> in igmpv3_send_cr(). It seems the code is mixing the use
>> of spinlock and RCU.
> rcu lock is for pmc not being freed, and spinlock is for pmc's
> members' modification. is there some rule these two locks
> should be mixed?
>

This is exactly why I said we are mixing RCU and spinlock.

>
>>
>> We need RCU anyway, ip_check_mc_rcu() is the real fast
>> path where we don't take spinlock. I think we will need more
>> work.
> It seems all add_grec() callings needs spinlock, maybe  add_grec
> modifies pmc's member. it's hard to drop spinlock.
>
> from ip_check_mc_rcu you mentioned about, it should be right
> to call ip_mc_clear_src after rcu grace period, like Eric's patch.

Well, more than just that, we need to use proper RCU API for
pmc->sources and you know it is a singly linked list...

I will work on this.


Re: [PATCH net-next 0/3] mlx4 drivers: version update

2017-06-09 Thread Doug Ledford
On Wed, 2017-06-07 at 15:33 -0400, David Miller wrote:
> From: Tariq Toukan 
> Date: Wed,  7 Jun 2017 16:26:12 +0300
> 
> > This patchset contains version updates for the MLX4 drivers:
> > Core, EN, and IB.
> > 
> > Just like we've done in mlx5, we modify the outdated driver
> > version (reported in ethtool for example).
> > This better reflects the current driver state, and removes the
> > redundant date string.
> > We are not going to change this frequently or even use it.
> > 
> > I include the IB patch in this series as it has similar subject
> > and content.
> > It does not cause any kind of conflict with Doug's tree.
> > The rdma mailing list is CCed.
> > Please let me know if I need to submit this differently.
> 
> Ok, series applied.
> 
> Doug, let me know if you want to handle this differently or want
> me to change something.
> 
> Thanks.

You've got it, I'll skip it, it's all good.

-- 
Doug Ledford 
    GPG KeyID: B826A3330E572FDD
   
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD



Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+

2017-06-09 Thread Eric Dumazet
On Fri, 2017-06-09 at 07:27 -0600, David Ahern wrote:
> On 6/8/17 11:55 PM, Cong Wang wrote:
> > On Thu, Jun 8, 2017 at 2:27 PM, Ben Greear  wrote:
> >>
> >> As far as I can tell, the patch did not help, or at least we still
> >> reproduce the crash easily.
> > 
> > netlink dump is serialized by nlk->cb_mutex so I don't think that
> > patch makes any sense w.r.t race condition.
> 
> From what I can see fn_sernum should be accessed under table lock, so
> when saving and checking it during a walk make sure the lock is held.
> That has nothing to do with the netlink dump, but the table changing
> during a walk.


Yes, your patch makes total sense, of course.




> 
> 
> >> (gdb) l *(fib6_walk_continue+0x76)
> >> 0x188c6 is in fib6_walk_continue
> >> (/home/greearb/git/linux-2.6/net/ipv6/ip6_fib.c:1593).
> >> 1588    if (fn == w->root)
> >> 1589            return 0;
> >> 1590    pn = fn->parent;
> >> 1591    w->node = pn;
> >> 1592    #ifdef CONFIG_IPV6_SUBTREES
> >> 1593    if (FIB6_SUBTREE(pn) == fn) {
> > 
> > Apparently fn->parent is NULL here for some reason, but
> > I don't know if that is expected or not. If a simple NULL check
> > is not enough here, we have to trace why it is NULL.
> 
> From my understanding, parent should not be null hence the attempts to
> fix access to table nodes under a lock. ie., figuring out why it is null
> here.




Re: [PATCH net-next] bpf: Remove duplicate tcp_filter hook in ipv6

2017-06-09 Thread Eric Dumazet
On Fri, Jun 9, 2017 at 12:17 PM, Chenbo Feng
 wrote:
> From: Chenbo Feng 
>
> There are two tcp_filter hooks in tcp_ipv6 ingress path currently.
> One is at tcp_v6_rcv and another is in tcp_v6_do_rcv. It seems the
> tcp_filter() call inside tcp_v6_do_rcv is redundant and some packet
> will be filtered twice in this situation. This will cause trouble
> when using eBPF filters to account traffic data.
>
> Signed-off-by: Chenbo Feng 
> Acked-by: Eric Dumazet 
> ---
>  net/ipv6/tcp_ipv6.c | 3 ---
>  1 file changed, 3 deletions(-)

Yes, this is the patch I agreed on ;)

Thanks


Re: [PATCH net-next] ipv6: Initial skb->dev and skb->protocol in ip6_output

2017-06-09 Thread Eric Dumazet
On Fri, Jun 9, 2017 at 12:06 PM, Chenbo Feng
 wrote:
> From: Chenbo Feng 
>
> Move the initialization of skb->dev and skb->protocol from
> ip6_finish_output2 to ip6_output. This can make the skb->dev and
> skb->protocol information available to the CGROUP eBPF filter.
>
> Signed-off-by: Chenbo Feng 
> Acked-by: Eric Dumazet 
> ---

Arg, you mixed my Acked-by for your other patch :/


Re: [PATCH v2] sh_eth: add support to change MTU

2017-06-09 Thread Sergei Shtylyov

On 06/09/2017 11:32 PM, Niklas Söderlund wrote:


The hardware supports changing the MTU, and the driver itself is
somewhat prepared to support this. This patch hooks up the callbacks to
be able to change the MTU from user-space.

Signed-off-by: Niklas Söderlund 
Acked-by: Sergei Shtylyov 
---

Based on v4.12-rc1 and tested on Renesas R-Car Koelsch M2.

Test procedure:

1. On the host, set the MTU to something large (9000 was used for this test).

2. On the target, set the MTU to something other than 1500; in this test the
   max MTU of 1978 is used.

3. Send ping with large payload and observe that it works.

   ping -M do -s 1954 

   The reason for 1954 instead of 1982 is twofold:

   1. On Linux (different on Mac IIRC) the ICMP/ping payload size
      doesn’t include the 28 bytes of ICMP (8) + IP (20) headers.
   2. The driver internally reserves 4 bytes of transmission buffer for
      an optional VLAN header (4). And since no VLAN is used in this
      setup the additional 4 bytes can carry data.

4. For extra verification the packet flow is inspected using tcpdump to
   verify that there is no packet fragmentation.

* Changes since v1
- Fix spelling mistake in comment, thanks Sergei!
- Add Acked-by from Sergei.

 drivers/net/ethernet/renesas/sh_eth.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/drivers/net/ethernet/renesas/sh_eth.c 
b/drivers/net/ethernet/renesas/sh_eth.c
index f68c4db656eda846..9c6e4025bfc9f5c5 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -2558,6 +2558,17 @@ static int sh_eth_do_ioctl(struct net_device *ndev, 
struct ifreq *rq, int cmd)
return phy_mii_ioctl(phydev, rq, cmd);
 }

+static int sh_eth_change_mtu(struct net_device *dev, int new_mtu)


   Hmm, wait... The 'struct net_device *' typed variables are consistently 
called 'ndev' throughout the driver. Please rename, sorry for not noticing the 
1st time... :-<


[...]

MBR, Sergei



BUG: Bad page state in process Compositor pfn:c03e2

2017-06-09 Thread Алексей Болдырев
[ 1621.875870] BUG: Bad page state in process Compositor pfn:c03e2
[ 1621.875876] page:ea000300f880 count:-1 mapcount:0 mapping: (null) 
index:0x0
[ 1621.875878] flags: 0x100()
[ 1621.875881] raw: 0100   

[ 1621.875882] raw:  dead0200  

[ 1621.875883] page dumped because: nonzero _count
[ 1621.875884] Modules linked in: vhost_net vhost tap af_packet tun 
ebtable_filter ebtables x_tables dummy nbd bridge 8021q garp mrp stp llc 
ata_generic pata_acpi snd_hda_codec_realtek snd_hda_codec_generic snd_usb_audio 
snd_usbmidi_lib snd_rawmidi snd_seq_device uvcvideo videobuf2_vmalloc 
videobuf2_memops videobuf2_v4l2 videobuf2_core videodev media snd_hda_intel 
snd_hda_codec snd_pcsp snd_hda_core snd_hwdep kvm_amd snd_pcm kvm snd_timer snd 
irqbypass e1000 soundcore pata_atiixp nouveau r8169 ttm tpm_infineon mii wmi 
acpi_cpufreq ohci_pci tpm_tis ohci_hcd tpm_tis_core tpm fuse ipv6 crc_ccitt unix
[ 1621.875907] CPU: 0 PID: 3674 Comm: Compositor Not tainted 4.11.3 #2
[ 1621.875907] Hardware name: MSI MS-7715/870-C45(FX) V2 (MS-7715) , BIOS V3.1 
04/16/2012
[ 1621.875908] Call Trace:
[ 1621.875914] dump_stack+0x4d/0x65
[ 1621.875916] bad_page+0xc1/0x130
[ 1621.875917] check_new_page_bad+0x75/0x80
[ 1621.875918] get_page_from_freelist+0x6fd/0xab0
[ 1621.875920] ? enqueue_task_fair+0xfa5/0x1910
[ 1621.875921] ? select_task_rq_fair+0xaeb/0xed0
[ 1621.875923] __alloc_pages_nodemask+0xcb/0x200
[ 1621.875924] alloc_pages_current+0x8d/0x110
[ 1621.875926] alloc_skb_with_frags+0xcb/0x1c0
[ 1621.875928] sock_alloc_send_pskb+0x1e4/0x210
[ 1621.875931] unix_stream_sendmsg+0x26c/0x3b0 [unix]
[ 1621.875932] sock_sendmsg+0x33/0x40
[ 1621.875933] sock_write_iter+0x76/0xd0
[ 1621.875935] __do_readv_writev+0x29e/0x370
[ 1621.875936] do_readv_writev+0x78/0xa0
[ 1621.875937] vfs_writev+0x37/0x50
[ 1621.875939] ? __fdget_pos+0x12/0x50
[ 1621.875940] ? vfs_writev+0x37/0x50
[ 1621.875940] do_writev+0x4d/0xd0
[ 1621.875942] SyS_writev+0xb/0x10
[ 1621.875943] entry_SYSCALL_64_fastpath+0x13/0x94
[ 1621.875944] RIP: 0033:0x7f730f8dd2f0
[ 1621.875945] RSP: 002b:7f72f15f55f0 EFLAGS: 0293 ORIG_RAX: 
0014
[ 1621.875946] RAX: ffda RBX: 7f72cc39cf28 RCX: 7f730f8dd2f0
[ 1621.875947] RDX: 0003 RSI: 7f72f15f5790 RDI: 0004
[ 1621.875947] RBP: 7f72c7ee1c00 R08:  R09: 
[ 1621.875948] R10: 0020 R11: 0293 R12: 7f72f15f6e01
[ 1621.875948] R13:  R14: 0166 R15: 7f72cc39d428
[ 1621.875949] Disabling lock debugging due to kernel taint


Re: [PATCH v2] sh_eth: add support to change MTU

2017-06-09 Thread Niklas Söderlund
On 2017-06-09 22:32:15 +0200, Niklas Söderlund wrote:
> The hardware supports changing the MTU, and the driver itself is
> somewhat prepared to support this. This patch hooks up the callbacks to
> be able to change the MTU from user-space.
> 
> Signed-off-by: Niklas Söderlund 
> Acked-by: Sergei Shtylyov 
> ---
> 
> Based on v4.12-rc1 and tested on Renesas R-Car Koelsch M2.
> 
> Test procedure:
> 
> 1. On the host, set the MTU to something large (9000 was used for this test).
> 
> 2. On the target, set the MTU to something other than 1500; in this test the
>    max MTU of 1978 is used.
> 
> 3. Send ping with large payload and observe that it works.
> 
>    ping -M do -s 1954 
> 
>    The reason for 1954 instead of 1982 is twofold:
> 
>    1. On Linux (different on Mac IIRC) the ICMP/ping payload size
>       doesn’t include the 28 bytes of ICMP (8) + IP (20) headers.
>    2. The driver internally reserves 4 bytes of transmission buffer for
>       an optional VLAN header (4). And since no VLAN is used in this
>       setup the additional 4 bytes can carry data.
> 
> 4. For extra verification the packet flow is inspected using tcpdump to
>verify that there is no packet fragmentation.
> 
> * Changes since v1
> - Fix spelling mistake in comment, thanks Sergei!
> - Add Acked-by from Sergei.
> 
>  drivers/net/ethernet/renesas/sh_eth.c | 20 
>  1 file changed, 20 insertions(+)
> 
> diff --git a/drivers/net/ethernet/renesas/sh_eth.c 
> b/drivers/net/ethernet/renesas/sh_eth.c
> index f68c4db656eda846..9c6e4025bfc9f5c5 100644
> --- a/drivers/net/ethernet/renesas/sh_eth.c
> +++ b/drivers/net/ethernet/renesas/sh_eth.c
> @@ -2558,6 +2558,17 @@ static int sh_eth_do_ioctl(struct net_device *ndev, 
> struct ifreq *rq, int cmd)
>   return phy_mii_ioctl(phydev, rq, cmd);
>  }
>  
> +static int sh_eth_change_mtu(struct net_device *dev, int new_mtu)
> +{
> + if (netif_running(dev))
> + return -EBUSY;
> +
> + dev->mtu = new_mtu;
> + netdev_update_features(dev);
> +
> + return 0;
> +}
> +
>  /* For TSU_POSTn. Please refer to the manual about this (strange) bitfields 
> */
>  static void *sh_eth_tsu_get_post_reg_offset(struct sh_eth_private *mdp,
>   int entry)
> @@ -3029,6 +3040,7 @@ static const struct net_device_ops sh_eth_netdev_ops = {
>   .ndo_set_rx_mode= sh_eth_set_rx_mode,
>   .ndo_tx_timeout = sh_eth_tx_timeout,
>   .ndo_do_ioctl   = sh_eth_do_ioctl,
> + .ndo_change_mtu = sh_eth_change_mtu,
>   .ndo_validate_addr  = eth_validate_addr,
>   .ndo_set_mac_address= eth_mac_addr,
>  };
> @@ -3043,6 +3055,7 @@ static const struct net_device_ops 
> sh_eth_netdev_ops_tsu = {
>   .ndo_vlan_rx_kill_vid   = sh_eth_vlan_rx_kill_vid,
>   .ndo_tx_timeout = sh_eth_tx_timeout,
>   .ndo_do_ioctl   = sh_eth_do_ioctl,
> + .ndo_change_mtu = sh_eth_change_mtu,
>   .ndo_validate_addr  = eth_validate_addr,
>   .ndo_set_mac_address= eth_mac_addr,
>  };
> @@ -3171,6 +3184,13 @@ static int sh_eth_drv_probe(struct platform_device 
> *pdev)
>   }
>   sh_eth_set_default_cpu_data(mdp->cd);
>  
> + /* User's manua states max MTU should be 2048 but due to the

s/manua/manual/

Do I need to resend or can this be fixed up when applying? Sorry for 
this.

> +  * alignment calculations in sh_eth_ring_init() the practical
> +  * MTU is a bit less. Maybe this can be optimized some more.
> +  */
> + ndev->max_mtu = 2000 - (ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN);
> + ndev->min_mtu = ETH_MIN_MTU;
> +
>   /* set function */
>   if (mdp->cd->tsu)
>   ndev->netdev_ops = &sh_eth_netdev_ops_tsu;
> -- 
> 2.13.1
> 

-- 
Regards,
Niklas Söderlund


[PATCH v2] sh_eth: add support to change MTU

2017-06-09 Thread Niklas Söderlund
The hardware supports changing the MTU, and the driver itself is
somewhat prepared to support this. This patch hooks up the callbacks to
be able to change the MTU from user-space.

Signed-off-by: Niklas Söderlund 
Acked-by: Sergei Shtylyov 
---

Based on v4.12-rc1 and tested on Renesas R-Car Koelsch M2.

Test procedure:

1. On the host, set the MTU to something large (9000 was used for this test).

2. On the target, set the MTU to something other than 1500; in this test the
   max MTU of 1978 is used.

3. Send ping with large payload and observe that it works.

   ping -M do -s 1954 

   The reason for 1954 instead of 1982 is twofold:

   1. On Linux (different on Mac IIRC) the ICMP/ping payload size
      doesn’t include the 28 bytes of ICMP (8) + IP (20) headers.
   2. The driver internally reserves 4 bytes of transmission buffer for
      an optional VLAN header (4). And since no VLAN is used in this
      setup the additional 4 bytes can carry data.

4. For extra verification the packet flow is inspected using tcpdump to
   verify that there is no packet fragmentation.
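
The numbers in the test procedure can be checked with a few lines of arithmetic. This is a sketch rather than driver code: `sh_eth_max_mtu_sketch()`, `max_ping_payload_sketch()`, `IPV4_HLEN` and `ICMP_HLEN` are illustrative names, and the 2000-byte starting point is simply the constant used in the probe hunk of this patch. `ETH_HLEN`, `VLAN_HLEN` and `ETH_FCS_LEN` carry the values the kernel headers define.

```c
/* Header sizes; the first three match the kernel's definitions. */
enum {
	ETH_HLEN    = 14, /* Ethernet header */
	VLAN_HLEN   = 4,  /* optional 802.1Q tag */
	ETH_FCS_LEN = 4,  /* frame check sequence */
	IPV4_HLEN   = 20, /* IPv4 header without options (illustrative name) */
	ICMP_HLEN   = 8   /* ICMP echo header (illustrative name) */
};

/* max_mtu as computed in the patch: 2000 minus the L2 overhead. */
int sh_eth_max_mtu_sketch(void)
{
	return 2000 - (ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN);
}

/* Largest "ping -s" value: MTU minus IP+ICMP headers, plus spare bytes. */
int max_ping_payload_sketch(int mtu, int spare_bytes)
{
	return mtu - (IPV4_HLEN + ICMP_HLEN) + spare_bytes;
}
```

Under these assumptions, sh_eth_max_mtu_sketch() gives 1978, and max_ping_payload_sketch(1978, VLAN_HLEN) gives the 1954 bytes passed to ping above.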

* Changes since v1
- Fix spelling mistake in comment, thanks Sergei!
- Add Acked-by from Sergei.

 drivers/net/ethernet/renesas/sh_eth.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/drivers/net/ethernet/renesas/sh_eth.c 
b/drivers/net/ethernet/renesas/sh_eth.c
index f68c4db656eda846..9c6e4025bfc9f5c5 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -2558,6 +2558,17 @@ static int sh_eth_do_ioctl(struct net_device *ndev, 
struct ifreq *rq, int cmd)
return phy_mii_ioctl(phydev, rq, cmd);
 }
 
+static int sh_eth_change_mtu(struct net_device *dev, int new_mtu)
+{
+   if (netif_running(dev))
+   return -EBUSY;
+
+   dev->mtu = new_mtu;
+   netdev_update_features(dev);
+
+   return 0;
+}
+
 /* For TSU_POSTn. Please refer to the manual about this (strange) bitfields */
 static void *sh_eth_tsu_get_post_reg_offset(struct sh_eth_private *mdp,
int entry)
@@ -3029,6 +3040,7 @@ static const struct net_device_ops sh_eth_netdev_ops = {
.ndo_set_rx_mode= sh_eth_set_rx_mode,
.ndo_tx_timeout = sh_eth_tx_timeout,
.ndo_do_ioctl   = sh_eth_do_ioctl,
+   .ndo_change_mtu = sh_eth_change_mtu,
.ndo_validate_addr  = eth_validate_addr,
.ndo_set_mac_address= eth_mac_addr,
 };
@@ -3043,6 +3055,7 @@ static const struct net_device_ops sh_eth_netdev_ops_tsu 
= {
.ndo_vlan_rx_kill_vid   = sh_eth_vlan_rx_kill_vid,
.ndo_tx_timeout = sh_eth_tx_timeout,
.ndo_do_ioctl   = sh_eth_do_ioctl,
+   .ndo_change_mtu = sh_eth_change_mtu,
.ndo_validate_addr  = eth_validate_addr,
.ndo_set_mac_address= eth_mac_addr,
 };
@@ -3171,6 +3184,13 @@ static int sh_eth_drv_probe(struct platform_device *pdev)
}
sh_eth_set_default_cpu_data(mdp->cd);
 
+   /* User's manua states max MTU should be 2048 but due to the
+* alignment calculations in sh_eth_ring_init() the practical
+* MTU is a bit less. Maybe this can be optimized some more.
+*/
+   ndev->max_mtu = 2000 - (ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN);
+   ndev->min_mtu = ETH_MIN_MTU;
+
/* set function */
if (mdp->cd->tsu)
ndev->netdev_ops = &sh_eth_netdev_ops_tsu;
-- 
2.13.1



Re: [PATCH] sh_eth: add support to change MTU

2017-06-09 Thread Niklas Söderlund
Hi Sergei,

Thanks for your feedback.

On 2017-06-09 19:31:09 +0300, Sergei Shtylyov wrote:
> Hello!
> 
> On 06/09/2017 06:30 PM, Niklas Söderlund wrote:
> 
> > The hardware supports the MTU to be changed and the driver it self is
> > somewhat prepared to support this. This patch hooks up the callbacks to
> > be able to change the MTU from user-space.
> > 
> > Signed-off-by: Niklas Söderlund 
> [...]
> 
>One more thing off my back, thanks! :-)
>I'm OK with this patch in principle (but have several nits):

Will update nits and send v2 containing your Acked-by.

> 
> Acked-by: Sergei Shtylyov 

Thanks :-)

> 
> > diff --git a/drivers/net/ethernet/renesas/sh_eth.c 
> > b/drivers/net/ethernet/renesas/sh_eth.c
> > index f68c4db656eda846..da41eda7bfada6b9 100644
> > --- a/drivers/net/ethernet/renesas/sh_eth.c
> > +++ b/drivers/net/ethernet/renesas/sh_eth.c
> [...]
> > @@ -3171,6 +3184,13 @@ static int sh_eth_drv_probe(struct platform_device 
> > *pdev)
> > }
> > sh_eth_set_default_cpu_data(mdp->cd);
> > 
> > +   /* Datasheet states max MTU should be 2048 but due to the
> 
>User's manual. :-)

Will update in v2.

>Somehow I thought it supports jumbo frames but the manual doesn't confirm
> that... ah, that's EtherAVB! :-)

Yes I did look for that but could not find that for sh_eth :-)

> 
> > +* aliment calculations in sh_eth_ring_init() the practical
> 
>   Alignment.

Will fix in v2.

> 
> > +* MTU is a bit less. Maybe this can be optimized some more.
> 
>Undoubtedly... :-)

:-)

> 
> [...]
> 
> MBR, Sergei
> 

-- 
Regards,
Niklas Söderlund


RE: [Intel-wired-lan] [i40e] regression on TCP stream and TCP maerts, kernel-4.12.0-0.rc2

2017-06-09 Thread Keller, Jacob E


> -Original Message-
> From: Alexander Duyck [mailto:alexander.du...@gmail.com]
> Sent: Friday, June 09, 2017 12:59 PM
> To: Adrian Tomasov ; Kirsher, Jeffrey T
> ; Keller, Jacob E 
> Cc: Duyck, Alexander H ; osab...@redhat.com;
> netdev@vger.kernel.org; aokul...@redhat.com; intel-wired-...@lists.osuosl.org;
> jhla...@redhat.com
> Subject: Re: [Intel-wired-lan] [i40e] regression on TCP stream and TCP maerts,
> kernel-4.12.0-0.rc2
> 
> On Fri, Jun 9, 2017 at 3:34 AM, Adrian Tomasov  wrote:
> > On Thu, 2017-06-01 at 19:18 +, Duyck, Alexander H wrote:
> >> On Thu, 2017-06-01 at 12:14 +0200, Adrian Tomasov wrote:
> >> >
> >> > On Wed, 2017-05-31 at 14:42 -0700, Alexander Duyck wrote:
> >> > >
> >> > >
> >> > > On Wed, May 31, 2017 at 6:48 AM, Adrian Tomasov  >> > > com>
> >> > > wrote:
> >> > > >
> >> > > >
> >> > > >
> >> > > > On Tue, 2017-05-30 at 18:27 -0700, Alexander Duyck wrote:
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > On Tue, May 30, 2017 at 8:41 AM, Alexander Duyck
> >> > > > >  wrote:
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > > On Tue, May 30, 2017 at 6:43 AM, Adam Okuliar  >> > > > > > hat.
> >> > > > > > com>
> >> > > > > > wrote:
> >> > > > > > >
> >> > > > > > >
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > Hello,
> >> > > > > > >
> >> > > > > > > we found regression on intel card(XL710) with i40e
> >> > > > > > > driver.
> >> > > > > > > Regression is
> >> > > > > > > about ~45%
> >> > > > > > > on TCP_STREAM and TCP_MAERTS test for IPv4 and IPv6.
> >> > > > > > > Regression
> >> > > > > > > was first
> >> > > > > > > visible in kernel-4.12.0-0.rc1.
> >> > > > > > >
> >> > > > > > > More details about results you can see in uploaded images
> >> > > > > > > in
> >> > > > > > > bugzilla. [0]
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > [0] https://bugzilla.kernel.org/show_bug.cgi?id=195923
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > Best regards, / S pozdravom,
> >> > > > > > >
> >> > > > > > > Adrián Tomašov
> >> > > > > > > Kernel Performance QE
> >> > > > > > > atoma...@redhat.com
> >> > > > > >
> >> > > > > > I have added the i40e driver maintainer and the intel-
> >> > > > > > wired-lan
> >> > > > > > mailing list so that we can make our developers aware of
> >> > > > > > the
> >> > > > > > issue.
> >> > > > > >
> >> > > > > > Thanks.
> >> > > > > >
> >> > > > > > - Alex
> >> > > > >
> >> > > > > Adam,
> >> > > > >
> >> > > > > We are having some issues trying to reproduce what you
> >> > > > > reported.
> >> > > > >
> >> > > > > Can you provide some additional data. Specifically we would
> >> > > > > be
> >> > > > > looking
> >> > > > > for an "ethtool -i", and an "ethtool -S" for the port before
> >> > > > > and
> >> > > > > after
> >> > > > > the test. If you can attach it to the bugzilla that would be
> >> > > > > appreciated.
> >> > > > >
> >> > > > > Thanks.
> >> > > > >
> >> > > > > - Alex
> >> > > >
> >> > > > Hello Alex,
> >> > > >
> >> > > > requested files are updated in bugzilla.
> >> > > >
> >> > > > If you have any questions about testing feel free to ask.
> >> > > >
> >> > > >
> >> > > > Best regards,
> >> > > >
> >> > > > Adrian
> >> > >
> >> > > So looking at the data I wonder if we don't have an MTU mismatch
> >> > > in
> >> > > the network config. I notice the "after" has rx_length_errors
> >> > > being
> >> > > reported. Recent changes made it so that i40e doesn't support
> >> > > jumbo
> >> > > frames by default, whereas before we could. You might want to
> >> > > check
> >> > > for that as that could cause the kind of performance issues you
> >> > > are
> >> > > seeing.
> >> > >
> >> > > - Alex
> >> >
> >> > There isn't MTU mismatch. Traffic path is : server -> switch ->
> >> > server.
> >> >
> >> >
> >> > Output from switch:
> >> >
> >> > > show interfaces et-0/0/18
> >> > Physical interface: et-0/0/18, Enabled, Physical link is Up
> >> >   Interface index: 644, SNMP ifIndex: 538
> >> >   Link-level type: Ethernet, MTU: 1514, Speed: 40Gbps, BPDU
> >> > Error:
> >> > None, MAC-REWRITE Error: None, Loopback: Disabled, Source
> >> > filtering:
> >> > Disabled, Flow control: Disabled, Media type: Fiber
> >> >   Device flags   : Present Running
> >> >   Interface flags: SNMP-Traps Internal: 0x4000
> >> >   Link flags : None
> >> >   CoS queues : 12 supported, 12 maximum usable queues
> >> >   Current address: d4:04:ff:90:5a:4b, Hardware address:
> >> > d4:04:ff:90:5a:4b
> >> >   Last flapped   : 2017-06-01 10:09:32 CEST (01:21:29 ago)
> >> >   Input rate : 432 bps (0 pps)
> >> >   Output rate: 8336 bps (11 pps)
> >> >   Active alarms  : None
> >> >   Active defects : None
> >> >   Interface transmit statistics: Disabled
> >> >
> >> 

Re: [PATCH v2 7/8] net: mvmdio: add xmdio support

2017-06-09 Thread Russell King - ARM Linux
On Fri, Jun 09, 2017 at 08:40:19AM +0200, Antoine Tenart wrote:
> Hi Andrew,
> 
> On Thu, Jun 08, 2017 at 06:03:31PM +0200, Andrew Lunn wrote:
> > On Thu, Jun 08, 2017 at 11:26:52AM +0200, Antoine Tenart wrote:
> > > +#define MVMDIO_XSMI_MGNT_REG 0x0
> > > +#define  MVMDIO_XSMI_READ_VALID  BIT(29)
> > > +#define  MVMDIO_XSMI_BUSY  BIT(30)
> > > +#define MVMDIO_XSMI_ADDR_REG 0x8
> > > +#define  MVMDIO_XSMI_PHYADDR_SHIFT   16
> > > +#define  MVMDIO_XSMI_DEVADDR_SHIFT   21
> > > +#define  MVMDIO_XSMI_READ_OPERATION  (0x7 << 26)
> > > +#define  MVMDIO_XSMI_WRITE_OPERATION (0x5 << 27)
> > 
> > These two operations seem odd. Generally ops have the same shift.
> 
> Indeed, this is odd. I'll have a look at this.

The Marvell driver uses 5 << 26:

+#define XOPCODE_OFFS   26
+#define XOPCODE_ADDR_READ  (7 << XOPCODE_OFFS)
+#define XOPCODE_ADDR_WRITE (5 << XOPCODE_OFFS)

What this means is that with the incorrect shift in your driver,
although writes appeared to work, they actually resulted in a
post-read-increment-address frame (and hence no error.)
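
To make the effect of the shift concrete, here is a minimal user-space
sketch. It assumes, as the scope measurements reported elsewhere in this
thread suggest, that bits 27:26 of the XSMI management register carry the
Clause 45 OP field. xsmi_encode() and xsmi_op_field() are hypothetical
helper names, not part of mvmdio or of the Marvell driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: the Clause 45 OP field occupies bits 27:26. */
#define XSMI_OP_SHIFT	26u
#define XSMI_OP_MASK	0x3u

/* Clause 45 opcodes as defined by IEEE 802.3. */
enum c45_op {
	C45_OP_ADDRESS       = 0x0,
	C45_OP_WRITE         = 0x1,
	C45_OP_POST_READ_INC = 0x2,
	C45_OP_READ          = 0x3,
};

/* Place a 3-bit operation value at the given shift (hypothetical helper). */
static inline uint32_t xsmi_encode(unsigned int val3, unsigned int shift)
{
	assert(val3 <= 7u && shift <= 29u);
	return (uint32_t)val3 << shift;
}

/* Extract the opcode the hardware would see in bits 27:26. */
static inline enum c45_op xsmi_op_field(uint32_t mgnt_reg)
{
	return (enum c45_op)((mgnt_reg >> XSMI_OP_SHIFT) & XSMI_OP_MASK);
}
```

Under that assumption, xsmi_op_field(xsmi_encode(5, 27)) evaluates to
C45_OP_POST_READ_INC, while xsmi_op_field(xsmi_encode(5, 26)) evaluates to
C45_OP_WRITE and xsmi_op_field(xsmi_encode(7, 26)) to C45_OP_READ.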

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [Intel-wired-lan] [i40e] regression on TCP stream and TCP maerts, kernel-4.12.0-0.rc2

2017-06-09 Thread Alexander Duyck
On Fri, Jun 9, 2017 at 3:34 AM, Adrian Tomasov  wrote:
> On Thu, 2017-06-01 at 19:18 +, Duyck, Alexander H wrote:
>> On Thu, 2017-06-01 at 12:14 +0200, Adrian Tomasov wrote:
>> >
>> > On Wed, 2017-05-31 at 14:42 -0700, Alexander Duyck wrote:
>> > >
>> > >
>> > > On Wed, May 31, 2017 at 6:48 AM, Adrian Tomasov wrote:
>> > > >
>> > > >
>> > > >
>> > > > On Tue, 2017-05-30 at 18:27 -0700, Alexander Duyck wrote:
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Tue, May 30, 2017 at 8:41 AM, Alexander Duyck
>> > > > >  wrote:
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > On Tue, May 30, 2017 at 6:43 AM, Adam Okuliar wrote:
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > Hello,
>> > > > > > >
>> > > > > > > We found a regression on an Intel card (XL710) with the
>> > > > > > > i40e driver. The regression is about 45% on the
>> > > > > > > TCP_STREAM and TCP_MAERTS tests for IPv4 and IPv6. It was
>> > > > > > > first visible in kernel-4.12.0-0.rc1.
>> > > > > > >
>> > > > > > > More details about the results can be seen in the images
>> > > > > > > uploaded to bugzilla. [0]
>> > > > > > >
>> > > > > > >
>> > > > > > > [0] https://bugzilla.kernel.org/show_bug.cgi?id=195923
>> > > > > > >
>> > > > > > >
>> > > > > > > Best regards, / S pozdravom,
>> > > > > > >
>> > > > > > > Adrián Tomašov
>> > > > > > > Kernel Performance QE
>> > > > > > > atoma...@redhat.com
>> > > > > >
>> > > > > > I have added the i40e driver maintainer and the
>> > > > > > intel-wired-lan mailing list so that we can make our
>> > > > > > developers aware of the issue.
>> > > > > >
>> > > > > > Thanks.
>> > > > > >
>> > > > > > - Alex
>> > > > >
>> > > > > Adam,
>> > > > >
>> > > > > We are having some issues trying to reproduce what you
>> > > > > reported.
>> > > > >
>> > > > > Can you provide some additional data? Specifically, we would
>> > > > > be looking for an "ethtool -i" and an "ethtool -S" for the
>> > > > > port before and after the test. If you can attach it to the
>> > > > > bugzilla that would be appreciated.
>> > > > >
>> > > > > Thanks.
>> > > > >
>> > > > > - Alex
>> > > >
>> > > > Hello Alex,
>> > > >
>> > > > requested files are updated in bugzilla.
>> > > >
>> > > > If you have any questions about testing feel free to ask.
>> > > >
>> > > >
>> > > > Best regards,
>> > > >
>> > > > Adrian
>> > >
>> > > Looking at the data, I wonder if we have an MTU mismatch in the
>> > > network config. I notice the "after" has rx_length_errors being
>> > > reported. Recent changes made it so that i40e doesn't support
>> > > jumbo frames by default, whereas before it did. You might want
>> > > to check for that, as it could cause the kind of performance
>> > > issues you are seeing.
>> > >
>> > > - Alex
>> >
>> > There isn't an MTU mismatch. The traffic path is: server -> switch ->
>> > server.
>> >
>> >
>> > Output from switch:
>> >
>> > > show interfaces et-0/0/18
>> > Physical interface: et-0/0/18, Enabled, Physical link is Up
>> >   Interface index: 644, SNMP ifIndex: 538
>> >   Link-level type: Ethernet, MTU: 1514, Speed: 40Gbps, BPDU
>> > Error:
>> > None, MAC-REWRITE Error: None, Loopback: Disabled, Source
>> > filtering:
>> > Disabled, Flow control: Disabled, Media type: Fiber
>> >   Device flags   : Present Running
>> >   Interface flags: SNMP-Traps Internal: 0x4000
>> >   Link flags : None
>> >   CoS queues : 12 supported, 12 maximum usable queues
>> >   Current address: d4:04:ff:90:5a:4b, Hardware address:
>> > d4:04:ff:90:5a:4b
>> >   Last flapped   : 2017-06-01 10:09:32 CEST (01:21:29 ago)
>> >   Input rate : 432 bps (0 pps)
>> >   Output rate: 8336 bps (11 pps)
>> >   Active alarms  : None
>> >   Active defects : None
>> >   Interface transmit statistics: Disabled
>> >
>> >   Logical interface et-0/0/18.0 (Index 552) (SNMP ifIndex 539)
>> > Flags: SNMP-Traps 0x24024000 Encapsulation: Ethernet-Bridge
>> > Input packets : 464041
>> > Output packets: 209210
>> > Protocol eth-switch, MTU: 1514
>> >   Flags: Is-Primary, Trunk-Mode
>> >
>> >
>> > MTU is same for all et-0/0/x interfaces.
>> >
>> > - Adrian
>>
>> One thing you might try doing is toggling the legacy-rx flag using
>> the "ethtool --show-priv-flags/--set-priv-flags" command to see if
>> that has any impact. That will help to rule things out, as the most
>> significant change I can think of is the recent update of the Rx path
>> to support XDP.
>>
>> Also one other thing you might try would be to use a fixed interrupt
>> moderation rate by locking things down using "ethtool 

[PATCH net-next] Revert "ipv6: Initial skb->dev and skb->protocol in ip6_output"

2017-06-09 Thread Chenbo Feng
From: Chenbo Feng 

This reverts commit 97a7a37a7b7b ("ipv6: Initial skb->dev and
skb->protocol in ip6_output") since it does not handle the
skb->dev assignment inside the ip6_fragment() code path properly.
It needs to be reworked and resubmitted.

Fixes: 97a7a37a7b7b ("ipv6: Initial skb->dev and skb->protocol in ip6_output")
Signed-off-by: Chenbo Feng 
---
 net/ipv6/ip6_output.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 02cd44f..bf8a58a 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -67,6 +67,9 @@ static int ip6_finish_output2(struct net *net, struct sock *sk, struct sk_buff *
 	struct in6_addr *nexthop;
 	int ret;
 
+	skb->protocol = htons(ETH_P_IPV6);
+	skb->dev = dev;
+
 	if (ipv6_addr_is_multicast(&ipv6_hdr(skb)->daddr)) {
 		struct inet6_dev *idev = ip6_dst_idev(skb_dst(skb));
 
@@ -151,9 +154,6 @@ int ip6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 	struct net_device *dev = skb_dst(skb)->dev;
 	struct inet6_dev *idev = ip6_dst_idev(skb_dst(skb));
 
-	skb->protocol = htons(ETH_P_IPV6);
-	skb->dev = dev;
-
 	if (unlikely(idev->cnf.disable_ipv6)) {
 		IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTDISCARDS);
 		kfree_skb(skb);
-- 
2.7.4



Re: [PATCH v2 7/8] net: mvmdio: add xmdio support

2017-06-09 Thread Russell King - ARM Linux
On Fri, Jun 09, 2017 at 06:22:16PM +0200, Antoine Tenart wrote:
> On Fri, Jun 09, 2017 at 05:03:40PM +0200, Andrew Lunn wrote:
> > > There are two busses, one generating c22 transactions and one generating
> > > c45 transactions. Each bus has its own MDC/MDIO pins.
> > 
> > O.K. That is what i wanted to know. So we want two completely separate
> > device tree bindings, busses registered with Linux, etc.
> > 
> > Thanks for clarification.
> 
> So in the end I need one change in v3: to bind the xSMI usage to
> marvell,xmdio and the SMI one to marvell,orion-mdio. (Plus the GENMASK
> and offset comments).

Also, you need to
1. trap out on incorrect MII_ADDR_C45 in regnum for the interface.
2. mask the dev_addr with GENMASK(4, 0) (as merely shifting will leave
   the MII_ADDR_C45 bit set.)
3. move MVMDIO_XSMI_PHYADDR_SHIFT / MVMDIO_XSMI_DEVADDR_SHIFT /
   MVMDIO_XSMI_READ_OPERATION / MVMDIO_XSMI_WRITE_OPERATION under the
   MVMDIO_XSMI_MGNT_REG reg - these definitions have nothing to do with
   MVMDIO_XSMI_ADDR_REG.
4. fix MVMDIO_XSMI_WRITE_OPERATION to be 5 << 26, not 5 << 27.
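
Points 1 and 2 above can be sketched in user space roughly as follows. The
SKETCH_ macros are assumptions meant to mirror the kernel's MII_ADDR_C45
(taken here to be bit 30 of regnum) and GENMASK(); the function name
sketch_xsmi_split_regnum() and the -EOPNOTSUPP return value are
hypothetical and not taken from the patch under review.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed to mirror the kernel definitions; not the real headers. */
#define SKETCH_MII_ADDR_C45	(1u << 30)
#define SKETCH_GENMASK(h, l)	((~0u << (l)) & (~0u >> (31 - (h))))

/*
 * Split a phylib-style C45 regnum into device address and register.
 * Point 1: refuse a regnum that lacks the C45 flag on the C45-only bus.
 * Point 2: mask after shifting, otherwise the C45 flag (bit 30) ends up
 *          as bit 14 of the device address.
 */
static inline int sketch_xsmi_split_regnum(uint32_t regnum,
					   uint16_t *dev_addr, uint16_t *reg)
{
	assert(dev_addr && reg);

	if (!(regnum & SKETCH_MII_ADDR_C45))
		return -EOPNOTSUPP;

	*dev_addr = (uint16_t)((regnum >> 16) & SKETCH_GENMASK(4, 0));
	*reg = (uint16_t)(regnum & SKETCH_GENMASK(15, 0));
	return 0;
}
```

For a regnum of SKETCH_MII_ADDR_C45 | (3 << 16) | 0x1234, a bare
"regnum >> 16" gives 0x4003, whereas the masked value is the intended
device address 3.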

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [PATCH net-next] cxgb4: fix memory leak in init_one()

2017-06-09 Thread David Miller
From: Ganesh Goudar 
Date: Fri,  9 Jun 2017 19:26:24 +0530

> Free up mbox_log allocated for PF0 to PF3.
> 
> Fixes: 7829451c695e ("cxgb4: Add control net_device for configuring PCIe VF")
> Signed-off-by: Ganesh Goudar 

Applied, thank you.


Re: [PATCH net-next] qed: add qed_int_sb_init() stub function

2017-06-09 Thread David Miller
From: Arnd Bergmann 
Date: Fri,  9 Jun 2017 12:37:35 +0200

> When CONFIG_QED_SRIOV is disabled, we get a build error:
> 
> drivers/net/ethernet/qlogic/qed/qed_int.c: In function 'qed_int_sb_init':
> drivers/net/ethernet/qlogic/qed/qed_int.c:1499:4: error: implicit declaration 
> of function 'qed_vf_set_sb_info'; did you mean 'qed_mcp_get_resc_info'? 
> [-Werror=implicit-function-declaration]
> 
> All the other declarations have a 'static inline' stub as an alternative
> here, so this adds one more for qed_int_sb_init.
> 
> Fixes: 50a207147fce ("qed: Hold a single array for SBs")
> Signed-off-by: Arnd Bergmann 

Applied, thank you.


Re: [PATCH v2 7/8] net: mvmdio: add xmdio support

2017-06-09 Thread Russell King - ARM Linux
On Fri, Jun 09, 2017 at 04:49:36PM +0200, Andrew Lunn wrote:
> On Fri, Jun 09, 2017 at 04:09:22PM +0200, Antoine Tenart wrote:
> > The MDIO/xMDIO registers are embedded into the network controller. The
> > mvmdio driver was created at first to abstract this functionality
> > outside the network controller driver because it is shared between all
> > ports and used in different IPs. So they're not really devices per se.
> > 
> > Looking at the datasheet/schematics there are two hardware buses, one
> > for c22 and one for c45. So we should keep two separate nodes to
> > describe the two interfaces. From what I read c45 is backward
> > compatible with c22 so the xSMI interface should be able to talk to
> > c22 PHYs as well.
> 
> The on-the-wire protocol of c45 is backwards compatible with c22, in
> that a c22 device will not get confused by a c45 transaction on the
> bus. A c22 device will just ignore it. You cannot talk to a c22 device
> using c45.

From what I can tell, having 'scoped the MDIO line and tried writing
several different values to the XSMI registers, it is not possible
for the XSMI block to generate C22 frame structures - the "start"
bits are always "00", and I can't make them the required "01" for C22.
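
For reference, the difference between the two frame types can be sketched
as below. The field order (ST, OP, then two 5-bit address fields) and the
ST values follow IEEE 802.3 Clauses 22 and 45. mdio_frame_ctrl() and
mdio_frame_st() are hypothetical helpers that pack only the first 14 bits
after the preamble and leave out the turnaround and data bits.

```c
#include <assert.h>
#include <stdint.h>

#define MDIO_ST_C45	0x0u	/* Clause 45 start-of-frame bits: "00" */
#define MDIO_ST_C22	0x1u	/* Clause 22 start-of-frame bits: "01" */

/* Pack ST(2) | OP(2) | PRTAD or PHYAD(5) | DEVAD or REGAD(5), MSB first. */
static inline uint16_t mdio_frame_ctrl(unsigned int st, unsigned int op,
				       unsigned int addr_hi,
				       unsigned int addr_lo)
{
	assert(st <= 3u && op <= 3u && addr_hi <= 31u && addr_lo <= 31u);
	return (uint16_t)((st << 12) | (op << 10) | (addr_hi << 5) | addr_lo);
}

/* Recover the two start bits from a packed control word. */
static inline unsigned int mdio_frame_st(uint16_t ctrl)
{
	return (ctrl >> 12) & 0x3u;
}
```

In this packing every C22 control word has ST = 01, so a block that can
only emit ST = 00 cannot produce one, whatever is written to the OP and
address fields.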

However, I can confirm that bits 26 and 27 of the XSMI register are used
directly for the OP field (so 0 << 26 produces a C45 address frame.)  I
suspect, although I haven't delved that deeply (yet) that bit 28 sets
whether XSMI produces an address cycle itself along with the data cycle.

Bit 31 appears to be writable, but has no effect on the frame structure.

> What i'm worried about is there being one set of MDC/MDIO lines. You
> should not expose that to linux as two mdio busses. It is one bus.

They're independent - the SMI and XSMI blocks are two entirely separate
interfaces with entirely separate hardware MDC/MDIO lines.  The XSMI
MDC/MDIO lines remain at logic '1' while phylib polls the PHY via
the SMI interface.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [PATCH v2 net-next 0/8] qed*: Light L2 updates

2017-06-09 Thread David Miller
From: Yuval Mintz 
Date: Fri, 9 Jun 2017 17:13:17 +0300

> This series does a major overhaul of the LL2 logic in qed.
> The single biggest change done here is in #5 where we're changing
> the API qed provides for LL2 [both internally in case of storage and
> externally in case of RoCE] to become callback-based to allow cleaner
> scalability in preparation for the future iWARP submission which would
> add additional flavors of LL2. It's also the only patch in series
> to modify !qed logic [qedr].
> 
> Patches prior to that mostly deal with refactoring LL2 code,
> encapsulating various parameters into structures and re-ordering
> of LL2 code. The latter patches add some small missing bits of LL2
> functionality.

Series applied, thank you.


Re: [PATCH net-next] ipv6: Initial skb->dev and skb->protocol in ip6_output

2017-06-09 Thread Chenbo Feng



On 06/09/2017 12:24 PM, Bjørn Mork wrote:
> Chenbo Feng  writes:
>
>> This patch is still being worked on since it may have a problem with the
>> ip_fragment() call. Did you apply it already? Should I send you a revert
>> patch then?
>
> It does? I initially thought so too, but looking closer I believe the
> ip6_copy_metadata() calls in ip6_fragment() take care of it.
>
> Bjørn

At least in the fail_toobig code path of the ip_fragment() call, skb->dev
gets assigned again. It seems to be either redundant with this patch or it
will overwrite the skb->dev field. I will revert this one and resubmit
once I have a proper way to handle that.


Re: pull-request: can 2017-06-09

2017-06-09 Thread David Miller
From: Marc Kleine-Budde 
Date: Fri,  9 Jun 2017 14:55:07 +0200

> this is a pull request of 6 patches for net/master.
> 
> There's a patch by Stephane Grosjean that fixes an uninitialized symbol
> warning in the peak_canfd driver. A patch by Johan Hovold to fix the
> product-id endianness in an error message in the peak_usb driver. A patch
> by Oliver Hartkopp to enable CAN FD for virtual CAN devices by default.
> Three patches by me, one makes the helper function can_change_state()
> robust to be called with cf == NULL. The next patch fixes a memory leak in
> the gs_usb driver. And the last one fixes a lockdep splat by properly
> initializing the per-net can_rcvlists_lock spin_lock.

Pulled, thank you Marc.


Re: [PATCH net] mac80211: free netdev on dev_alloc_name() error

2017-06-09 Thread David Miller
From: Johannes Berg 
Date: Fri,  9 Jun 2017 21:33:09 +0200

> From: Johannes Berg 
> 
> The change to remove free_netdev() from ieee80211_if_free()
> erroneously didn't add the necessary free_netdev() for when
> ieee80211_if_free() is called directly in one place, rather
> than as the priv_destructor. Add the missing call.
> 
> Fixes: cf124db566e6 ("net: Fix inconsistent teardown and release of private netdev state.")
> Signed-off-by: Johannes Berg 

Applied, thanks Johannes.


Re: [PATCH net-next] ipv6: Initial skb->dev and skb->protocol in ip6_output

2017-06-09 Thread David Miller
From: Chenbo Feng 
Date: Fri, 9 Jun 2017 12:13:57 -0700

> 
> 
> On 06/09/2017 12:08 PM, David Miller wrote:
>> From: Chenbo Feng 
>> Date: Fri,  9 Jun 2017 12:06:07 -0700
>>
>>> From: Chenbo Feng 
>>>
>>> Move the initialization of skb->dev and skb->protocol from
>>> ip6_finish_output2 to ip6_output. This can make the skb->dev and
>>> skb->protocol information available to the CGROUP eBPF filter.
>>>
>>> Signed-off-by: Chenbo Feng 
>>> Acked-by: Eric Dumazet 
>> Applied, thanks.
>>
>> This makes ipv6 consistent with ipv4.
>>
>> I am surprised this wasn't noticed, for example, in netfilter.
>> .
>>
> Hi David,
> 
> This patch is still being worked on since it may have a problem with the
> ip_fragment() call. Did you apply it already? Should I send you a revert
> patch then?

A revert is necessary or a relative fixup.

Thank you.


Re: [PATCH net-next] ipv6: Initial skb->dev and skb->protocol in ip6_output

2017-06-09 Thread David Miller
From: Chenbo Feng 
Date: Fri, 9 Jun 2017 12:08:39 -0700

> Sorry, this is the wrong patch, please ignore it.

:-/ already applied it.

You must now send a relative fixup patch.


Re: [PATCH net-next 00/11] r8152: minor adjustment

2017-06-09 Thread David Miller
From: Hayes Wang 
Date: Fri, 9 Jun 2017 17:11:37 +0800

> Adjust some code to make it reasonable or satisfy the suggestion from
> the engineers.

Series applied, thank you.


Re: [PATCH net RESEND] net: rps: send out pending IPI's on CPU hotplug

2017-06-09 Thread David Miller
From: Ashwanth Goli 
Date: Fri,  9 Jun 2017 14:24:58 +0530

> IPI's from the victim cpu are not handled in dev_cpu_callback.
> So these pending IPI's would be sent to the remote cpu only when
> NET_RX is scheduled on the victim cpu and since this trigger is
> unpredictable it would result in packet latencies on the remote cpu.
> 
> This patch adds support for sending the pending IPIs of the victim cpu.
> 
> Signed-off-by: Ashwanth Goli 

Applied, thank you.


[PATCH net] mac80211: free netdev on dev_alloc_name() error

2017-06-09 Thread Johannes Berg
From: Johannes Berg 

The change to remove free_netdev() from ieee80211_if_free()
erroneously didn't add the necessary free_netdev() for when
ieee80211_if_free() is called directly in one place, rather
than as the priv_destructor. Add the missing call.

Fixes: cf124db566e6 ("net: Fix inconsistent teardown and release of private netdev state.")
Signed-off-by: Johannes Berg 
---
 net/mac80211/iface.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
index 915d7e1b4545..f5f50150ba1c 100644
--- a/net/mac80211/iface.c
+++ b/net/mac80211/iface.c
@@ -1816,6 +1816,7 @@ int ieee80211_if_add(struct ieee80211_local *local, const char *name,
 		ret = dev_alloc_name(ndev, ndev->name);
 		if (ret < 0) {
 			ieee80211_if_free(ndev);
+			free_netdev(ndev);
 			return ret;
 		}
 
-- 
2.11.0



Re: [PATCH net-next 0/8] Bug fixes in ena ethernet driver

2017-06-09 Thread David Miller
From: 
Date: Fri, 9 Jun 2017 09:55:16 +0300

> This patchset contains fixes for the bugs that were discovered so far.

You submitted patch #6 twice, once with the word "stuck" in the subject
line, once with the word "hang" in the subject line.

Please sort this out and resubmit, thanks.



Re: [PATCH net-next] ipv6: Initial skb->dev and skb->protocol in ip6_output

2017-06-09 Thread Bjørn Mork
Chenbo Feng  writes:

> This patch is still being worked on since it may have a problem with the
> ip_fragment() call. Did you apply it already? Should I send you a revert
> patch then?

It does? I initially thought so too, but looking closer I believe the
ip6_copy_metadata() calls in ip6_fragment() take care of it.



Bjørn


[PATCH net-next] bpf: Remove duplicate tcp_filter hook in ipv6

2017-06-09 Thread Chenbo Feng
From: Chenbo Feng 

There are currently two tcp_filter hooks in the tcp_ipv6 ingress path.
One is in tcp_v6_rcv and the other is in tcp_v6_do_rcv. The
tcp_filter() call inside tcp_v6_do_rcv appears to be redundant, and some
packets will be filtered twice in this situation. This causes trouble
when using eBPF filters to account traffic data.

Signed-off-by: Chenbo Feng 
Acked-by: Eric Dumazet 
---
 net/ipv6/tcp_ipv6.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 0840543..84ad502 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1249,9 +1249,6 @@ static int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb)
 	if (skb->protocol == htons(ETH_P_IP))
 		return tcp_v4_do_rcv(sk, skb);
 
-	if (tcp_filter(sk, skb))
-		goto discard;
-
 	/*
 	 *	socket locking is here for SMP purposes as backlog rcv
 	 *	is currently called with bh processing disabled.
-- 
2.7.4
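
A toy user-space model of the double accounting described in the commit
message above. Nothing here is kernel code: sketch_filter() merely stands
in for an eBPF accounting filter reached through tcp_filter(), and
sketch_rcv() for a receive path that invokes the hook once or twice. All
names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct sketch_acct {
	unsigned long packets;
	unsigned long bytes;
};

/* Stand-in for an accounting filter: records the packet, never drops it. */
static inline bool sketch_filter(struct sketch_acct *acct, unsigned int len)
{
	assert(acct);
	acct->packets++;
	acct->bytes += len;
	return false;	/* false means "keep the packet" */
}

/*
 * Receive path with one hook at entry and, optionally, a second hook
 * further down. Returns true if the packet was dropped by a filter.
 */
static inline bool sketch_rcv(struct sketch_acct *acct, unsigned int len,
			      bool second_hook)
{
	if (sketch_filter(acct, len))
		return true;
	if (second_hook && sketch_filter(acct, len))
		return true;
	return false;
}
```

With second_hook set, a single 100-byte packet is recorded as two packets
and 200 bytes; with it cleared, as one packet and 100 bytes.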



Re: [PATCH net-next 2/2] bpf: Fix test_obj_id.c for llvm 5.0

2017-06-09 Thread David Miller
From: Martin KaFai Lau 
Date: Thu, 8 Jun 2017 22:30:17 -0700

> llvm 5.0 does not like the section name and the function name
> to be the same:
> 
> clang -I. -I./include/uapi -I../../../include/uapi \
>   -I../../../../samples/bpf/ \
>   -Wno-compare-distinct-pointer-types \
>   -O2 -target bpf -c \
>   linux/tools/testing/selftests/bpf/test_obj_id.c -o \
>   linux/tools/testing/selftests/bpf/test_obj_id.o
> fatal error: error in backend: 'test_prog_id' label emitted multiple times to
> assembly file
> clang-5.0: error: clang frontend command failed with exit code 70 (use -v to
> see invocation)
> clang version 5.0.0 (trunk 304326) (llvm/trunk 304329)
> 
> This patch makes changes to the section name and the function name.
> 
> Fixes: 95b9afd3987f ("bpf: Test for bpf ID")
> Reported-by: Alexei Starovoitov 
> Reported-by: Yonghong Song 
> Signed-off-by: Martin KaFai Lau 

Applied.


Re: [PATCH net-next 1/2] bpf: Fix test_bpf_obj_id() when the bpf_jit_enable sysctl is diabled

2017-06-09 Thread David Miller
From: Martin KaFai Lau 
Date: Thu, 8 Jun 2017 22:30:16 -0700

> test_bpf_obj_id() should not expect a non zero jited_prog_len
> to be returned by bpf_obj_get_info_by_fd() when
> net.core.bpf_jit_enable is 0.
> 
> The patch checks for net.core.bpf_jit_enable and
> has different expectation on jited_prog_len.
> 
> This patch also removes the pwd.h header which I forgot
> to remove after making changes.
> 
> Fixes: 95b9afd3987f ("bpf: Test for bpf ID")
> Reported-by: Yonghong Song 
> Signed-off-by: Martin KaFai Lau 

Applied, but please in the future provide a proper header posting
with Subject "[PATCH net-next 0/N] ...".

Thanks.


Re: [PATCH net-next] ipv6: Initial skb->dev and skb->protocol in ip6_output

2017-06-09 Thread Chenbo Feng



On 06/09/2017 12:08 PM, David Miller wrote:
> From: Chenbo Feng
> Date: Fri,  9 Jun 2017 12:06:07 -0700
>
>> From: Chenbo Feng
>>
>> Move the initialization of skb->dev and skb->protocol from
>> ip6_finish_output2 to ip6_output. This can make the skb->dev and
>> skb->protocol information available to the CGROUP eBPF filter.
>>
>> Signed-off-by: Chenbo Feng
>> Acked-by: Eric Dumazet
>
> Applied, thanks.
>
> This makes ipv6 consistent with ipv4.
>
> I am surprised this wasn't noticed, for example, in netfilter.

Hi David,

This patch is still being worked on since it may have a problem with the
ip_fragment() call. Did you apply it already? Should I send you a revert
patch then?


Chenbo Feng



Re: [PATCH net-next] ipv6: Initial skb->dev and skb->protocol in ip6_output

2017-06-09 Thread David Miller
From: Chenbo Feng 
Date: Fri,  9 Jun 2017 12:06:07 -0700

> From: Chenbo Feng 
> 
> Move the initialization of skb->dev and skb->protocol from
> ip6_finish_output2 to ip6_output. This can make the skb->dev and
> skb->protocol information available to the CGROUP eBPF filter.
> 
> Signed-off-by: Chenbo Feng 
> Acked-by: Eric Dumazet 

Applied, thanks.

This makes ipv6 consistent with ipv4.

I am surprised this wasn't noticed, for example, in netfilter.


[PATCH net-next] ipv6: Initial skb->dev and skb->protocol in ip6_output

2017-06-09 Thread Chenbo Feng
From: Chenbo Feng 

Move the initialization of skb->dev and skb->protocol from
ip6_finish_output2 to ip6_output. This can make the skb->dev and
skb->protocol information available to the CGROUP eBPF filter.

Signed-off-by: Chenbo Feng 
Acked-by: Eric Dumazet 
---
 net/ipv6/ip6_output.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index bf8a58a..02cd44f 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -67,9 +67,6 @@ static int ip6_finish_output2(struct net *net, struct sock *sk, struct sk_buff *
 	struct in6_addr *nexthop;
 	int ret;
 
-	skb->protocol = htons(ETH_P_IPV6);
-	skb->dev = dev;
-
 	if (ipv6_addr_is_multicast(&ipv6_hdr(skb)->daddr)) {
 		struct inet6_dev *idev = ip6_dst_idev(skb_dst(skb));
 
@@ -154,6 +151,9 @@ int ip6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 	struct net_device *dev = skb_dst(skb)->dev;
 	struct inet6_dev *idev = ip6_dst_idev(skb_dst(skb));
 
+	skb->protocol = htons(ETH_P_IPV6);
+	skb->dev = dev;
+
 	if (unlikely(idev->cnf.disable_ipv6)) {
 		IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTDISCARDS);
 		kfree_skb(skb);
-- 
2.7.4



Re: [PATCH] net: Fix inconsistent teardown and release of private netdev state.

2017-06-09 Thread David Miller
From: Stephen Hemminger 
Date: Fri, 9 Jun 2017 10:21:04 -0700

> Is there anything in Documentation/networking/netdevices.txt about this to
> avoid any future issues?

You asked me about this last time, and I did not forget about it.

I sincerely lack the time to do a writeup about it, and I felt that
delaying the fix for another week or two until I find that magical
non-existing timeframe to write the docs was not beneficial for
users at all.


Re: [PATCH net] Fix an intermittent pr_emerg warning about lo becoming free.

2017-06-09 Thread Krister Johansen
On Fri, Jun 09, 2017 at 11:18:44AM -0700, Cong Wang wrote:
> On Thu, Jun 8, 2017 at 1:12 PM, Krister Johansen
>  wrote:
> > The way this works is that if there's still a reference on the dst entry
> > at the time we try to free it, it gets placed in the gc list by
> > __dst_free and the dst_destroy() call is invoked by the gc task once the
> > refcount is 0.  If the gc task processes a 10th or less of its entries
> > on a single pass, it increases the amount of time it waits between gc
> > intervals.
> >
> > Looking at the gc_task intervals, they started at 663ms when we invoked
> > __dst_free().  After that, they increased to 1663, 3136, 5567, 8191,
> > 10751, and 14848.  The release that set the refcnt to 0 on our dst entry
> > occurred after the gc_task was enqueued for a 14 second interval so we had
> > to wait longer than the warning time in wait_allrefs in order for the
> > dst entry to get free'd and the hold on 'lo' to be released.
> >
> 
> I am glad to see you don't have a dst leak here.
> 
> But from my experience of a similar bug (refcnt wait on lo), this goes
> infinitely rather than just 14sec, so it looked more like a real leak than
> just a gc delay. So in your case, this annoying warning eventually
> disappears, right?

That's correct.  The problem occurs intermittently, and the warnings are
less frequent than the interval in netdev_wait_allrefs().  At least when
I observed it, it tended to coincide with our controlplane canary
issuing an API call that led to a network namespace teardown on the
dataplane.

Sometimes, the message would look like this:

  unregister_netdevice: waiting for lo to become free. Usage count = 0

The dst entries were getting released, it's just that often our dst
cache gc interval was longer than the warning interval in wait_allrefs.
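
As a rough illustration using the gc_task intervals quoted above, the
sketch below counts how many of them exceed the warning interval. It
assumes netdev_wait_allrefs() repeats its warning about every 10 seconds;
that value, and every SKETCH_/sketch_ name, is an assumption of this
example rather than something taken from the kernel sources.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the "waiting for lo to become free" warning period, in ms. */
#define SKETCH_WARN_INTERVAL_MS	10000u

/* gc_task intervals in milliseconds, as reported earlier in this thread. */
static const unsigned int sketch_observed_ms[] = {
	663, 1663, 3136, 5567, 8191, 10751, 14848,
};

/* Count the gc intervals that are longer than the warning interval. */
static inline size_t sketch_count_long_intervals(const unsigned int *ms,
						 size_t n)
{
	size_t i, count = 0;

	assert(ms || n == 0);
	for (i = 0; i < n; i++)
		if (ms[i] > SKETCH_WARN_INTERVAL_MS)
			count++;
	return count;
}
```

Under that assumption only the last two intervals (10751 ms and 14848 ms)
are long enough for a warning to be printed before the next gc pass runs.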

The other concern was that because the wait_allrefs happens in the
netdev_todo path, a long gc interval can cause the rtnl_lock hold times
to be much longer than necessary if this bug is encountered.

-K


Re: [PATCH net] Fix an intermittent pr_emerg warning about lo becoming free.

2017-06-09 Thread Cong Wang
On Thu, Jun 8, 2017 at 1:12 PM, Krister Johansen
 wrote:
> After looking through the list of callbacks that the netdevice notifiers
> invoke in this path, it appears that the dst_dev_event is the most
> interesting.  The dst_ifdown path places a hold on the loopback_dev as
> part of releasing the dev associated with the original dst cache entry.
> Most of our notifier callbacks are straight-forward, but this one a)
> looks complex, and b) places a hold on the network interface in
> question.
>
> I constructed a new bcc script that watches various events in the
> lifetime of a dst cache entry.  Note that dst_ifdown will take a hold on
> the loopback device until the invalidated dst entry gets freed.
>

Yeah, this is what I observed when Kevin (Cc'ed) reported a similar
(if not the same) bug; I thought we had a refcnt leak on dst.

...
> The way this works is that if there's still a reference on the dst entry
> at the time we try to free it, it gets placed in the gc list by
> __dst_free and the dst_destroy() call is invoked by the gc task once the
> refcount is 0.  If the gc task processes a 10th or less of its entries
> on a single pass, it increases the amount of time it waits between gc
> intervals.
>
> Looking at the gc_task intervals, they started at 663ms when we invoked
> __dst_free().  After that, they increased to 1663, 3136, 5567, 8191,
> 10751, and 14848.  The release that set the refcnt to 0 on our dst entry
> occurred after the gc_task was enqueued for a 14 second interval so we had
> to wait longer than the warning time in wait_allrefs in order for the
> dst entry to get free'd and the hold on 'lo' to be released.
>

I am glad to see you don't have a dst leak here.

But from my experience of a similar bug (refcnt wait on lo), this goes
on indefinitely rather than for just 14 seconds, so it looked more like a
real leak than just a gc delay. So in your case, this annoying warning
eventually disappears, right?


Thanks.


Re: [PATCH net] ipv4: igmp: fix a use after free

2017-06-09 Thread Xin Long
On Sat, Jun 10, 2017 at 1:01 AM, Cong Wang  wrote:

> On Fri, Jun 9, 2017 at 8:56 AM, Eric Dumazet  wrote:
>> On Fri, 2017-06-09 at 14:24 +0800, Xin Long wrote:
>>> On Fri, Jun 9, 2017 at 8:59 AM, Cong Wang  wrote:
>>>
>>> > On Thu, Jun 8, 2017 at 1:33 PM, Eric Dumazet  
>>> > wrote:
>>> >> I mentioned (in https://lkml.org/lkml/2017/5/31/619 ) that we might need
>>> >> to defer freeing after rcu grace period but for some reason decided it
>>> >> was not needed.
>>> Yes, this one could fix it.
>>>
>>> >
>>> > This one makes sense, it is the second time I saw the use-after-free
>>> > in igmp code, both are because we don't respect the RCU rule to free
>>> > an element in the list.
>>> >
>>> >>
>>> >> What about :
>>> >
>>> > But not sure if all ip_ma_put() callers want ip_mc_clear_src().
>>> If that's problem, there may be another way:
>>>
>>>   leave ip_mc_clear_src as it is, just add pmc->lock to protect this call.
>>>
>>> this use-after-free was actually caused by using pmc->sources/tomb
>>> in add_grec while ip_mc_clear_src is freeing them. add_grec is already
>>> under pmc->lock, so to add pmc->lock for ip_mc_clear_src should be
>>> enough to protect the list pmc->sources/tomb.
>>>
>>> wdyt ?
>>
>> This would be weird.
>>
>> When we free skb components, we do not grab a spinlock.
>>
>> When we free something, just make sure we must be the last user of it.
>>
>> RCU rules -> Must respect RCU grace period before delete.
>>
>> No need for extra spinlock.
>
> This is what I thought in my first response, until I realized
> it is not pure RCU, otherwise pmc->lock should not be taken
> in igmpv3_send_cr(). It seems the code is mixing the use
> of spinlock and RCU.
The RCU lock keeps pmc from being freed, and the spinlock protects
modifications of pmc's members. Is there some rule about mixing these
two locks?


>
> We need RCU anyway, ip_check_mc_rcu() is the real fast
> path where we don't take spinlock. I think we will need more
> work.
It seems all add_grec() callers need the spinlock, maybe because add_grec
modifies pmc's members, so it's hard to drop the spinlock.

Given the ip_check_mc_rcu path you mentioned, it should be right
to call ip_mc_clear_src after the RCU grace period, like Eric's patch.


Re: [PATCH net] netlink: don't send unknown nsid

2017-06-09 Thread Flavio Leitner
On Thu, Jun 08, 2017 at 10:31:53AM +0200, Nicolas Dichtel wrote:
> Le 07/06/2017 à 21:14, Flavio Leitner a écrit :
> > Let's say the app is restarted, or another monitoring app is executed
> > with enough perms.  How will it identify the error condition?
> Your app wants to monitor a subset of netns. It means that you already have a
> way to identify those netns, something like a file stored somewhere
> (/var/run/netns/, /proc//ns/net, ...). Thus, it's easy to check if those
> netns have a nsid assigned in the netns where your app will open the socket.
> 
> This option was called NETLINK_F_LISTEN_ALL_NSID, because it only enables
> listening to netns *with* a nsid assigned, nothing more. It's up to the
> user to ensure that nsid are correctly assigned.

Makes sense, thanks.
-- 
Flavio



Re: [PATCH v2 net-next] Ipvlan should return an error when an address is already in use.

2017-06-09 Thread Krister Johansen
On Fri, Jun 09, 2017 at 01:15:10PM -0400, David Miller wrote:
> From: Krister Johansen 
> Date: Fri, 9 Jun 2017 10:13:10 -0700
> 
> > On Fri, Jun 09, 2017 at 12:26:46PM -0400, David Miller wrote:
> >> From: Krister Johansen 
> >> Date: Thu, 8 Jun 2017 13:12:14 -0700
> >> 
> >> > The ipvlan code already knows how to detect when a duplicate address is
> >> > about to be assigned to an ipvlan device.  However, that failure is not
> >> > propagated outward and leads to a silent failure.
> >> > 
> >> > Introduce a validation step at ip address creation time and allow device
> >> > drivers to register to validate the incoming ip addresses.  The ipvlan
> >> > code is the first consumer.  If it detects an address in use, we can
> >> > return an error to the user before beginning to commit the new ifa in
> >> > the networking code.
> >> > 
> >> > This can be especially useful if it is necessary to provision many
> >> > ipvlans in containers.  The provisioning software (or operator) can use
> >> > this to detect situations where an ip address is unexpectedly in use.
> >> > 
> >> > Signed-off-by: Krister Johansen 
> >> 
> >> Ok, applied, thank you.
> > 
> > Thanks, did this look otherwise alright?
> 
> Yes, I was mildly unsatisfied with the ipv6 addrconf situation but I know
> very well about that, and those kinds of addresses aren't of interest
> for what you are trying to achieve right?

That's correct.  I'm basically trying to catch the case where 'ip addr'
or its equivalent rtnetlink invocation manually configures an IP address
on an ipvlan.

-K


Re: [PATCH] net: Fix inconsistent teardown and release of private netdev state.

2017-06-09 Thread Stephen Hemminger
On Wed, 07 Jun 2017 15:54:11 -0400 (EDT)
David Miller  wrote:

> Network devices can allocate resources and private memory using
> netdev_ops->ndo_init().  However, the release of these resources
> can occur in one of two different places.
> 
> Either netdev_ops->ndo_uninit() or netdev->destructor().
> 
> The decision of which operation frees the resources depends upon
> whether it is necessary for all netdev refs to be released before it
> is safe to perform the freeing.
> 
> netdev_ops->ndo_uninit() presumably can occur right after the
> NETDEV_UNREGISTER notifier completes and the unicast and multicast
> address lists are flushed.
> 
> netdev->destructor(), on the other hand, does not run until the
> netdev references all go away.
> 
> Further complicating the situation is that netdev->destructor()
> almost universally also does a free_netdev().
> 
> This creates a problem for the logic in register_netdevice().
> Because all callers of register_netdevice() manage the freeing
> of the netdev, and invoke free_netdev(dev) if register_netdevice()
> fails.
> 
> If netdev_ops->ndo_init() succeeds, but something else fails inside
> of register_netdevice(), it does call ndo_ops->ndo_uninit().  But
> it is not able to invoke netdev->destructor().
> 
> This is because netdev->destructor() will do a free_netdev() and
> then the caller of register_netdevice() will do the same.
> 
> However, this means that the resources that would normally be released
> by netdev->destructor() will not be.
> 
> Over the years drivers have added local hacks to deal with this, by
> invoking their destructor parts by hand when register_netdevice()
> fails.
> 
> Many drivers do not try to deal with this, and instead we have leaks.
> 
> Let's close this hole by formalizing the distinction between what
> private things need to be freed up by netdev->destructor() and whether
> the driver needs unregister_netdevice() to perform the free_netdev().
> 
> netdev->priv_destructor() performs all actions to free up the private
> resources that used to be freed by netdev->destructor(), except for
> free_netdev().
> 
> netdev->needs_free_netdev is a boolean that indicates whether
> free_netdev() should be done at the end of unregister_netdevice().
> 
> Now, register_netdevice() can sanely release all resources after
> ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit()
> and netdev->priv_destructor().
> 
> And at the end of unregister_netdevice(), we invoke
> netdev->priv_destructor() and optionally call free_netdev().
> 
> Signed-off-by: David S. Miller 
> ---
> 
> This is from a few weeks ago, pushed to 'net' and queued up for
> -stable.
> 
>  drivers/net/bonding/bond_main.c | 6 +++---
>  drivers/net/caif/caif_hsi.c | 2 +-
>  drivers/net/caif/caif_serial.c  | 2 +-
>  drivers/net/caif/caif_spi.c | 2 +-
>  drivers/net/caif/caif_virtio.c  | 2 +-
>  drivers/net/can/slcan.c | 7 +++
>  drivers/net/can/vcan.c  | 2 +-
>  drivers/net/can/vxcan.c | 2 +-
>  drivers/net/dummy.c | 4 ++--
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 2 +-
>  drivers/net/geneve.c| 2 +-
>  drivers/net/gtp.c   | 2 +-
>  drivers/net/hamradio/6pack.c| 2 +-
>  drivers/net/hamradio/bpqether.c | 2 +-
>  drivers/net/ifb.c   | 4 ++--
>  drivers/net/ipvlan/ipvlan_main.c| 2 +-
>  drivers/net/loopback.c  | 4 ++--
>  drivers/net/macsec.c| 4 ++--
>  drivers/net/macvlan.c   | 2 +-
>  drivers/net/nlmon.c | 2 +-
>  drivers/net/slip/slip.c | 7 +++
>  drivers/net/team/team.c | 4 ++--
>  drivers/net/tun.c   | 4 ++--
>  drivers/net/usb/cdc-phonet.c| 2 +-
>  drivers/net/usb/qmi_wwan.c  | 2 +-
>  drivers/net/veth.c  | 4 ++--
>  drivers/net/vrf.c   | 2 +-
>  drivers/net/vsockmon.c  | 2 +-
>  drivers/net/vxlan.c | 2 +-
>  drivers/net/wan/dlci.c  | 2 +-
>  drivers/net/wan/hdlc_fr.c   | 2 +-
>  drivers/net/wan/lapbether.c | 2 +-
>  
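
The split described in the commit message above can be modelled in userspace.
The following is only a minimal sketch under stated assumptions: struct
model_netdev, the model_*() helpers, the demo_*() hooks and the
fail_after_init test hook are all hypothetical names invented for
illustration, not the kernel's types or functions. It shows the two rules
from the text: a failed registration releases private state but leaves the
device struct to the caller, and unregistration releases private state and
frees the device only when needs_free_netdev is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace model; this is NOT the kernel's struct net_device. */
struct model_netdev {
	int  (*ndo_init)(struct model_netdev *dev);
	void (*ndo_uninit)(struct model_netdev *dev);
	void (*priv_destructor)(struct model_netdev *dev);
	bool needs_free_netdev;
	bool fail_after_init;	/* test hook: force a failure after ndo_init() */
	void *priv;
};

int model_live_privs;	/* private allocations not yet released */
int model_live_devs;	/* device structs not yet released */

struct model_netdev *model_alloc_netdev(void)
{
	struct model_netdev *dev = calloc(1, sizeof(*dev));

	if (dev)
		model_live_devs++;
	return dev;
}

void model_free_netdev(struct model_netdev *dev)
{
	assert(model_live_devs > 0);
	free(dev);
	model_live_devs--;
}

/* Example driver hooks: allocate and release one private buffer. */
int demo_init(struct model_netdev *dev)
{
	dev->priv = malloc(64);
	if (!dev->priv)
		return -1;
	model_live_privs++;
	return 0;
}

void demo_priv_destructor(struct model_netdev *dev)
{
	free(dev->priv);
	dev->priv = NULL;
	model_live_privs--;
}

/* On failure after ndo_init(), undo private state but leave dev to the caller. */
int model_register(struct model_netdev *dev)
{
	if (dev->ndo_init) {
		int err = dev->ndo_init(dev);

		if (err)
			return err;
	}
	if (dev->fail_after_init) {
		if (dev->ndo_uninit)
			dev->ndo_uninit(dev);
		if (dev->priv_destructor)
			dev->priv_destructor(dev);
		return -1;
	}
	return 0;
}

/* Tear down private state, then free dev only if the driver asked for it. */
void model_unregister(struct model_netdev *dev)
{
	if (dev->ndo_uninit)
		dev->ndo_uninit(dev);
	if (dev->priv_destructor)
		dev->priv_destructor(dev);
	if (dev->needs_free_netdev)
		model_free_netdev(dev);
}
```

In this model the caller of model_register() always frees the device on
failure, so the private buffer and the device struct are each released
exactly once on both the error path and the normal unregister path.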


Re: [PATCH v2 net-next] Ipvlan should return an error when an address is already in use.

2017-06-09 Thread David Miller
From: Krister Johansen 
Date: Fri, 9 Jun 2017 10:13:10 -0700

> On Fri, Jun 09, 2017 at 12:26:46PM -0400, David Miller wrote:
>> From: Krister Johansen 
>> Date: Thu, 8 Jun 2017 13:12:14 -0700
>> 
>> > The ipvlan code already knows how to detect when a duplicate address is
>> > about to be assigned to an ipvlan device.  However, that failure is not
>> > propagated outward and leads to a silent failure.
>> > 
>> > Introduce a validation step at ip address creation time and allow device
>> > drivers to register to validate the incoming ip addresses.  The ipvlan
>> > code is the first consumer.  If it detects an address in use, we can
>> > return an error to the user before beginning to commit the new ifa in
>> > the networking code.
>> > 
>> > This can be especially useful if it is necessary to provision many
>> > ipvlans in containers.  The provisioning software (or operator) can use
>> > this to detect situations where an ip address is unexpectedly in use.
>> > 
>> > Signed-off-by: Krister Johansen 
>> 
>> Ok, applied, thank you.
> 
> Thanks, did this look otherwise alright?

Yes, I was mildly unsatisfied with the ipv6 addrconf situation but I know
very well about that, and those kinds of addresses aren't of interest
for what you are trying to achieve right?



Re: [PATCH] net: Fix inconsistent teardown and release of private netdev state.

2017-06-09 Thread David Miller
From: Johannes Berg 
Date: Fri, 09 Jun 2017 16:33:47 +0200

> Right. Do you want me to put that into my tree? I could do it tonight,
> or perhaps only Monday though.

Please submit it formally to netdev for me to apply.

I want to queue it up with the original patch for -stable to make
sure all the fallout is addressed.

Thank you.


Re: [PATCH net-next v2] cxgb4: handle interrupt raised when FW crashes

2017-06-09 Thread David Miller
From: Ganesh Goudar 
Date: Fri,  9 Jun 2017 11:12:35 +0530

> From: Rahul Lakkireddy 
> 
> Handle TIMER0INT when FW crashes. Check for PCIE_FW[FW_EVAL]
> and if it says "Device FW Crashed", then treat it as fatal.
> Else, non-fatal.
> 
> Signed-off-by: Rahul Lakkireddy 
> Signed-off-by: Ganesh Goudar 
> ---
> v2: Following the reverse christmas tree variable ordering

Applied, thank you.


Re: [PATCH v2 net-next] Ipvlan should return an error when an address is already in use.

2017-06-09 Thread Krister Johansen
On Fri, Jun 09, 2017 at 12:26:46PM -0400, David Miller wrote:
> From: Krister Johansen 
> Date: Thu, 8 Jun 2017 13:12:14 -0700
> 
> > The ipvlan code already knows how to detect when a duplicate address is
> > about to be assigned to an ipvlan device.  However, that failure is not
> > propagated outward and leads to a silent failure.
> > 
> > Introduce a validation step at ip address creation time and allow device
> > drivers to register to validate the incoming ip addresses.  The ipvlan
> > code is the first consumer.  If it detects an address in use, we can
> > return an error to the user before beginning to commit the new ifa in
> > the networking code.
> > 
> > This can be especially useful if it is necessary to provision many
> > ipvlans in containers.  The provisioning software (or operator) can use
> > this to detect situations where an ip address is unexpectedly in use.
> > 
> > Signed-off-by: Krister Johansen 
> 
> Ok, applied, thank you.

Thanks, did this look otherwise alright?  I was a little nervous about
dropping and re-acquiring the rcu_read_lock_bh() in net/ipv6/addrconf.c
around line 975, but in the current design holding rcu_read_lock_bh()
causes the in_softirq() check in the validator (and the add/remove
ipvlan code itself) to return NOTIFY_DONE immediately.

AFAICT, the rcu_read_lock was to protect the idev.  I changed that to
get subsequently released in the outbound path.  I was also unsure if
it's safe to call in6_dev_put from a bh context.

Thanks,

-K


Re: [PATCH net] ipv4: igmp: fix a use after free

2017-06-09 Thread Cong Wang
On Fri, Jun 9, 2017 at 8:56 AM, Eric Dumazet  wrote:
> On Fri, 2017-06-09 at 14:24 +0800, Xin Long wrote:
>> On Fri, Jun 9, 2017 at 8:59 AM, Cong Wang  wrote:
>>
>> > On Thu, Jun 8, 2017 at 1:33 PM, Eric Dumazet wrote:
>> >> I mentioned (in https://lkml.org/lkml/2017/5/31/619 ) that we might need
>> >> to defer freeing after rcu grace period but for some reason decided it
>> >> was not needed.
>> Yes, this one could fix it.
>>
>> >
>> > This one makes sense, it is the second time I saw the use-after-free
>> > in igmp code, both are because we don't respect the RCU rule to free
>> > an element in the list.
>> >
>> >>
>> >> What about :
>> >
>> > But not sure if all ip_ma_put() callers want ip_mc_clear_src().
>> If that's problem, there may be another way:
>>
>>   leave ip_mc_clear_src as it is, just add pmc->lock to protect this call.
>>
>> this use-after-free was actually caused by using pmc->sources/tomb
>> in add_grec while ip_mc_clear_src is freeing them. add_grec is already
>> under pmc->lock, so to add pmc->lock for ip_mc_clear_src should be
>> enough to protect the list pmc->sources/tomb.
>>
>> wdyt ?
>
> This would be weird.
>
> When we free skb components, we do not grab a spinlock.
>
> When we free something, just make sure we must be the last user of it.
>
> RCU rules -> Must respect RCU grace period before delete.
>
> No need for extra spinlock.

This is what I thought in my first response, until I realized
it is not pure RCU, otherwise pmc->lock should not be taken
in igmpv3_send_cr(). It seems the code is mixing the use
of spinlock and RCU.

We need RCU anyway, ip_check_mc_rcu() is the real fast
path where we don't take spinlock. I think we will need more
work.
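
To make the "respect the grace period before freeing" rule concrete, here is
a single-threaded toy model. It is a sketch only and is not kernel RCU: the
toy_*() names and struct toy_src are hypothetical, and the model simply waits
for every reader to leave, whereas a real grace period only waits for readers
that started before the update. What it does show is the ordering discussed
above: the writer unlinks the source list first, and the memory is reclaimed
only once no reader can still hold a pointer into it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an entry on pmc->sources; not the kernel type. */
struct toy_src {
	int addr;
	struct toy_src *next;
};

int toy_readers;		/* readers currently inside a read section */
struct toy_src *toy_pending;	/* unlinked entries waiting to be freed */
int toy_freed;			/* number of entries actually freed */

/* Free everything that was deferred. Only safe when no reader is active. */
void toy_reclaim(void)
{
	while (toy_pending) {
		struct toy_src *next = toy_pending->next;

		free(toy_pending);
		toy_pending = next;
		toy_freed++;
	}
}

void toy_read_lock(void)
{
	toy_readers++;
}

void toy_read_unlock(void)
{
	assert(toy_readers > 0);
	if (--toy_readers == 0)
		toy_reclaim();	/* last reader gone: deferred frees may run */
}

/* Writer side: unlink the whole list now, free it later. */
void toy_clear_sources(struct toy_src **head)
{
	struct toy_src *list = *head;

	*head = NULL;		/* new readers can no longer find the entries */
	while (list) {
		struct toy_src *next = list->next;

		list->next = toy_pending;
		toy_pending = list;
		list = next;
	}
	if (toy_readers == 0)
		toy_reclaim();
}
```

A reader that picked up an entry before toy_clear_sources() ran can keep
using it until it calls toy_read_unlock(), which is the property the
use-after-free in this thread was missing.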


Re: [PATCH net-next 0/5] nfp: FW app build name reporting

2017-06-09 Thread David Miller
From: Jakub Kicinski 
Date: Thu,  8 Jun 2017 20:56:09 -0700

> This series adds reporting FW build name in ethtool -i.  Most
> of the patches are restructuring where information caching is
> done.  There is also a minor error path fix.
> 
> These are the last few patches finishing the basic nfp_app support.

Series applied, thank you.


Re: [PATCH net-next] liquidio: disallow enabling firmware debug from a VF

2017-06-09 Thread David Miller
From: Felix Manlunas 
Date: Thu, 8 Jun 2017 19:20:36 -0700

> From: Derek Chickles 
> 
> Disallow enabling firmware debug from a VF.  Only PF is allowed to do that.
> 
> Signed-off-by: Derek Chickles 
> Signed-off-by: Felix Manlunas 

Applied.


Re: [PATCH net-next v2] geneve: add missing rx stats accounting

2017-06-09 Thread David Miller
From: Girish Moodalbail 
Date: Thu,  8 Jun 2017 17:07:48 -0700

> There are a few places on the receive path where packet drops and packet
> errors were not accounted for. This patch fixes that issue.
> 
> Signed-off-by: Girish Moodalbail 

Applied, thank you.


Re: [net-bluetooth] question about potential null pointer dereference

2017-06-09 Thread Marcel Holtmann
Hi Gustavo,

 While looking into Coverity ID 1357456 I ran into the following piece of 
 code at net/bluetooth/smp.c:166
 
 166 /* The following functions map to the LE SC SMP crypto functions
 167  * AES-CMAC, f4, f5, f6, g2 and h6.
 168  */
 169
 170 static int aes_cmac(struct crypto_shash *tfm, const u8 k[16], const u8 *m,
 171                     size_t len, u8 mac[16])
 172 {
 173         uint8_t tmp[16], mac_msb[16], msg_msb[CMAC_MSG_MAX];
 174         SHASH_DESC_ON_STACK(desc, tfm);
 175         int err;
 176
 177         if (len > CMAC_MSG_MAX)
 178                 return -EFBIG;
 179
 180         if (!tfm) {
 181                 BT_ERR("tfm %p", tfm);
 182                 return -EINVAL;
 183         }
 184
> 
> BTW, what do you think about removing the IF block above?

what do you mean by this?

 185         desc->tfm = tfm;
 186         desc->flags = 0;
 187
 188         /* Swap key and message from LSB to MSB */
 189         swap_buf(k, tmp, 16);
 190         swap_buf(m, msg_msb, len);
 191
 192         SMP_DBG("msg (len %zu) %*phN", len, (int) len, m);
 193         SMP_DBG("key %16phN", k);
 194
 195         err = crypto_shash_setkey(tfm, tmp, 16);
 196         if (err) {
 197                 BT_ERR("cipher setkey failed: %d", err);
 198                 return err;
 199         }
 200
 201         err = crypto_shash_digest(desc, msg_msb, len, mac_msb);
 202         shash_desc_zero(desc);
 203         if (err) {
 204                 BT_ERR("Hash computation error %d", err);
 205                 return err;
 206         }
 207
 208         swap_buf(mac_msb, mac, 16);
 209
 210         SMP_DBG("mac %16phN", mac);
 211
 212         return 0;
 213 }
 
 The issue here is that line 180 implies that pointer tfm might be NULL. If 
 this is the case, there is a potential NULL pointer dereference at line 
 174 once pointer tfm is indirectly dereferenced inside macro 
 SHASH_DESC_ON_STACK().
 
 My question is if there is any chance that pointer tfm may be NULL when 
 calling macro SHASH_DESC_ON_STACK()?
>>> 
>>> I think the part you are after is this:
>>> 
>>>   smp->tfm_cmac = crypto_alloc_shash("cmac(aes)", 0, 0);
>>>   if (IS_ERR(smp->tfm_cmac)) {
>>>   BT_ERR("Unable to create CMAC crypto context");
>>>   crypto_free_cipher(smp->tfm_aes);
>>>   kzfree(smp);
>>>   return NULL;
>>>   }
>>> 
>> 
>> Yeah, this makes it all clear.
>> 
>>> So the tfm_cmac is part of the smp structure. However if there is no 
>>> cipher, we destroy the smp structure and essentially run without SMP 
>>> support. So it can not really be called anyway.
>>> 
>> 
>> What I take from this is that as a general rule, I should first try to 
>> identify whether the code I'm debugging is reachable or not, depending on 
>> the specific structures and variables I'm interested in.
>> 
>>> Maybe commenting this might be a good idea.
>>> 
>> 
>> Yep, it wouldn't hurt.

Patches are welcome :)

Regards

Marcel
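
The ordering concern raised in this thread (a pointer used to size an on-stack
buffer before it is NULL-checked) can be illustrated in userspace. This is a
sketch under stated assumptions: struct model_tfm, model_digest() and
model_mac() are hypothetical names, and the XOR "digest" is a placeholder,
not a real MAC. The point is only that the NULL check sits in a wrapper, so
it runs before the callee dereferences the pointer to size its scratch
buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical transform handle; descsize sizes the on-stack scratch state. */
struct model_tfm {
	size_t descsize;
};

/* Callee: dereferences tfm immediately to size a stack buffer, so the
 * caller must guarantee tfm is non-NULL and descsize is non-zero. */
int model_digest(const struct model_tfm *tfm, const unsigned char *msg,
		 size_t len, unsigned char *out)
{
	unsigned char state[tfm->descsize];
	size_t i;

	memset(state, 0, sizeof(state));
	for (i = 0; i < len; i++)
		state[i % tfm->descsize] ^= msg[i];

	*out = 0;
	for (i = 0; i < tfm->descsize; i++)
		*out ^= state[i];
	return 0;
}

/* Wrapper: validate the handle before anything can dereference it. */
int model_mac(const struct model_tfm *tfm, const unsigned char *msg,
	      size_t len, unsigned char *out)
{
	if (!tfm || tfm->descsize == 0)
		return -EINVAL;
	return model_digest(tfm, msg, len, out);
}
```

As Marcel notes above, in smp.c the NULL case cannot actually be reached
because the smp structure is destroyed when the CMAC context cannot be
allocated, so this sketch illustrates the general pattern rather than a fix
for that code.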



Re: [PATCH 2/2(net.git)] stmmac: fix for hw timestamp of GMAC3 unit

2017-06-09 Thread David Miller
From: "Mario Molitor" 
Date: Thu, 8 Jun 2017 23:35:02 +0200

> From d5c520880a5f6b470cb150b9aae67341089b9395 Mon Sep 17 00:00:00 2001
> From: Mario Molitor 
> Date: Thu, 8 Jun 2017 23:03:09 +0200
> Subject: [PATCH 2/2] stmmac: fix for hw timestamp of GMAC3 unit
> 
> 1.) Bugfix of function stmmac_get_tx_hwtstamp.
> Corrected the tx timestamp available check (same as 4.8 and older)
> Change printout from info syslevel to debug.
> 
> 2.) Bugfix of function stmmac_get_rx_hwtstamp.
> Corrected the rx timestamp available check (same as 4.8 and older)
> Change printout from info syslevel to debug.
> 
> Fixes: ba1ffd74df74 ("stmmac: fix PTP support for GMAC4")
> Signed-off-by: Mario Molitor 

Applied.


Re: [PATCH 1/2(net.git)] stmmac: fix ptp header for GMAC3 hw timestamp

2017-06-09 Thread David Miller
From: "Mario Molitor" 
Date: Thu, 8 Jun 2017 23:31:13 +0200

> From ce9c334037fce37ccd715124cda57d1fd6d8cfe8 Mon Sep 17 00:00:00 2001
> From: Mario Molitor 
> Date: Thu, 8 Jun 2017 22:41:02 +0200
> Subject: [PATCH 1/2] stmmac: fix ptp header for GMAC3 hw timestamp
> 
> According to the Cyclone V documentation, only bit 16 of snaptypesel
> should be set.
> (more information see Table 17-20 (cv_5v4.pdf) :
>  Timestamp Snapshot Dependency on Register Bits)
> 
> Fixes: d2042052a0aa ("stmmac: update the PTP header file")
> Signed-off-by: Mario Molitor 

Applied.


Re: [PATCH] sh_eth: add support to change MTU

2017-06-09 Thread Sergei Shtylyov

Hello!

On 06/09/2017 06:30 PM, Niklas Söderlund wrote:


The hardware supports changing the MTU and the driver itself is
somewhat prepared to support this. This patch hooks up the callbacks to
be able to change the MTU from user-space.

Signed-off-by: Niklas Söderlund 

[...]

   One more thing off my back, thanks! :-)
   I'm OK with this patch in principle (but have several nits):

Acked-by: Sergei Shtylyov 


diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index f68c4db656eda846..da41eda7bfada6b9 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c

[...]

@@ -3171,6 +3184,13 @@ static int sh_eth_drv_probe(struct platform_device *pdev)
}
sh_eth_set_default_cpu_data(mdp->cd);

+   /* Datasheet states max MTU should be 2048 but due to the


   User's manual. :-)
   Somehow I thought it supports jumbo frames but the manual doesn't confirm 
that... ah, that's EtherAVB! :-)



+* aliment calculations in sh_eth_ring_init() the practical


  Alignment.


+* MTU is a bit less. Maybe this can be optimized some more.


   Undoubtedly... :-)

[...]

MBR, Sergei



Re: [PATCH net-next v2 3/3] udp: try to avoid 2 cache miss on dequeue

2017-06-09 Thread David Miller
From: Paolo Abeni 
Date: Fri, 09 Jun 2017 17:44:29 +0200

> I'll re-submit v3 unchanged, if there are no objections.

No objections from me.


Re: [PATCH net] Fix an intermittent pr_emerg warning about lo becoming free.

2017-06-09 Thread David Miller
From: Krister Johansen 
Date: Thu, 8 Jun 2017 13:12:38 -0700

> It looks like this:
> 
> Message from syslogd@flamingo at Apr 26 00:45:00 ...
>  kernel:unregister_netdevice: waiting for lo to become free. Usage count = 4
> 
> They seem to coincide with net namespace teardown.
> 
> The message is emitted by netdev_wait_allrefs().
> 
> Forced a kdump in netdev_run_todo, but found that the refcount on the lo
> device was already 0 at the time we got to the panic.
> 
> Used bcc to check the blocking in netdev_run_todo.  The only places
> where we're off cpu there are in the rcu_barrier() and msleep() calls.
> That behavior is expected.  The msleep time coincides with the amount of
> time we spend waiting for the refcount to reach zero; the rcu_barrier()
> wait times are not excessive.
> 
> After looking through the list of callbacks that the netdevice notifiers
> invoke in this path, it appears that the dst_dev_event is the most
> interesting.  The dst_ifdown path places a hold on the loopback_dev as
> part of releasing the dev associated with the original dst cache entry.
> Most of our notifier callbacks are straight-forward, but this one a)
> looks complex, and b) places a hold on the network interface in
> question.
> 
> I constructed a new bcc script that watches various events in the
> lifetime of a dst cache entry.  Note that dst_ifdown will take a hold on
> the loopback device until the invalidated dst entry gets freed.
 ...
> The way this works is that if there's still a reference on the dst entry
> at the time we try to free it, it gets placed in the gc list by
> __dst_free and the dst_destroy() call is invoked by the gc task once the
> refcount is 0.  If the gc task processes a 10th or less of its entries
> on a single pass, it increases the amount of time it waits between gc
> intervals.
> 
> Looking at the gc_task intervals, they started at 663ms when we invoked
> __dst_free().  After that, they increased to 1663, 3136, 5567, 8191,
> 10751, and 14848.  The release that set the refcnt to 0 on our dst entry
> occurred after the gc_task was enqueued for a 14 second interval so we had
> to wait longer than the warning time in wait_allrefs in order for the
> dst entry to get free'd and the hold on 'lo' to be released.
> 
> A simple solution to this problem is to have dst_dev_event() reset the
> gc timer, which causes us to process this list shortly after the
> gc_mutex is released when dst_dev_event() completes.
> 
> Signed-off-by: Krister Johansen 

Yeah this is one of the more unsatisfying areas of dst and device handling
in the tree, thanks for working on this.

Applied and queued up for -stable, thanks again.


Re: [PATCH v2 net-next] Ipvlan should return an error when an address is already in use.

2017-06-09 Thread David Miller
From: Krister Johansen 
Date: Thu, 8 Jun 2017 13:12:14 -0700

> The ipvlan code already knows how to detect when a duplicate address is
> about to be assigned to an ipvlan device.  However, that failure is not
> propagated outward and leads to a silent failure.
> 
> Introduce a validation step at ip address creation time and allow device
> drivers to register to validate the incoming ip addresses.  The ipvlan
> code is the first consumer.  If it detects an address in use, we can
> return an error to the user before beginning to commit the new ifa in
> the networking code.
> 
> This can be especially useful if it is necessary to provision many
> ipvlans in containers.  The provisioning software (or operator) can use
> this to detect situations where an ip address is unexpectedly in use.
> 
> Signed-off-by: Krister Johansen 

Ok, applied, thank you.


Re: [PATCH v2 7/8] net: mvmdio: add xmdio support

2017-06-09 Thread Antoine Tenart
On Fri, Jun 09, 2017 at 05:03:40PM +0200, Andrew Lunn wrote:
> > There are two busses, one generating c22 transactions and one generating
> > c45 transactions. Each bus has its own MDC/MDIO pins.
> 
> O.K. That is what i wanted to know. So we want two completely separate
> device tree bindings, busses registered with Linux, etc.
> 
> Thanks for clarification.

So in the end I need one change in v3: to bind the xSMI usage to
marvell,xmdio and the SMI one to marvell,orion-mdio. (Plus the GENMASK
and offset comments).

Antoine

-- 
Antoine Ténart, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

