[PATCH 1/1] xen-netfront: do not cast grant table reference to signed short
While the grant reference is of type uint32_t, xen-netfront erroneously casts it to signed short in BUG_ON(). This can lead to a Xen domU panic during boot-up or migration when the guest is attached to a large number of paravirtual devices.

Signed-off-by: Dongli Zhang
---
 drivers/net/xen-netfront.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index e17879d..189a28d 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -304,7 +304,7 @@ static void xennet_alloc_rx_buffers(struct netfront_queue *queue)
 		queue->rx_skbs[id] = skb;
 
 		ref = gnttab_claim_grant_reference(&queue->gref_rx_head);
-		BUG_ON((signed short)ref < 0);
+		WARN_ON_ONCE(IS_ERR_VALUE((unsigned long)ref));
 		queue->grant_rx_ref[id] = ref;
 
 		page = skb_frag_page(&skb_shinfo(skb)->frags[0]);
@@ -428,7 +428,7 @@ static void xennet_tx_setup_grant(unsigned long gfn, unsigned int offset,
 	id = get_id_from_freelist(&queue->tx_skb_freelist, queue->tx_skbs);
 	tx = RING_GET_REQUEST(&queue->tx, queue->tx.req_prod_pvt++);
 	ref = gnttab_claim_grant_reference(&queue->gref_tx_head);
-	BUG_ON((signed short)ref < 0);
+	WARN_ON_ONCE(IS_ERR_VALUE((unsigned long)ref));
 	gnttab_grant_foreign_access_ref(ref, queue->info->xbdev->otherend_id,
 					gfn, GNTMAP_readonly);
-- 
2.7.4
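For readers unfamiliar with the failure mode: grant references are small integers early in a domain's life, so the truncating cast only misfires once enough paravirtual devices push reference values past 0x7fff. A minimal userspace sketch of the two checks (grant_ref_t and the IS_ERR_VALUE() stand-in are simplified assumptions here, not the kernel definitions):

```c
#include <stdint.h>

typedef uint32_t grant_ref_t;

/* The removed check: truncating a 32-bit reference to signed short
 * makes any value in [0x8000, 0xffff] look negative, so BUG_ON()
 * fired on perfectly valid references once enough were claimed. */
static int buggy_ref_check_fires(grant_ref_t ref)
{
	return (signed short)ref < 0;
}

/* Userspace stand-in for the kernel's IS_ERR_VALUE(): only the
 * topmost 4095 values of unsigned long encode errors, so valid
 * large references no longer trip the check. */
static int fixed_ref_check_fires(grant_ref_t ref)
{
	return (unsigned long)ref >= (unsigned long)-4095;
}
```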
[PATCH net-next v2 3/7] qed*: Add support for WoL
Signed-off-by: Yuval Mintz--- drivers/net/ethernet/qlogic/qed/qed.h | 11 - drivers/net/ethernet/qlogic/qed/qed_dev.c | 19 - drivers/net/ethernet/qlogic/qed/qed_hsi.h | 4 ++ drivers/net/ethernet/qlogic/qed/qed_main.c | 29 + drivers/net/ethernet/qlogic/qed/qed_mcp.c | 56 - drivers/net/ethernet/qlogic/qede/qede.h | 2 + drivers/net/ethernet/qlogic/qede/qede_ethtool.c | 41 ++ drivers/net/ethernet/qlogic/qede/qede_main.c| 9 include/linux/qed/qed_if.h | 10 + 9 files changed, 176 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h index f20243c..8828ffa 100644 --- a/drivers/net/ethernet/qlogic/qed/qed.h +++ b/drivers/net/ethernet/qlogic/qed/qed.h @@ -195,6 +195,11 @@ enum qed_dev_cap { QED_DEV_CAP_ROCE, }; +enum qed_wol_support { + QED_WOL_SUPPORT_NONE, + QED_WOL_SUPPORT_PME, +}; + struct qed_hw_info { /* PCI personality */ enum qed_pci_personalitypersonality; @@ -227,6 +232,8 @@ struct qed_hw_info { u32 hw_mode; unsigned long device_capabilities; u16 mtu; + + enum qed_wol_support b_wol_support; }; struct qed_hw_cid_data { @@ -539,7 +546,9 @@ struct qed_dev { u8 mcp_rev; u8 boot_mode; - u8 wol; + /* WoL related configurations */ + u8 wol_config; + u8 wol_mac[ETH_ALEN]; u32 int_mode; enum qed_coalescing_modeint_coalescing_mode; diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c index 33fd69e..127ed5f 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_dev.c +++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c @@ -1364,8 +1364,24 @@ int qed_hw_reset(struct qed_dev *cdev) { int rc = 0; u32 unload_resp, unload_param; + u32 wol_param; int i; + switch (cdev->wol_config) { + case QED_OV_WOL_DISABLED: + wol_param = DRV_MB_PARAM_UNLOAD_WOL_DISABLED; + break; + case QED_OV_WOL_ENABLED: + wol_param = DRV_MB_PARAM_UNLOAD_WOL_ENABLED; + break; + default: + DP_NOTICE(cdev, + "Unknown WoL configuration %02x\n", cdev->wol_config); + /* Fallthrough */ + case QED_OV_WOL_DEFAULT: + 
wol_param = DRV_MB_PARAM_UNLOAD_WOL_MCP; + } + for_each_hwfn(cdev, i) { struct qed_hwfn *p_hwfn = >hwfns[i]; @@ -1394,8 +1410,7 @@ int qed_hw_reset(struct qed_dev *cdev) /* Send unload command to MCP */ rc = qed_mcp_cmd(p_hwfn, p_hwfn->p_main_ptt, -DRV_MSG_CODE_UNLOAD_REQ, -DRV_MB_PARAM_UNLOAD_WOL_MCP, +DRV_MSG_CODE_UNLOAD_REQ, wol_param, _resp, _param); if (rc) { DP_NOTICE(p_hwfn, "qed_hw_reset: UNLOAD_REQ failed\n"); diff --git a/drivers/net/ethernet/qlogic/qed/qed_hsi.h b/drivers/net/ethernet/qlogic/qed/qed_hsi.h index f7dfa2e..fdb7a09 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_hsi.h +++ b/drivers/net/ethernet/qlogic/qed/qed_hsi.h @@ -8601,6 +8601,7 @@ struct public_drv_mb { #define DRV_MSG_CODE_BIST_TEST 0x001e #define DRV_MSG_CODE_SET_LED_MODE 0x0020 +#define DRV_MSG_CODE_OS_WOL0x002e #define DRV_MSG_SEQ_NUMBER_MASK0x @@ -8697,6 +8698,9 @@ struct public_drv_mb { #define FW_MSG_CODE_NVM_OK 0x0001 #define FW_MSG_CODE_OK 0x0016 +#define FW_MSG_CODE_OS_WOL_SUPPORTED0x0080 +#define FW_MSG_CODE_OS_WOL_NOT_SUPPORTED0x0081 + #define FW_MSG_SEQ_NUMBER_MASK 0x u32 fw_mb_param; diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c index 31f8e42..b71d73a 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_main.c +++ b/drivers/net/ethernet/qlogic/qed/qed_main.c @@ -221,6 +221,10 @@ int qed_fill_dev_info(struct qed_dev *cdev, dev_info->fw_eng = FW_ENGINEERING_VERSION; dev_info->mf_mode = cdev->mf_mode; dev_info->tx_switching = true; + + if (QED_LEADING_HWFN(cdev)->hw_info.b_wol_support == + QED_WOL_SUPPORT_PME) + dev_info->wol_support = true; } else { qed_vf_get_fw_version(>hwfns[0], _info->fw_major,
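The unload-time WoL selection added above is a three-way switch with an intentional fallthrough from the unknown case into the "let the management firmware decide" default. A standalone sketch, with illustrative stand-ins for the DRV_MB_PARAM_UNLOAD_WOL_* constants (the real HSI values may differ):

```c
/* Illustrative stand-ins for the DRV_MB_PARAM_UNLOAD_WOL_* values */
#define UNLOAD_WOL_MCP      0x1
#define UNLOAD_WOL_DISABLED 0x2
#define UNLOAD_WOL_ENABLED  0x3

enum wol_cfg { WOL_DEFAULT, WOL_DISABLED, WOL_ENABLED };

/* Mirror of the patch's selection logic: explicit on/off map directly;
 * anything unrecognized is reported in the driver and then treated the
 * same as WOL_DEFAULT, i.e. defer the decision to the MCP. */
static unsigned int wol_unload_param(enum wol_cfg cfg)
{
	switch (cfg) {
	case WOL_DISABLED:
		return UNLOAD_WOL_DISABLED;
	case WOL_ENABLED:
		return UNLOAD_WOL_ENABLED;
	default:		/* unknown value: warn, then... */
	case WOL_DEFAULT:	/* ...fall through to the MCP default */
		return UNLOAD_WOL_MCP;
	}
}
```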
[PATCH net-next v2 6/7] qed: Use VF-queue feature
Driver sets several restrictions about the number of supported VFs according to available HW/FW resources. This creates a problem as there are constellations which can't be supported [as limitation don't accurately describe the resources], as well as holes where enabling IOV would fail due to supposed lack of resources. This introduces a new interal feature - vf-queues, which would be used to lift some of the restriction and accurately enumerate the queues that can be used by a given PF's VFs. Signed-off-by: Yuval Mintz--- drivers/net/ethernet/qlogic/qed/qed.h | 1 + drivers/net/ethernet/qlogic/qed/qed_dev.c | 20 ++ drivers/net/ethernet/qlogic/qed/qed_int.c | 32 - drivers/net/ethernet/qlogic/qed/qed_sriov.c | 17 ++- 4 files changed, 54 insertions(+), 16 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h index 8828ffa..6d3013f 100644 --- a/drivers/net/ethernet/qlogic/qed/qed.h +++ b/drivers/net/ethernet/qlogic/qed/qed.h @@ -174,6 +174,7 @@ enum QED_FEATURE { QED_PF_L2_QUE, QED_VF, QED_RDMA_CNQ, + QED_VF_L2_QUE, QED_MAX_FEATURES, }; diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c index 127ed5f..d996afe 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_dev.c +++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c @@ -1476,6 +1476,7 @@ static void get_function_id(struct qed_hwfn *p_hwfn) static void qed_hw_set_feat(struct qed_hwfn *p_hwfn) { u32 *feat_num = p_hwfn->hw_info.feat_num; + struct qed_sb_cnt_info sb_cnt_info; int num_features = 1; if (IS_ENABLED(CONFIG_QED_RDMA) && @@ -1494,10 +1495,21 @@ static void qed_hw_set_feat(struct qed_hwfn *p_hwfn) feat_num[QED_PF_L2_QUE] = min_t(u32, RESC_NUM(p_hwfn, QED_SB) / num_features, RESC_NUM(p_hwfn, QED_L2_QUEUE)); - DP_VERBOSE(p_hwfn, NETIF_MSG_PROBE, - "#PF_L2_QUEUES=%d #SBS=%d num_features=%d\n", - feat_num[QED_PF_L2_QUE], RESC_NUM(p_hwfn, QED_SB), - num_features); + + memset(_cnt_info, 0, sizeof(sb_cnt_info)); + 
qed_int_get_num_sbs(p_hwfn, _cnt_info); + feat_num[QED_VF_L2_QUE] = + min_t(u32, + RESC_NUM(p_hwfn, QED_L2_QUEUE) - + FEAT_NUM(p_hwfn, QED_PF_L2_QUE), sb_cnt_info.sb_iov_cnt); + + DP_VERBOSE(p_hwfn, + NETIF_MSG_PROBE, + "#PF_L2_QUEUES=%d VF_L2_QUEUES=%d #ROCE_CNQ=%d #SBS=%d num_features=%d\n", + (int)FEAT_NUM(p_hwfn, QED_PF_L2_QUE), + (int)FEAT_NUM(p_hwfn, QED_VF_L2_QUE), + (int)FEAT_NUM(p_hwfn, QED_RDMA_CNQ), + RESC_NUM(p_hwfn, QED_SB), num_features); } static int qed_hw_get_resc(struct qed_hwfn *p_hwfn) diff --git a/drivers/net/ethernet/qlogic/qed/qed_int.c b/drivers/net/ethernet/qlogic/qed/qed_int.c index 2adedc6..bb74e1c 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_int.c +++ b/drivers/net/ethernet/qlogic/qed/qed_int.c @@ -3030,6 +3030,31 @@ int qed_int_igu_read_cam(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt) } } } + + /* There's a possibility the igu_sb_cnt_iov doesn't properly reflect +* the number of VF SBs [especially for first VF on engine, as we can't +* diffrentiate between empty entries and its entries]. +* Since we don't really support more SBs than VFs today, prevent any +* such configuration by sanitizing the number of SBs to equal the +* number of VFs. +*/ + if (IS_PF_SRIOV(p_hwfn)) { + u16 total_vfs = p_hwfn->cdev->p_iov_info->total_vfs; + + if (total_vfs < p_igu_info->free_blks) { + DP_VERBOSE(p_hwfn, + (NETIF_MSG_INTR | QED_MSG_IOV), + "Limiting number of SBs for IOV - %04x --> %04x\n", + p_igu_info->free_blks, + p_hwfn->cdev->p_iov_info->total_vfs); + p_igu_info->free_blks = total_vfs; + } else if (total_vfs > p_igu_info->free_blks) { + DP_NOTICE(p_hwfn, + "IGU has only %04x SBs for VFs while the device has %04x VFs\n", + p_igu_info->free_blks, total_vfs); + return -EINVAL; + } + } p_igu_info->igu_sb_cnt_iov = p_igu_info->free_blks; DP_VERBOSE( @@ -3163,7 +3188,12 @@ u16 qed_int_queue_id_from_sb_id(struct qed_hwfn *p_hwfn, u16 sb_id) return sb_id - p_info->igu_base_sb; } else if ((sb_id >= p_info->igu_base_sb_iov) && (sb_id <
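The SB-sanitizing rule the patch adds to qed_int_igu_read_cam() can be isolated as follows. This is a hedged userspace sketch (the function name and the use of a plain -22 for -EINVAL are stand-ins), not the driver code itself: clamp the free status blocks to the VF count when there are spares, and fail when there are not enough SBs to give each VF one.

```c
/* Sanitize the number of IOV status blocks: one SB per VF is assumed
 * sufficient, and fewer SBs than VFs is an unsupportable config. */
static int sanitize_iov_sbs(unsigned int total_vfs, unsigned int *free_blks)
{
	if (total_vfs < *free_blks)
		*free_blks = total_vfs;	/* clamp the spare SBs */
	else if (total_vfs > *free_blks)
		return -22;		/* stand-in for -EINVAL */
	return 0;
}
```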
[PATCH net-next v2 7/7] qed: Learn resources from management firmware
From: Tomer TayarCurrently, each interfaces assumes it receives an equal portion of HW/FW resources, but this is wasteful - different partitions [and specifically, parititions exposing different protocol support] might require different resources. Implement a new resource learning scheme where the information is received directly from the management firmware [which has knowledge of all of the functions and can serve as arbiter]. Signed-off-by: Tomer Tayar Signed-off-by: Yuval Mintz --- drivers/net/ethernet/qlogic/qed/qed.h | 6 +- drivers/net/ethernet/qlogic/qed/qed_dev.c | 291 -- drivers/net/ethernet/qlogic/qed/qed_hsi.h | 46 + drivers/net/ethernet/qlogic/qed/qed_l2.c | 2 +- drivers/net/ethernet/qlogic/qed/qed_mcp.c | 42 + drivers/net/ethernet/qlogic/qed/qed_mcp.h | 15 ++ include/linux/qed/qed_eth_if.h| 2 +- 7 files changed, 341 insertions(+), 63 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h index 6d3013f..50b8a01 100644 --- a/drivers/net/ethernet/qlogic/qed/qed.h +++ b/drivers/net/ethernet/qlogic/qed/qed.h @@ -154,7 +154,10 @@ struct qed_qm_iids { u32 tids; }; -enum QED_RESOURCES { +/* HW / FW resources, output of features supported below, most information + * is received from MFW. 
+ */ +enum qed_resources { QED_SB, QED_L2_QUEUE, QED_VPORT, @@ -166,6 +169,7 @@ enum QED_RESOURCES { QED_RDMA_CNQ_RAM, QED_ILT, QED_LL2_QUEUE, + QED_CMDQS_CQS, QED_RDMA_STATS_QUEUE, QED_MAX_RESC, }; diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c index d996afe..5be7b8a 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_dev.c +++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c @@ -1512,47 +1512,240 @@ static void qed_hw_set_feat(struct qed_hwfn *p_hwfn) RESC_NUM(p_hwfn, QED_SB), num_features); } -static int qed_hw_get_resc(struct qed_hwfn *p_hwfn) +static enum resource_id_enum qed_hw_get_mfw_res_id(enum qed_resources res_id) +{ + enum resource_id_enum mfw_res_id = RESOURCE_NUM_INVALID; + + switch (res_id) { + case QED_SB: + mfw_res_id = RESOURCE_NUM_SB_E; + break; + case QED_L2_QUEUE: + mfw_res_id = RESOURCE_NUM_L2_QUEUE_E; + break; + case QED_VPORT: + mfw_res_id = RESOURCE_NUM_VPORT_E; + break; + case QED_RSS_ENG: + mfw_res_id = RESOURCE_NUM_RSS_ENGINES_E; + break; + case QED_PQ: + mfw_res_id = RESOURCE_NUM_PQ_E; + break; + case QED_RL: + mfw_res_id = RESOURCE_NUM_RL_E; + break; + case QED_MAC: + case QED_VLAN: + /* Each VFC resource can accommodate both a MAC and a VLAN */ + mfw_res_id = RESOURCE_VFC_FILTER_E; + break; + case QED_ILT: + mfw_res_id = RESOURCE_ILT_E; + break; + case QED_LL2_QUEUE: + mfw_res_id = RESOURCE_LL2_QUEUE_E; + break; + case QED_RDMA_CNQ_RAM: + case QED_CMDQS_CQS: + /* CNQ/CMDQS are the same resource */ + mfw_res_id = RESOURCE_CQS_E; + break; + case QED_RDMA_STATS_QUEUE: + mfw_res_id = RESOURCE_RDMA_STATS_QUEUE_E; + break; + default: + break; + } + + return mfw_res_id; +} + +static u32 qed_hw_get_dflt_resc_num(struct qed_hwfn *p_hwfn, + enum qed_resources res_id) { - u8 enabled_func_idx = p_hwfn->enabled_func_idx; - u32 *resc_start = p_hwfn->hw_info.resc_start; u8 num_funcs = p_hwfn->num_funcs_on_engine; - u32 *resc_num = p_hwfn->hw_info.resc_num; struct qed_sb_cnt_info sb_cnt_info; - int i, 
max_vf_vlan_filters; + u32 dflt_resc_num = 0; - memset(_cnt_info, 0, sizeof(sb_cnt_info)); + switch (res_id) { + case QED_SB: + memset(_cnt_info, 0, sizeof(sb_cnt_info)); + qed_int_get_num_sbs(p_hwfn, _cnt_info); + dflt_resc_num = sb_cnt_info.sb_cnt; + break; + case QED_L2_QUEUE: + dflt_resc_num = MAX_NUM_L2_QUEUES_BB / num_funcs; + break; + case QED_VPORT: + dflt_resc_num = MAX_NUM_VPORTS_BB / num_funcs; + break; + case QED_RSS_ENG: + dflt_resc_num = ETH_RSS_ENGINE_NUM_BB / num_funcs; + break; + case QED_PQ: + /* The granularity of the PQs is 8 */ + dflt_resc_num = MAX_QM_TX_QUEUES_BB / num_funcs; + dflt_resc_num &= ~0x7; + break; + case QED_RL: + dflt_resc_num =
[PATCH net-next v2 4/7] qede: Decouple ethtool caps from qed
While the qed_lm_map array is closely tied to the QED_LM_* defines, when iterating over the array use its actual size instead of the qed define, to prevent possible future mismatches.

Signed-off-by: Yuval Mintz
---
 drivers/net/ethernet/qlogic/qede/qede_ethtool.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
index 327c614..fe7e7b8 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
@@ -320,7 +320,7 @@ struct qede_link_mode_mapping {
 {								\
 	int i;							\
 								\
-	for (i = 0; i < QED_LM_COUNT; i++) {			\
+	for (i = 0; i < ARRAY_SIZE(qed_lm_map); i++) {		\
 		if ((caps) & (qed_lm_map[i].qed_link_mode))	\
 			__set_bit(qed_lm_map[i].ethtool_link_mode,\
 				  lk_ksettings->link_modes.name); \
@@ -331,7 +331,7 @@ struct qede_link_mode_mapping {
 {								\
 	int i;							\
 								\
-	for (i = 0; i < QED_LM_COUNT; i++) {			\
+	for (i = 0; i < ARRAY_SIZE(qed_lm_map); i++) {		\
 		if (test_bit(qed_lm_map[i].ethtool_link_mode,	\
 			     lk_ksettings->link_modes.name))	\
 			caps |= qed_lm_map[i].qed_link_mode;	\
-- 
1.9.3
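The point of the change is that ARRAY_SIZE() ties the loop bound to the array definition itself, so adding or removing map entries can never desynchronize the iteration count from the table it walks. A small self-contained illustration (the table contents below are made up, not the real qede map):

```c
#include <stddef.h>

#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

struct link_mode_mapping {
	unsigned int qed_link_mode;	/* driver-internal capability bit */
	unsigned int ethtool_link_mode;	/* matching ethtool bit number */
};

/* Illustrative table; the real entries live in qede_ethtool.c */
static const struct link_mode_mapping lm_map[] = {
	{ 1u << 0, 0 },
	{ 1u << 1, 1 },
	{ 1u << 2, 2 },
};

/* Count how many capability bits in 'caps' have an ethtool mapping.
 * Iterating with ARRAY_SIZE(lm_map) stays correct even if entries
 * are later added to or dropped from lm_map. */
static size_t mapped_modes(unsigned int caps)
{
	size_t i, n = 0;

	for (i = 0; i < ARRAY_SIZE(lm_map); i++)
		if (caps & lm_map[i].qed_link_mode)
			n++;
	return n;
}
```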
[PATCH net-next v2 1/7] qed*: Management firmware - notifications and defaults
From: Sudarsana KalluruManagement firmware is interested in various tidbits about the driver - including the driver state & several configuration related fields [MTU, primtary MAC, etc.]. This adds the necessray logic to update MFW with such configurations, some of which are passed directly via qed while for others APIs are provide so that qede would be able to later configure if needed. This also introduces a new default configuration for MTU which would replace the default inherited by being an ethernet device. Signed-off-by: Sudarsana Kalluru Signed-off-by: Yuval Mintz --- drivers/net/ethernet/qlogic/qed/qed.h | 1 + drivers/net/ethernet/qlogic/qed/qed_dev.c | 52 +++- drivers/net/ethernet/qlogic/qed/qed_hsi.h | 59 - drivers/net/ethernet/qlogic/qed/qed_main.c | 75 +++ drivers/net/ethernet/qlogic/qed/qed_mcp.c | 163 drivers/net/ethernet/qlogic/qed/qed_mcp.h | 102 +++ drivers/net/ethernet/qlogic/qede/qede_ethtool.c | 2 + drivers/net/ethernet/qlogic/qede/qede_main.c| 8 ++ include/linux/qed/qed_if.h | 28 9 files changed, 487 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h index 653bb57..f20243c 100644 --- a/drivers/net/ethernet/qlogic/qed/qed.h +++ b/drivers/net/ethernet/qlogic/qed/qed.h @@ -226,6 +226,7 @@ struct qed_hw_info { u32 port_mode; u32 hw_mode; unsigned long device_capabilities; + u16 mtu; }; struct qed_hw_cid_data { diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c index edae5fc..33fd69e 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_dev.c +++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c @@ -1057,8 +1057,10 @@ int qed_hw_init(struct qed_dev *cdev, bool allow_npar_tx_switch, const u8 *bin_fw_data) { - u32 load_code, param; - int rc, mfw_rc, i; + u32 load_code, param, drv_mb_param; + bool b_default_mtu = true; + struct qed_hwfn *p_hwfn; + int rc = 0, mfw_rc, i; if ((int_mode == QED_INT_MODE_MSI) && (cdev->num_hwfns > 1)) { 
DP_NOTICE(cdev, "MSI mode is not supported for CMT devices\n"); @@ -1074,6 +1076,12 @@ int qed_hw_init(struct qed_dev *cdev, for_each_hwfn(cdev, i) { struct qed_hwfn *p_hwfn = >hwfns[i]; + /* If management didn't provide a default, set one of our own */ + if (!p_hwfn->hw_info.mtu) { + p_hwfn->hw_info.mtu = 1500; + b_default_mtu = false; + } + if (IS_VF(cdev)) { p_hwfn->b_int_enabled = 1; continue; @@ -1157,6 +1165,38 @@ int qed_hw_init(struct qed_dev *cdev, p_hwfn->hw_init_done = true; } + if (IS_PF(cdev)) { + p_hwfn = QED_LEADING_HWFN(cdev); + drv_mb_param = (FW_MAJOR_VERSION << 24) | + (FW_MINOR_VERSION << 16) | + (FW_REVISION_VERSION << 8) | + (FW_ENGINEERING_VERSION); + rc = qed_mcp_cmd(p_hwfn, p_hwfn->p_main_ptt, +DRV_MSG_CODE_OV_UPDATE_STORM_FW_VER, +drv_mb_param, _code, ); + if (rc) + DP_INFO(p_hwfn, "Failed to update firmware version\n"); + + if (!b_default_mtu) { + rc = qed_mcp_ov_update_mtu(p_hwfn, p_hwfn->p_main_ptt, + p_hwfn->hw_info.mtu); + if (rc) + DP_INFO(p_hwfn, + "Failed to update default mtu\n"); + } + + rc = qed_mcp_ov_update_driver_state(p_hwfn, + p_hwfn->p_main_ptt, + QED_OV_DRIVER_STATE_DISABLED); + if (rc) + DP_INFO(p_hwfn, "Failed to update driver state\n"); + + rc = qed_mcp_ov_update_eswitch(p_hwfn, p_hwfn->p_main_ptt, + QED_OV_ESWITCH_VEB); + if (rc) + DP_INFO(p_hwfn, "Failed to update eswitch mode\n"); + } + return 0; } @@ -1801,6 +1841,9 @@ static void qed_get_num_funcs(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt) qed_get_num_funcs(p_hwfn, p_ptt); + if (qed_mcp_is_init(p_hwfn)) + p_hwfn->hw_info.mtu = p_hwfn->mcp_info->func_info.mtu; + return
[PATCH net-next v2 5/7] qed: Learn of RDMA capabilities per-device
Today, RDMA capabilities are learned from management firmware which provides a per-device indication for all interfaces. Newer management firmware is capable of providing a per-device indication [would later be extended to either RoCE/iWARP]. Try using this newer learning mechanism, but fallback in case management firmware is too old to retain current functionality. Signed-off-by: Yuval Mintz--- drivers/net/ethernet/qlogic/qed/qed_hsi.h | 7 +++ drivers/net/ethernet/qlogic/qed/qed_mcp.c | 78 +++ 2 files changed, 77 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed_hsi.h b/drivers/net/ethernet/qlogic/qed/qed_hsi.h index fdb7a09..1d113ce 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_hsi.h +++ b/drivers/net/ethernet/qlogic/qed/qed_hsi.h @@ -8601,6 +8601,7 @@ struct public_drv_mb { #define DRV_MSG_CODE_BIST_TEST 0x001e #define DRV_MSG_CODE_SET_LED_MODE 0x0020 +#define DRV_MSG_CODE_GET_PF_RDMA_PROTOCOL 0x002b #define DRV_MSG_CODE_OS_WOL0x002e #define DRV_MSG_SEQ_NUMBER_MASK0x @@ -8705,6 +8706,12 @@ struct public_drv_mb { u32 fw_mb_param; + /* get pf rdma protocol command responce */ +#define FW_MB_PARAM_GET_PF_RDMA_NONE 0x0 +#define FW_MB_PARAM_GET_PF_RDMA_ROCE 0x1 +#define FW_MB_PARAM_GET_PF_RDMA_IWARP 0x2 +#define FW_MB_PARAM_GET_PF_RDMA_BOTH 0x3 + u32 drv_pulse_mb; #define DRV_PULSE_SEQ_MASK 0x7fff #define DRV_PULSE_SYSTEM_TIME_MASK 0x diff --git a/drivers/net/ethernet/qlogic/qed/qed_mcp.c b/drivers/net/ethernet/qlogic/qed/qed_mcp.c index 768b35b..0927488 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_mcp.c +++ b/drivers/net/ethernet/qlogic/qed/qed_mcp.c @@ -1024,28 +1024,89 @@ int qed_mcp_get_media_type(struct qed_dev *cdev, u32 *p_media_type) return 0; } +/* Old MFW has a global configuration for all PFs regarding RDMA support */ +static void +qed_mcp_get_shmem_proto_legacy(struct qed_hwfn *p_hwfn, + enum qed_pci_personality *p_proto) +{ + /* There wasn't ever a legacy MFW that published iwarp. 
+* So at this point, this is either plain l2 or RoCE. +*/ + if (test_bit(QED_DEV_CAP_ROCE, _hwfn->hw_info.device_capabilities)) + *p_proto = QED_PCI_ETH_ROCE; + else + *p_proto = QED_PCI_ETH; + + DP_VERBOSE(p_hwfn, NETIF_MSG_IFUP, + "According to Legacy capabilities, L2 personality is %08x\n", + (u32) *p_proto); +} + +static int +qed_mcp_get_shmem_proto_mfw(struct qed_hwfn *p_hwfn, + struct qed_ptt *p_ptt, + enum qed_pci_personality *p_proto) +{ + u32 resp = 0, param = 0; + int rc; + + rc = qed_mcp_cmd(p_hwfn, p_ptt, +DRV_MSG_CODE_GET_PF_RDMA_PROTOCOL, 0, , ); + if (rc) + return rc; + if (resp != FW_MSG_CODE_OK) { + DP_VERBOSE(p_hwfn, NETIF_MSG_IFUP, + "MFW lacks support for command; Returns %08x\n", + resp); + return -EINVAL; + } + + switch (param) { + case FW_MB_PARAM_GET_PF_RDMA_NONE: + *p_proto = QED_PCI_ETH; + break; + case FW_MB_PARAM_GET_PF_RDMA_ROCE: + *p_proto = QED_PCI_ETH_ROCE; + break; + case FW_MB_PARAM_GET_PF_RDMA_BOTH: + DP_NOTICE(p_hwfn, + "Current day drivers don't support RoCE & iWARP. Default to RoCE-only\n"); + *p_proto = QED_PCI_ETH_ROCE; + break; + case FW_MB_PARAM_GET_PF_RDMA_IWARP: + default: + DP_NOTICE(p_hwfn, + "MFW answers GET_PF_RDMA_PROTOCOL but param is %08x\n", + param); + return -EINVAL; + } + + DP_VERBOSE(p_hwfn, + NETIF_MSG_IFUP, + "According to capabilities, L2 personality is %08x [resp %08x param %08x]\n", + (u32) *p_proto, resp, param); + return 0; +} + static int qed_mcp_get_shmem_proto(struct qed_hwfn *p_hwfn, struct public_func *p_info, + struct qed_ptt *p_ptt, enum qed_pci_personality *p_proto) { int rc = 0; switch (p_info->config & FUNC_MF_CFG_PROTOCOL_MASK) { case FUNC_MF_CFG_PROTOCOL_ETHERNET: - if (test_bit(QED_DEV_CAP_ROCE, -_hwfn->hw_info.device_capabilities)) - *p_proto = QED_PCI_ETH_ROCE; - else - *p_proto = QED_PCI_ETH; + if (qed_mcp_get_shmem_proto_mfw(p_hwfn, p_ptt, p_proto)) +
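The learning scheme described above is a classic "try the new interface, fall back to the legacy one" pattern: query the (possibly newer) management firmware first, and only if it does not understand the command derive the personality from the legacy global capability bit. A simplified sketch, with hypothetical names standing in for the MFW command and the qed personality enums:

```c
#include <stdbool.h>

enum personality { PERS_ETH, PERS_ETH_ROCE };

/* Stand-in for the GET_PF_RDMA_PROTOCOL mailbox query: an old MFW
 * that lacks the command is modeled as a failure return. */
static int mfw_query(bool mfw_supports_cmd, enum personality *out)
{
	if (!mfw_supports_cmd)
		return -22;		/* stand-in for -EINVAL */
	*out = PERS_ETH_ROCE;		/* pretend the MFW reported RoCE */
	return 0;
}

/* Prefer the per-function MFW answer; fall back to the legacy
 * device-global RoCE capability bit when the MFW is too old. */
static enum personality get_personality(bool mfw_supports_cmd,
					bool legacy_roce_cap)
{
	enum personality p;

	if (mfw_query(mfw_supports_cmd, &p) == 0)
		return p;		/* new scheme worked */
	return legacy_roce_cap ? PERS_ETH_ROCE : PERS_ETH;
}
```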
[PATCH net-next v2 2/7] qed: Add nvram selftest
Signed-off-by: Yuval Mintz--- drivers/net/ethernet/qlogic/qed/qed_hsi.h | 4 + drivers/net/ethernet/qlogic/qed/qed_main.c | 1 + drivers/net/ethernet/qlogic/qed/qed_mcp.c | 94 ++ drivers/net/ethernet/qlogic/qed/qed_mcp.h | 41 ++ drivers/net/ethernet/qlogic/qed/qed_selftest.c | 101 drivers/net/ethernet/qlogic/qed/qed_selftest.h | 10 +++ drivers/net/ethernet/qlogic/qede/qede_ethtool.c | 7 ++ include/linux/qed/qed_if.h | 9 +++ 8 files changed, 267 insertions(+) diff --git a/drivers/net/ethernet/qlogic/qed/qed_hsi.h b/drivers/net/ethernet/qlogic/qed/qed_hsi.h index 36de87a..f7dfa2e 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_hsi.h +++ b/drivers/net/ethernet/qlogic/qed/qed_hsi.h @@ -8666,6 +8666,8 @@ struct public_drv_mb { #define DRV_MB_PARAM_BIST_REGISTER_TEST1 #define DRV_MB_PARAM_BIST_CLOCK_TEST 2 +#define DRV_MB_PARAM_BIST_NVM_TEST_NUM_IMAGES 3 +#define DRV_MB_PARAM_BIST_NVM_TEST_IMAGE_BY_INDEX 4 #define DRV_MB_PARAM_BIST_RC_UNKNOWN 0 #define DRV_MB_PARAM_BIST_RC_PASSED1 @@ -8674,6 +8676,8 @@ struct public_drv_mb { #define DRV_MB_PARAM_BIST_TEST_INDEX_SHIFT 0 #define DRV_MB_PARAM_BIST_TEST_INDEX_MASK 0x00FF +#define DRV_MB_PARAM_BIST_TEST_IMAGE_INDEX_SHIFT 8 +#define DRV_MB_PARAM_BIST_TEST_IMAGE_INDEX_MASK0xFF00 u32 fw_mb_header; #define FW_MSG_CODE_MASK 0x diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c index d9fa52a..31f8e42 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_main.c +++ b/drivers/net/ethernet/qlogic/qed/qed_main.c @@ -1508,6 +1508,7 @@ static int qed_update_mtu(struct qed_dev *cdev, u16 mtu) .selftest_interrupt = _selftest_interrupt, .selftest_register = _selftest_register, .selftest_clock = _selftest_clock, + .selftest_nvram = _selftest_nvram, }; const struct qed_common_ops qed_common_ops_pass = { diff --git a/drivers/net/ethernet/qlogic/qed/qed_mcp.c b/drivers/net/ethernet/qlogic/qed/qed_mcp.c index 98dc913..8be6157 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_mcp.c +++ 
b/drivers/net/ethernet/qlogic/qed/qed_mcp.c @@ -1434,6 +1434,52 @@ int qed_mcp_mask_parities(struct qed_hwfn *p_hwfn, return rc; } +int qed_mcp_nvm_read(struct qed_dev *cdev, u32 addr, u8 *p_buf, u32 len) +{ + u32 bytes_left = len, offset = 0, bytes_to_copy, read_len = 0; + struct qed_hwfn *p_hwfn = QED_LEADING_HWFN(cdev); + u32 resp = 0, resp_param = 0; + struct qed_ptt *p_ptt; + int rc = 0; + + p_ptt = qed_ptt_acquire(p_hwfn); + if (!p_ptt) + return -EBUSY; + + while (bytes_left > 0) { + bytes_to_copy = min_t(u32, bytes_left, MCP_DRV_NVM_BUF_LEN); + + rc = qed_mcp_nvm_rd_cmd(p_hwfn, p_ptt, + DRV_MSG_CODE_NVM_READ_NVRAM, + addr + offset + + (bytes_to_copy << +DRV_MB_PARAM_NVM_LEN_SHIFT), + , _param, + _len, + (u32 *)(p_buf + offset)); + + if (rc || (resp != FW_MSG_CODE_NVM_OK)) { + DP_NOTICE(cdev, "MCP command rc = %d\n", rc); + break; + } + + /* This can be a lengthy process, and it's possible scheduler +* isn't preemptable. Sleep a bit to prevent CPU hogging. +*/ + if (bytes_left % 0x1000 < + (bytes_left - read_len) % 0x1000) + usleep_range(1000, 2000); + + offset += read_len; + bytes_left -= read_len; + } + + cdev->mcp_nvm_resp = resp; + qed_ptt_release(p_hwfn, p_ptt); + + return rc; +} + int qed_mcp_bist_register_test(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt) { u32 drv_mb_param = 0, rsp, param; @@ -1475,3 +1521,51 @@ int qed_mcp_bist_clock_test(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt) return rc; } + +int qed_mcp_bist_nvm_test_get_num_images(struct qed_hwfn *p_hwfn, +struct qed_ptt *p_ptt, +u32 *num_images) +{ + u32 drv_mb_param = 0, rsp; + int rc = 0; + + drv_mb_param = (DRV_MB_PARAM_BIST_NVM_TEST_NUM_IMAGES << + DRV_MB_PARAM_BIST_TEST_INDEX_SHIFT); + + rc = qed_mcp_cmd(p_hwfn, p_ptt, DRV_MSG_CODE_BIST_TEST, +drv_mb_param, , num_images); + if (rc) + return rc; + + if (((rsp & FW_MSG_CODE_MASK) != FW_MSG_CODE_OK)) + rc = -EINVAL; + +
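The NVM read added in qed_mcp_nvm_read() is a chunked loop: repeat fixed-size mailbox reads until the requested length has been gathered, and in the driver a usleep_range() is inserted every 4 KiB so a non-preemptible caller does not hog the CPU. A userspace sketch of the loop shape only (the device access is faked, and CHUNK stands in for MCP_DRV_NVM_BUF_LEN):

```c
#include <stdint.h>
#include <string.h>

#define CHUNK 32u	/* stand-in for MCP_DRV_NVM_BUF_LEN */

/* Fake device reader: copies 'len' bytes from a backing image and
 * reports how many bytes were produced, like the mailbox read does. */
static uint32_t fake_nvm_rd(const uint8_t *img, uint32_t addr,
			    uint8_t *dst, uint32_t len)
{
	memcpy(dst, img + addr, len);
	return len;
}

/* Chunked read mirroring the shape of qed_mcp_nvm_read(): issue
 * CHUNK-sized requests, advancing by the length each one returned,
 * until the whole buffer is filled. */
static void chunked_read(const uint8_t *img, uint32_t addr,
			 uint8_t *buf, uint32_t len)
{
	uint32_t off = 0;

	while (len > 0) {
		uint32_t n = len < CHUNK ? len : CHUNK;
		uint32_t got = fake_nvm_rd(img, addr + off, buf + off, n);

		off += got;
		len -= got;
	}
}
```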
[PATCH net-next v2 0/7] qed*: Patch series
This series does several things. The bigger changes:

- Add new notification APIs [& defaults] for various fields. The series
  then utilizes some of those qed <-> qede APIs to base WoL support upon.
- Change the resource allocation scheme to receive the values from the
  management firmware, instead of equally sharing resources between
  functions [that might not need those]. That would, e.g., allow us to
  configure additional filters for network interfaces in the presence of
  storage [PCI] functions from the same adapter.

Dave,

Please consider applying this series to `net-next'.

Thanks,
Yuval

Changes from previous version:
- V2: Rebase on top of latest net-next.

Sudarsana Kalluru (1):
  qed*: Management firmware - notifications and defaults

Tomer Tayar (1):
  qed: Learn resources from management firmware

Yuval Mintz (5):
  qed: Add nvram selftest
  qed*: Add support for WoL
  qede: Decouple ethtool caps from qed
  qed: Learn of RDMA capabilities per-device
  qed: Use VF-queue feature

 drivers/net/ethernet/qlogic/qed/qed.h           |  19 +-
 drivers/net/ethernet/qlogic/qed/qed_dev.c       | 382 +
 drivers/net/ethernet/qlogic/qed/qed_hsi.h       | 120 ++-
 drivers/net/ethernet/qlogic/qed/qed_int.c       |  32 +-
 drivers/net/ethernet/qlogic/qed/qed_l2.c        |   2 +-
 drivers/net/ethernet/qlogic/qed/qed_main.c      | 105 ++
 drivers/net/ethernet/qlogic/qed/qed_mcp.c       | 433 +++-
 drivers/net/ethernet/qlogic/qed/qed_mcp.h       | 158 +
 drivers/net/ethernet/qlogic/qed/qed_selftest.c  | 101 ++
 drivers/net/ethernet/qlogic/qed/qed_selftest.h  |  10 +
 drivers/net/ethernet/qlogic/qed/qed_sriov.c     |  17 +-
 drivers/net/ethernet/qlogic/qede/qede.h         |   2 +
 drivers/net/ethernet/qlogic/qede/qede_ethtool.c |  54 ++-
 drivers/net/ethernet/qlogic/qede/qede_main.c    |  17 +
 include/linux/qed/qed_eth_if.h                  |   2 +-
 include/linux/qed/qed_if.h                      |  47 +++
 16 files changed, 1404 insertions(+), 97 deletions(-)

-- 
1.9.3
Re: [bnx2] [Regression 4.8] Driver loading fails without firmware
Hi Paul,

On 10/30/16 at 12:05pm, Paul Menzel wrote:
> Dear Baoquan,
>
> Am Samstag, den 29.10.2016, 10:55 +0800 schrieb Baoquan He:
> > On 10/27/16 at 03:21pm, Paul Menzel wrote:
> > > > > Baoquan, could you please fix this regression. My suggestion is, that you
> > > > > add the old code back, but check if the firmware has been loaded. If it
> > > > > hasn’t, load it again.
> > > > >
> > > > > That way, people can update their Linux kernel, and it continues working
> > > > > without changing the initramfs, or anything else.
> > > >
> > > > I saw your mail but I am also not familiar with bnx2 driver. As the
> > > > commit log says I just tried to make bnx2 driver reset itself earlier.
> > > >
> > > > So you did a git bisect and found this commit caused the regression,
> > > > right? If yes, and network developers have no action, I will look into
> > > > the code and see if I have idea to fix it.
> > >
> > > Well, I looked through the commits and found that one, which would explain
> > > the changed behavior.
> > >
> > > To be sure, and to follow your request, I took Linux 4.8.4 and reverted your
> > > commit (attached). Then I deleted the firmware again from the initramfs, and
> > > rebooted. The devices showed up just fine as before.
> > >
> > > So to summarize, the commit is indeed the culprit.
> >
> > Sorry for this.
> >
> > Could you tell the steps to reproduce? I will find a machine with bnx2
> > NIC and check if there's other ways.
>
> Well, delete the bnx2 firmware files from the initramfs, and start the
> system.
>
> Did you read my proposal, to try to load the firmware twice, that means,
> basically revert only the deleted lines of your commit, and add an
> additional check?

Thanks for your information! I got an x86_64 system with a bnx2 NIC, cloned Linus's git tree onto that system, and built a new kernel, 4.9.0-rc3+, with a new initramfs.
But when I unpacked the new initramfs, I didn't find any bnx2-related firmware: there were no bnx2 files under lib/firmware of the unpacked initramfs folder, while I did see them in /lib/firmware/bnx2/bnx2-x.fw on the installed system. Could you please describe more specifically how I should reproduce the failure you encountered?

I think your proposal looks good; it just needs a test before posting.

Thanks
Baoquan
Re: [PATCH net] r8152: Fix broken RX checksums.
From: Mark Lord
Date: Sun, 30 Oct 2016 22:07:25 -0400

> On 16-10-30 08:57 PM, David Miller wrote:
>> From: Mark Lord
>> Date: Sun, 30 Oct 2016 19:28:27 -0400
>>
>>> The r8152 driver has been broken since (approx) 3.16.xx
>>> when support was added for hardware RX checksums
>>> on newer chip versions. Symptoms include random
>>> segfaults and silent data corruption over NFS.
>>>
>>> The hardware checksum logig does not work on the VER_02
>>> dongles I have here when used with a slow embedded system CPU.
>>> Google reveals others reporting similar issues on Raspberry Pi.
>>>
>>> So, disable hardware RX checksum support for VER_02, and fix
>>> an obvious coding error for IPV6 checksums in the same function.
>>>
>>> Because this bug results in silent data corruption,
>>> it is a good candidate for back-porting to -stable >= 3.16.xx.
>>>
>>> Signed-off-by: Mark Lord
>>
>> Applied and queued up for -stable, thanks.
>
> Thanks. Now that this is taken care of, I do wonder if perhaps
> RX checksums ought to be enabled at all for ANY versions of this chip?

You should really start a dialogue with the developer who has been
making the most, if not all, of the major changes to this driver over
the past few years, Hayes Wang.
Re: [PATCH net] r8152: Fix broken RX checksums.
On 16-10-30 08:57 PM, David Miller wrote:
> From: Mark Lord
> Date: Sun, 30 Oct 2016 19:28:27 -0400
>
>> The r8152 driver has been broken since (approx) 3.16.xx
>> when support was added for hardware RX checksums
>> on newer chip versions. Symptoms include random
>> segfaults and silent data corruption over NFS.
>>
>> The hardware checksum logig does not work on the VER_02
>> dongles I have here when used with a slow embedded system CPU.
>> Google reveals others reporting similar issues on Raspberry Pi.
>>
>> So, disable hardware RX checksum support for VER_02, and fix
>> an obvious coding error for IPV6 checksums in the same function.
>>
>> Because this bug results in silent data corruption,
>> it is a good candidate for back-porting to -stable >= 3.16.xx.
>>
>> Signed-off-by: Mark Lord
>
> Applied and queued up for -stable, thanks.

Thanks. Now that this is taken care of, I do wonder if perhaps
RX checksums ought to be enabled at all for ANY versions of this chip?

My theory is that the checksums probably work okay most of the time,
except when the hardware RX buffer overflows. In my case, and in the
case of the Raspberry Pi, the receiving CPU is quite a bit slower than
mainstream x86, so it can quite easily fall behind in emptying the RX
buffer on the chip. The only indication this has happened may be an
incorrect RX checksum.

This is only a theory, but I otherwise have trouble explaining why we
are seeing invalid RX checksums -- direct cable connections to a
switch, shared only with the NFS server. No reason for it to have bad
RX checksums in the first place.

Should we just blanket disable RX checksums for all versions here
unless proven otherwise/safe? Anyone out there know better?

Cheers
Mark
Re: [PATCH net-next 3/4] bpf: BPF for lightweight tunnel encapsulation
On Sun, Oct 30, 2016 at 2:47 PM, Thomas Graf wrote:
> On 10/30/16 at 01:34pm, Tom Herbert wrote:
>> On Sun, Oct 30, 2016 at 4:58 AM, Thomas Graf wrote:
>> > +	if (unlikely(!dst->lwtstate->orig_output)) {
>> > +		WARN_ONCE(1, "orig_output not set on dst for prog %s\n",
>> > +			  bpf->out.name);
>> > +		kfree_skb(skb);
>> > +		return -EINVAL;
>> > +	}
>> > +
>> > +	return dst->lwtstate->orig_output(net, sk, skb);
>>
>> The BPF program may have changed the destination address so continuing
>> with the original route in the skb may not be appropriate here. This was
>> fixed in ila_lwt by calling ip6_route_output, and we were able to use the
>> dst cache facility to cache the route to avoid the cost of looking it up
>> on every packet. Since the kernel has no insight into what the BPF
>> program does to the packet I'd suggest 1) checking if the destination
>> address was changed by BPF and if it was then call route_output to get a
>> new route 2) If the LWT destination is a host route then try to keep a
>> dst cache. This would entail checking on return that the destination
>> address is the same one as kept in the dst cache.
>
> Instead of building complex logic, we can allow the program to return
> a code to indicate when to perform another route lookup just as we do
> for the redirect case. Just because the destination address has
> changed may not require another lookup in all cases. A typical example
> would be a program rewriting addresses for the default route to other
> addresses which are always handled by the default route as well. An
> unconditional lookup would hurt performance in many cases.

Right, that's why we rely on a dst cache. Any use of LWT that
encapsulates or tunnels to a fixed destination (ILA, VXLAN, IPIP,
etc.) would want to use the dst cache optimization to avoid the second
lookup. The ILA LWT code used to call orig output and that worked as
long as we could set the default router as the gateway "via". It was
something we were able to deploy, but not a general solution.
Integrating properly with routing gives a much better solution IMO.

Note that David Lebrun's latest LWT Segment Routing patch does the
second lookup with the dst cache to try to avoid it.

Thanks,
Tom
Re: [PATCH net] r8152: Fix broken RX checksums.
From: Mark Lord
Date: Sun, 30 Oct 2016 19:28:27 -0400

> The r8152 driver has been broken since (approx) 3.16.xx
> when support was added for hardware RX checksums
> on newer chip versions.  Symptoms include random
> segfaults and silent data corruption over NFS.
>
> The hardware checksum logic does not work on the VER_02
> dongles I have here when used with a slow embedded system CPU.
> Google reveals others reporting similar issues on Raspberry Pi.
>
> So, disable hardware RX checksum support for VER_02, and fix
> an obvious coding error for IPV6 checksums in the same function.
>
> Because this bug results in silent data corruption,
> it is a good candidate for back-porting to -stable >= 3.16.xx.
>
> Signed-off-by: Mark Lord

Applied and queued up for -stable, thanks.
Re: [PATCH] drivers/net/usb/r8152 fix broken rx checksums
On Sun, 2016-10-30 at 17:22 -0400, David Miller wrote:
> 3) "Fix broken RX checksums."  Commit header lines and commit
>    messages are proper English, therefore sentences should
>    begin with a capitalized letter and end with a period.

Commit messages should be proper English.  But commit header lines
should not end with a period.  The vast majority doesn't.  Yes, I've
just checked.

How many newspaper headlines end with a period?

Thanks,

Paul Bolle
[PATCH net] r8152: Fix broken RX checksums.
The r8152 driver has been broken since (approx) 3.16.xx
when support was added for hardware RX checksums
on newer chip versions.  Symptoms include random
segfaults and silent data corruption over NFS.

The hardware checksum logic does not work on the VER_02
dongles I have here when used with a slow embedded system CPU.
Google reveals others reporting similar issues on Raspberry Pi.

So, disable hardware RX checksum support for VER_02, and fix
an obvious coding error for IPV6 checksums in the same function.

Because this bug results in silent data corruption,
it is a good candidate for back-porting to -stable >= 3.16.xx.

Signed-off-by: Mark Lord
---
--- old/drivers/net/usb/r8152.c	2016-09-30 04:20:43.0 -0400
+++ linux/drivers/net/usb/r8152.c	2016-10-26 14:15:44.932517676 -0400
@@ -1645,7 +1645,7 @@
 	u8 checksum = CHECKSUM_NONE;
 	u32 opts2, opts3;
 
-	if (tp->version == RTL_VER_01)
+	if (tp->version == RTL_VER_01 || tp->version == RTL_VER_02)
 		goto return_result;
 
 	opts2 = le32_to_cpu(rx_desc->opts2);
@@ -1660,7 +1660,7 @@
 			checksum = CHECKSUM_NONE;
 		else
 			checksum = CHECKSUM_UNNECESSARY;
-	} else if (RD_IPV6_CS) {
+	} else if (opts2 & RD_IPV6_CS) {
 		if ((opts2 & RD_UDP_CS) && !(opts3 & UDPF))
 			checksum = CHECKSUM_UNNECESSARY;
 		else if ((opts2 & RD_TCP_CS) && !(opts3 & TCPF))
Re: [PATCH net-next 2/2] net/mlx4_en: Refactor the XDP forwarding rings scheme
On Mon, Oct 31, 2016 at 1:11 AM, Saeed Mahameed wrote:
> On Mon, Oct 31, 2016 at 12:44 AM, Alexei Starovoitov wrote:
>> On Sun, Oct 30, 2016 at 06:03:06PM +0200, Tariq Toukan wrote:
>>>
>>> Note that the XDP TX rings are no longer shown in ethtool -S.
>>
>> ouch. Can you make it to show them as some large TX numbers instead?
>> It would really sux to lose stats on them.
>>
>
> Right, Tariq, how did we miss this ?
>
> FYI, I don't think we need the whole TX queue stats for XDP tx rings,
> it is just an overkill, there are only two active counters for XDP TX
> ring (XDP_TX_FWD/XDP_TX_DROP).
>
> XDP_TX_FWD or currently "tx_packets" will count successfully forwarded packets
> XDP_TX_DROP or currently "tx_dropped" will count TX dropped packets
> due to full ring.
>
> do we need tx_bytes as well ? I think yes.
>
> The whole idea of this refactoring i.e. differentiating between TXQ
> netdev rings and XDP TX rings, that XDP is a fast path with minimal
> system overhead, we don't need to have the full set of regular TXQ
> counters for XDP TX rings.

BTW in mlx5 we have the required xdp stats as a part of the rx ring:

static const struct counter_desc rq_stats_desc[] = {
	[...]
	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, xdp_drop) },
	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, xdp_tx) },
	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, xdp_tx_full) },
	[...]

We should do the same here.
Re: [PATCH net-next 2/2] net/mlx4_en: Refactor the XDP forwarding rings scheme
On Mon, Oct 31, 2016 at 12:44 AM, Alexei Starovoitov wrote:
> On Sun, Oct 30, 2016 at 06:03:06PM +0200, Tariq Toukan wrote:
>>
>> Note that the XDP TX rings are no longer shown in ethtool -S.
>
> ouch. Can you make it to show them as some large TX numbers instead?
> It would really sux to lose stats on them.
>

Right, Tariq, how did we miss this ?

FYI, I don't think we need the whole TX queue stats for XDP tx rings,
it is just an overkill, there are only two active counters for XDP TX
ring (XDP_TX_FWD/XDP_TX_DROP).

XDP_TX_FWD or currently "tx_packets" will count successfully forwarded
packets.
XDP_TX_DROP or currently "tx_dropped" will count TX dropped packets
due to a full ring.

do we need tx_bytes as well ? I think yes.

The whole idea of this refactoring, i.e. differentiating between TXQ
netdev rings and XDP TX rings, is that XDP is a fast path with minimal
system overhead; we don't need to have the full set of regular TXQ
counters for XDP TX rings.
Re: [PATCH 2/2] rtl8xxxu: Fix for bogus data used to determine macpower
John Heenan writes:
> Thanks for your reply.
>
> The code was tested on a Cube i9 which has an internal rtl8723bu.
>
> No other devices were tested.
>
> I am happy to accept in an ideal context hard coding macpower is
> undesirable, the comment is undesirable and it is wrong to assume the
> issue is not unique to the rtl8723bu.
>
> Your reply is idealistic. What can I do now? I should of course have
> factored out other untested devices in my patches. The apparent
> concern you have with process over outcome is a useful lesson.
>
> We are not in an ideal situation. The comment is of course relevant
> and useful to starting a process to fixing a real bug I do not have
> sufficient information to refine any further for and others do. In the
> circumstances nothing really more can be expected.

Well you should start by reporting the issue and either providing a
patch that only affects 8723bu, or work on a generic solution.  I
appreciate patches, but I do not appreciate patches that will make
something work for one person and break for everyone else - I spent a
lot of time making sure the driver works across the different devices.

The comment violates all Linux standards - first rule when modifying
code is to respect the style of the code you are dealing with.  Code
is 80 characters wide, and comments are /* */ never the ugly C++ crap.

> My patch cover letter, [PATCH 0/2] provides evidence of a mess with
> regard to determining macpower for the rtl8723bu and what is
> subsequently required. This is important.
>
> The kernel driver code is very poorly documented and there is not a
> single source reference to device documentation. For example macpower
> is nothing more than a setting that is true or false according to
> whether a read of a particular register returns 0xef or not. Such value
> was never obtained so a full init sequence was never performed.

The kernel driver is documented with the information I have - there is
NO device documentation because Realtek refuses to provide any.  I
have written the driver based on what I have retrieved by reading the
vendor drivers.  If you can provide better documentation, I certainly
would love to get it.

> It would be helpful if you could provide a link to device references.
> As it is, how am I supposed to revise the patch without relevant
> information?

Look at the USB device table, it shows you which devices are supported.

> My patch code works with the Cube i9, as is, despite a lack of
> adequate information. Before it did not. That is a powerful statement

The driver works with a lot of different devices - in itself that is a
powerful statement!  Yes I want to see it work with as many devices as
possible, but just moving things around without balancing it and not
explaining why is not a fix.  If we move more of the init sequence to
_start() you also have to move matching pieces to _stop().

Jes
Re: [PATCH net-next RFC WIP] Patch for XDP support for virtio_net
On Fri, Oct 28, 2016 at 01:11:01PM -0400, David Miller wrote:
> From: John Fastabend
> Date: Fri, 28 Oct 2016 08:56:35 -0700
>
> > On 16-10-27 07:10 PM, David Miller wrote:
> >> From: Alexander Duyck
> >> Date: Thu, 27 Oct 2016 18:43:59 -0700
> >>
> >>> On Thu, Oct 27, 2016 at 6:35 PM, David Miller wrote:
> >>>> From: "Michael S. Tsirkin"
> >>>> Date: Fri, 28 Oct 2016 01:25:48 +0300
> >>>>
> >>>>> On Thu, Oct 27, 2016 at 05:42:18PM -0400, David Miller wrote:
> >>>>>> From: "Michael S. Tsirkin"
> >>>>>> Date: Fri, 28 Oct 2016 00:30:35 +0300
> >>>>>>
> >>>>>>> Something I'd like to understand is how does XDP address the
> >>>>>>> problem that 100Byte packets are consuming 4K of memory now.
> >>>>>>
> >>>>>> Via page pools.  We're going to make a generic one, but right now
> >>>>>> each and every driver implements a quick list of pages to allocate
> >>>>>> from (and thus avoid the DMA map/unmap overhead, etc.)
> >>>>>
> >>>>> So to clarify, ATM virtio doesn't attempt to avoid dma map/unmap
> >>>>> so there should be no issue with that even when using sub/page
> >>>>> regions, assuming DMA APIs support sub-page map/unmap correctly.
> >>>>
> >>>> That's not what I said.
> >>>>
> >>>> The page pools are meant to address the performance degradation from
> >>>> going to having one packet per page for the sake of XDP's
> >>>> requirements.
> >>>>
> >>>> You still need to have one packet per page for correct XDP operation
> >>>> whether you do page pools or not, and whether you have DMA mapping
> >>>> (or its equivalent virtualization operation) or not.
> >>>
> >>> Maybe I am missing something here, but why do you need to limit things
> >>> to one packet per page for correct XDP operation?  Most of the drivers
> >>> out there now are usually storing something closer to at least 2
> >>> packets per page, and with the DMA API fixes I am working on there
> >>> should be no issue with changing the contents inside those pages since
> >>> we won't invalidate or overwrite the data after the DMA buffer has
> >>> been synchronized for use by the CPU.
> >>
> >> Because with SKB's you can share the page with other packets.
> >>
> >> With XDP you simply cannot.
> >>
> >> It's software semantics that are the issue.  SKB frag list pages
> >> are read only, XDP packets are writable.
> >>
> >> This has nothing to do with "writability" of the pages wrt. DMA
> >> mapping or cpu mappings.
> >>
> >
> > Sorry I'm not seeing it either. The current xdp_buff is defined
> > by,
> >
> >   struct xdp_buff {
> >           void *data;
> >           void *data_end;
> >   };
> >
> > The verifier has an xdp_is_valid_access() check to ensure we don't go
> > past data_end. The page for now at least never leaves the driver. For
> > the work to get xmit to other devices working I'm still not sure I see
> > any issue.
>
> I guess I can say that the packets must be "writable" until I'm blue
> in the face but I'll say it again, semantically writable pages are a
> requirement.  And if multiple packets share a page this requirement
> is not satisfied.
>
> Also, we want to do several things in the future:
>
> 1) Allow push/pop of headers via eBPF code, which means we need
>    headroom.

I think that with e.g. LRO or a large MTU page per packet does not
guarantee headroom.

> 2) Transparently zero-copy pass packets into userspace, basically
>    the user will have a semi-permanently mapped ring of all the
>    packet pages sitting in the RX queue of the device and the
>    page pool associated with it.  This way we avoid all of the
>    TLB flush/map overhead for the user's mapping of the packets
>    just as we avoid the DMA map/unmap overhead.

Looks like you can share pages between packets as long as they all
come from the same pool so accessible to the same userspace.

> And that's just the beginning.
>
> I'm sure others can come up with more reasons why we have this
> requirement.

I'm still a bit confused :(  Is this a requirement of the current code
or to enable future extensions?

-- 
MST
Re: [PATCH net-next 2/2] net/mlx4_en: Refactor the XDP forwarding rings scheme
On Sun, Oct 30, 2016 at 06:03:06PM +0200, Tariq Toukan wrote:
>
> Note that the XDP TX rings are no longer shown in ethtool -S.

ouch. Can you make it to show them as some large TX numbers instead?
It would really sux to lose stats on them.
Re: Let's do P4
On Sun, Oct 30, 2016 at 05:38:36PM +0100, Jiri Pirko wrote:
> Sun, Oct 30, 2016 at 11:26:49AM CET, tg...@suug.ch wrote:
> >On 10/30/16 at 08:44am, Jiri Pirko wrote:
> >> Sat, Oct 29, 2016 at 06:46:21PM CEST, john.fastab...@gmail.com wrote:
> >> >On 16-10-29 07:49 AM, Jakub Kicinski wrote:
> >> >> On Sat, 29 Oct 2016 09:53:28 +0200, Jiri Pirko wrote:
> >> >>> Hi all.
> >> >>>

sorry for delay. travelling to KS, so probably missed something in
this thread and comments can be totally off...

the subject "let's do P4" is imo misleading, since it reads like we
don't do P4 at the moment, whereas the opposite is true. Several
p4->bpf compilers is a proof.

> The network world is divided into 2 general types of hw:
> 1) network ASICs - network specific silicon, containing things like TCAM
>    These ASICs are suitable to be programmed by P4.

i think the opposite is the case in case of P4. when hw asic has tcam
it's still far far away from being usable with P4 which requires fully
programmable protocol parser, arbitrary tables and so on.
P4 doesn't even define TCAM as a table type. The p4 program can
declare a desired algorithm of search in the table and compiler has to
figure out what HW resources to use to satisfy such p4 program.

> 2) network processors - basically a general purpose CPUs
>    These processors are suitable to be programmed by eBPF.

I think this statement is also misleading, since it positions p4 and
bpf as competitors whereas that's not the case.
p4 is the language. bpf is an instruction set.

> Exactly. Following drawing shows p4 pipeline setup for SW and HW:
>
>                                 |
>                                 |           +--> ebpf engine
>                                 |           |
>                                 |           |
>                                 |       compilerB
>                                 |           ^
>                                 |           |
> p4src --> compilerA --> p4ast --TCNL--> cls_p4 --+-> driver -> compilerC -> HW
>                                 |
>          userspace              |  kernel
>                                 |

frankly this diagram smells very much like kernel bypass to me, since
I cannot see how one can put the whole p4 language compiler into the
driver, so this last step of p4ast->hw, I presume, will be done by
firmware, which will be running full compiler in an embedded cpu on
the switch. To me that's precisely the kernel bypass, since we won't
have a clue what HW capabilities actually are and won't be able to
fine grain control them. Please correct me if I'm wrong.

> Plus the thing I cannot imagine in the model you propose is table fillup.
> For ebpf, you use maps. For p4 you would have to have a separate HW-only
> API. This is very similar to the original John's Flow-API. And therefore
> a kernel bypass.

I think John's flow api is a better way to expose mellanox switch
capabilities. I also think it's not fair to call it 'bypass'.
I see nothing in it that justify such 'swear word' ;)
The goal of flow api was to expose HW features to user space, so that
user space can program it. For something simple as mellanox switch
asic it fits perfectly well. Unless I misunderstand the bigger goal of
this discussion and it's about programming ezchip devices.

If the goal is to model hw tcam in the linux kernel then just
introduce tcam bpf map type. It will be dog slow in user space, but it
will match exactly what is happening in the HW and user space can make
sensible trade-offs.
Re: [PATCH net-next 3/4] bpf: BPF for lightweight tunnel encapsulation
On 10/30/16 at 01:34pm, Tom Herbert wrote:
> On Sun, Oct 30, 2016 at 4:58 AM, Thomas Graf wrote:
> > +	if (unlikely(!dst->lwtstate->orig_output)) {
> > +		WARN_ONCE(1, "orig_output not set on dst for prog %s\n",
> > +			  bpf->out.name);
> > +		kfree_skb(skb);
> > +		return -EINVAL;
> > +	}
> > +
> > +	return dst->lwtstate->orig_output(net, sk, skb);
>
> The BPF program may have changed the destination address so continuing
> with the original route in the skb may not be appropriate here. This was
> fixed in ila_lwt by calling ip6_route_output, and we were able to use the
> dst cache facility to cache the route to avoid the cost of looking it up
> on every packet. Since the kernel has no insight into what the BPF
> program does to the packet I'd suggest 1) checking if the destination
> address was changed by BPF and if it was then call route_output to get a
> new route 2) If the LWT destination is a host route then try to keep a
> dst cache. This would entail checking on return that the destination
> address is the same one as kept in the dst cache.

Instead of building complex logic, we can allow the program to return
a code to indicate when to perform another route lookup just as we do
for the redirect case. Just because the destination address has
changed may not require another lookup in all cases. A typical example
would be a program rewriting addresses for the default route to other
addresses which are always handled by the default route as well. An
unconditional lookup would hurt performance in many cases.
Re: [PATCH for-next V2 00/15][PULL request] Mellanox mlx5 core driver updates 2016-10-25
From: Saeed Mahameed
Date: Sun, 30 Oct 2016 23:21:53 +0200

> This series contains some updates and fixes of mlx5 core and
> IB drivers with the addition of two features that demand
> new low level commands and infrastructure updates.
> - SRIOV VF max rate limit support
> - mlx5e tc support for FWD rules with counter.
>
> Needed for both net and rdma subsystems.

Pulled, thanks.
[PATCH for-next V2 13/15] net/mlx5: Add multi dest support
From: Mark Bloch

Currently when calling mlx5_add_flow_rule we accept only one flow
destination, this commit allows to pass multiple destinations.

This change forces us to change the return structure to a more
flexible one.  We introduce a flow handle (struct mlx5_flow_handle),
it holds internally the number of rules created and holds an array
where each cell points to a flow rule.

From the consumers (of mlx5_add_flow_rule) point of view this change
is only cosmetic and requires only to change the type of the returned
value they store.

From the core point of view, we now need to use a loop when
allocating and deleting rules (e.g given to us a flow handler).

Signed-off-by: Mark Bloch
Signed-off-by: Saeed Mahameed
Signed-off-by: Leon Romanovsky
---
 drivers/infiniband/hw/mlx5/main.c                  |  14 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h               |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |  14 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c  |  38 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c    |  49 ++--
 .../ethernet/mellanox/mlx5/core/en_fs_ethtool.c    |  19 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |   6 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    |  32 +--
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |  68 ++---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |  22 +-
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c |  42 +--
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  | 289 ++---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h  |   5 +
 include/linux/mlx5/fs.h                            |  28 +-
 14 files changed, 374 insertions(+), 254 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index d02341e..8e0dbd5 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1771,13 +1771,13 @@ static int mlx5_ib_destroy_flow(struct ib_flow *flow_id)
 	mutex_lock(&dev->flow_db.lock);
 
 	list_for_each_entry_safe(iter, tmp, &handler->list, list) {
-		mlx5_del_flow_rule(iter->rule);
+		mlx5_del_flow_rules(iter->rule);
 		put_flow_table(dev, iter->prio, true);
 		list_del(&iter->list);
 		kfree(iter);
 	}
 
-	mlx5_del_flow_rule(handler->rule);
+	mlx5_del_flow_rules(handler->rule);
 	put_flow_table(dev, handler->prio, true);
 	mutex_unlock(&dev->flow_db.lock);
 
@@ -1907,10 +1907,10 @@ static struct mlx5_ib_flow_handler *create_flow_rule(struct mlx5_ib_dev *dev,
 	spec->match_criteria_enable = get_match_criteria_enable(spec->match_criteria);
 	action = dst ? MLX5_FLOW_CONTEXT_ACTION_FWD_DEST :
 		MLX5_FLOW_CONTEXT_ACTION_FWD_NEXT_PRIO;
-	handler->rule = mlx5_add_flow_rule(ft, spec,
+	handler->rule = mlx5_add_flow_rules(ft, spec,
 					    action,
 					    MLX5_FS_DEFAULT_FLOW_TAG,
-					    dst);
+					    dst, 1);
 
 	if (IS_ERR(handler->rule)) {
 		err = PTR_ERR(handler->rule);
@@ -1941,7 +1941,7 @@ static struct mlx5_ib_flow_handler *create_dont_trap_rule(struct mlx5_ib_dev *de
 		handler_dst = create_flow_rule(dev, ft_prio,
 					       flow_attr, dst);
 		if (IS_ERR(handler_dst)) {
-			mlx5_del_flow_rule(handler->rule);
+			mlx5_del_flow_rules(handler->rule);
 			ft_prio->refcount--;
 			kfree(handler);
 			handler = handler_dst;
@@ -2004,7 +2004,7 @@ static struct mlx5_ib_flow_handler *create_leftovers_rule(struct mlx5_ib_dev *de
 						 _specs[LEFTOVERS_UC].flow_attr,
 						 dst);
 		if (IS_ERR(handler_ucast)) {
-			mlx5_del_flow_rule(handler->rule);
+			mlx5_del_flow_rules(handler->rule);
 			ft_prio->refcount--;
 			kfree(handler);
 			handler = handler_ucast;
@@ -2046,7 +2046,7 @@ static struct mlx5_ib_flow_handler *create_sniffer_rule(struct mlx5_ib_dev *dev,
 	return handler_rx;
 
 err_tx:
-	mlx5_del_flow_rule(handler_rx->rule);
+	mlx5_del_flow_rules(handler_rx->rule);
 	ft_rx->refcount--;
 	kfree(handler_rx);
 err:
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index dcdcd19..d5d0077 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -153,7 +153,7 @@ struct mlx5_ib_flow_handler {
 	struct list_head	list;
 	struct ib_flow		ibflow;
 	struct
[PATCH for-next V2 06/15] net/mlx5: Introduce TSAR manipulation firmware commands
From: Mohamad Haj Yahia

TSAR (stands for Transmit Scheduling ARbiter) is a hardware component
that is responsible for selecting the next entity to serve on the
transmit path.  The arbitration defines the QoS policy between the
agents connected to the TSAR.

The TSAR consists of two main features:

1) BW allocation between agents:
   The TSAR implements a deficit weighted round robin between the
   agents.  Each agent attached to the TSAR is assigned with a weight
   and it is awarded transmission tokens according to this weight.

2) Rate limiter per agent:
   Each agent attached to the TSAR is (optionally) assigned with a
   rate limit.  TSAR will not allow scheduling for an agent exceeding
   its defined rate limit.

In this patch we implement the API of manipulating the TSAR.

Signed-off-by: Mohamad Haj Yahia
Signed-off-by: Saeed Mahameed
Signed-off-by: Leon Romanovsky
---
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c      |  13 +-
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h    |   7 +
 drivers/net/ethernet/mellanox/mlx5/core/rl.c       |  65 +++
 include/linux/mlx5/mlx5_ifc.h                      | 199 -
 4 files changed, 279 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 1e639f8..8561102 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -318,6 +318,8 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev *dev, u16 op,
 	case MLX5_CMD_OP_SET_FLOW_TABLE_ENTRY:
 	case MLX5_CMD_OP_SET_FLOW_TABLE_ROOT:
 	case MLX5_CMD_OP_DEALLOC_ENCAP_HEADER:
+	case MLX5_CMD_OP_DESTROY_SCHEDULING_ELEMENT:
+	case MLX5_CMD_OP_DESTROY_QOS_PARA_VPORT:
 		return MLX5_CMD_STAT_OK;
 
 	case MLX5_CMD_OP_QUERY_HCA_CAP:
@@ -419,11 +421,14 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev *dev, u16 op,
 	case MLX5_CMD_OP_QUERY_FLOW_TABLE:
 	case MLX5_CMD_OP_CREATE_FLOW_GROUP:
 	case MLX5_CMD_OP_QUERY_FLOW_GROUP:
-	case MLX5_CMD_OP_QUERY_FLOW_TABLE_ENTRY:
 	case MLX5_CMD_OP_ALLOC_FLOW_COUNTER:
 	case MLX5_CMD_OP_QUERY_FLOW_COUNTER:
 	case MLX5_CMD_OP_ALLOC_ENCAP_HEADER:
+	case MLX5_CMD_OP_CREATE_SCHEDULING_ELEMENT:
+	case MLX5_CMD_OP_QUERY_SCHEDULING_ELEMENT:
+	case MLX5_CMD_OP_MODIFY_SCHEDULING_ELEMENT:
+	case MLX5_CMD_OP_CREATE_QOS_PARA_VPORT:
 		*status = MLX5_DRIVER_STATUS_ABORTED;
 		*synd = MLX5_DRIVER_SYND;
 		return -EIO;
@@ -580,6 +585,12 @@ const char *mlx5_command_str(int command)
 	MLX5_COMMAND_STR_CASE(MODIFY_FLOW_TABLE);
 	MLX5_COMMAND_STR_CASE(ALLOC_ENCAP_HEADER);
 	MLX5_COMMAND_STR_CASE(DEALLOC_ENCAP_HEADER);
+	MLX5_COMMAND_STR_CASE(CREATE_SCHEDULING_ELEMENT);
+	MLX5_COMMAND_STR_CASE(DESTROY_SCHEDULING_ELEMENT);
+	MLX5_COMMAND_STR_CASE(QUERY_SCHEDULING_ELEMENT);
+	MLX5_COMMAND_STR_CASE(MODIFY_SCHEDULING_ELEMENT);
+	MLX5_COMMAND_STR_CASE(CREATE_QOS_PARA_VPORT);
+	MLX5_COMMAND_STR_CASE(DESTROY_QOS_PARA_VPORT);
 	default: return "unknown command opcode";
 	}
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 3d0cfb9..bf43171 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -91,6 +91,13 @@ int mlx5_core_sriov_configure(struct pci_dev *dev, int num_vfs);
 bool mlx5_sriov_is_enabled(struct mlx5_core_dev *dev);
 int mlx5_core_enable_hca(struct mlx5_core_dev *dev, u16 func_id);
 int mlx5_core_disable_hca(struct mlx5_core_dev *dev, u16 func_id);
+int mlx5_create_scheduling_element_cmd(struct mlx5_core_dev *dev, u8 hierarchy,
+				       void *context, u32 *element_id);
+int mlx5_modify_scheduling_element_cmd(struct mlx5_core_dev *dev, u8 hierarchy,
+				       void *context, u32 element_id,
+				       u32 modify_bitmask);
+int mlx5_destroy_scheduling_element_cmd(struct mlx5_core_dev *dev, u8 hierarchy,
+					u32 element_id);
 int mlx5_wait_for_vf_pages(struct mlx5_core_dev *dev);
 cycle_t mlx5_read_internal_timer(struct mlx5_core_dev *dev);
 u32 mlx5_get_msix_vec(struct mlx5_core_dev *dev, int vecidx);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/rl.c b/drivers/net/ethernet/mellanox/mlx5/core/rl.c
index 104902a..e651e4c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/rl.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/rl.c
@@ -36,6 +36,71 @@
 #include
 #include "mlx5_core.h"
 
+/* Scheduling element fw management */
+int mlx5_create_scheduling_element_cmd(struct mlx5_core_dev *dev, u8 hierarchy,
+				       void *ctx, u32
[PATCH for-next V2 10/15] net/mlx5: Use fte status to decide on firmware command
From: Mark Bloch

An fte status becomes FS_FTE_STATUS_EXISTING only after it was created
in HW.  We can use this in order to simplify the logic on what
firmware command to use.  If the status isn't FS_FTE_STATUS_EXISTING
we need to create the fte, otherwise we need only to update it.

Signed-off-by: Mark Bloch
Signed-off-by: Saeed Mahameed
Signed-off-by: Leon Romanovsky
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index a07ff30..e2bab9d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -946,7 +946,7 @@ static struct mlx5_flow_rule *add_rule_fte(struct fs_fte *fte,
 			BIT(MLX5_SET_FTE_MODIFY_ENABLE_MASK_DESTINATION_LIST);
 	}
 
-	if (fte->dests_size == 1 || !dest)
+	if (!(fte->status & FS_FTE_STATUS_EXISTING))
 		err = mlx5_cmd_create_fte(get_dev(&ft->node), ft,
 					  fg->id, fte);
 	else
-- 
2.7.4
[PATCH for-next V2 12/15] net/mlx5: Group similar rules under the same fte
From: Mark Bloch

When adding a new rule, if we can match it with compare_match_value
and flow tag we might be able to insert the rule to the same fte.
In order to do that, there must be an overlap between the actions of
the fte and the new rule.

When updating the action of an existing fte, we must tell the
firmware we are doing so.

Signed-off-by: Mark Bloch
Signed-off-by: Saeed Mahameed
Signed-off-by: Leon Romanovsky
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 22 --
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index fca6937..43d7052 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -920,7 +920,8 @@ static struct mlx5_flow_rule *alloc_rule(struct mlx5_flow_destination *dest)
 /* fte should not be deleted while calling this function */
 static struct mlx5_flow_rule *add_rule_fte(struct fs_fte *fte,
 					   struct mlx5_flow_group *fg,
-					   struct mlx5_flow_destination *dest)
+					   struct mlx5_flow_destination *dest,
+					   bool update_action)
 {
 	struct mlx5_flow_table *ft;
 	struct mlx5_flow_rule *rule;
@@ -931,6 +932,9 @@ static struct mlx5_flow_rule *add_rule_fte(struct fs_fte *fte,
 	if (!rule)
 		return ERR_PTR(-ENOMEM);
 
+	if (update_action)
+		modify_mask |= BIT(MLX5_SET_FTE_MODIFY_ENABLE_MASK_ACTION);
+
 	fs_get_obj(ft, fg->node.parent);
 	/* Add dest to dests list- we need flow tables to be in the
 	 * end of the list for forward to next prio rules.
@@ -1109,7 +1113,9 @@ static struct mlx5_flow_rule *add_rule_fg(struct mlx5_flow_group *fg,
 	fs_for_each_fte(fte, fg) {
 		nested_lock_ref_node(&fte->node, FS_MUTEX_CHILD);
 		if (compare_match_value(&fg->mask, match_value, &fte->val) &&
-		    action == fte->action && flow_tag == fte->flow_tag) {
+		    (action & fte->action) && flow_tag == fte->flow_tag) {
+			int old_action = fte->action;
+
 			rule = find_flow_rule(fte, dest);
 			if (rule) {
 				atomic_inc(&rule->node.refcount);
@@ -1117,11 +1123,15 @@ static struct mlx5_flow_rule *add_rule_fg(struct mlx5_flow_group *fg,
 				unlock_ref_node(&fte->node);
 				return rule;
 			}
-			rule = add_rule_fte(fte, fg, dest);
-			if (IS_ERR(rule))
+			fte->action |= action;
+			rule = add_rule_fte(fte, fg, dest,
+					    old_action != action);
+			if (IS_ERR(rule)) {
+				fte->action = old_action;
 				goto unlock_fte;
-			else
+			} else {
 				goto add_rule;
+			}
 		}
 		unlock_ref_node(&fte->node);
 	}
@@ -1138,7 +1148,7 @@ static struct mlx5_flow_rule *add_rule_fg(struct mlx5_flow_group *fg,
 	}
 	tree_init_node(&fte->node, 0, del_fte);
 	nested_lock_ref_node(&fg->node, FS_MUTEX_CHILD);
-	rule = add_rule_fte(fte, fg, dest);
+	rule = add_rule_fte(fte, fg, dest, false);
 	if (IS_ERR(rule)) {
 		kfree(fte);
 		goto unlock_fg;
-- 
2.7.4
[PATCH for-next V2 14/15] net/mlx5: Add option to add fwd rule with counter
From: Mark BlochCurrently the code supports only drop rules to possess counters, add that ability also for fwd rules. Signed-off-by: Mark Bloch Signed-off-by: Saeed Mahameed Signed-off-by: Leon Romanovsky --- drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 24 +-- 1 file changed, 18 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c index 6732287..0dfd998 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c @@ -374,6 +374,7 @@ static void del_rule(struct fs_node *node) struct mlx5_core_dev *dev = get_dev(node); int match_len = MLX5_ST_SZ_BYTES(fte_match_param); int err; + bool update_fte = false; match_value = mlx5_vzalloc(match_len); if (!match_value) { @@ -392,13 +393,23 @@ static void del_rule(struct fs_node *node) list_del(>next_ft); mutex_unlock(>dest_attr.ft->lock); } + + if (rule->dest_attr.type == MLX5_FLOW_DESTINATION_TYPE_COUNTER && + --fte->dests_size) { + modify_mask = BIT(MLX5_SET_FTE_MODIFY_ENABLE_MASK_ACTION); + fte->action &= ~MLX5_FLOW_CONTEXT_ACTION_COUNT; + update_fte = true; + goto out; + } + if ((fte->action & MLX5_FLOW_CONTEXT_ACTION_FWD_DEST) && --fte->dests_size) { modify_mask = BIT(MLX5_SET_FTE_MODIFY_ENABLE_MASK_DESTINATION_LIST), - err = mlx5_cmd_update_fte(dev, ft, - fg->id, - modify_mask, - fte); + update_fte = true; + } +out: + if (update_fte && fte->dests_size) { + err = mlx5_cmd_update_fte(dev, ft, fg->id, modify_mask, fte); if (err) mlx5_core_warn(dev, "%s can't del rule fg id=%d fte_index=%d\n", @@ -1287,8 +1298,9 @@ static bool counter_is_valid(struct mlx5_fc *counter, u32 action) if (!counter) return false; - /* Hardware support counter for a drop action only */ - return action == (MLX5_FLOW_CONTEXT_ACTION_DROP | MLX5_FLOW_CONTEXT_ACTION_COUNT); + return (action & (MLX5_FLOW_CONTEXT_ACTION_DROP | + MLX5_FLOW_CONTEXT_ACTION_FWD_DEST)) && + (action & 
MLX5_FLOW_CONTEXT_ACTION_COUNT); } static bool dest_is_valid(struct mlx5_flow_destination *dest, -- 2.7.4
[PATCH for-next V2 01/15] IB/mlx5: Skip handling unknown events
Do not dispatch unknown mlx5 core events on mlx5_ib_event.

Signed-off-by: Saeed Mahameed
Signed-off-by: Eugenia Emantayev
Signed-off-by: Leon Romanovsky
---
 drivers/infiniband/hw/mlx5/main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 2217477..d02341e 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -2358,6 +2358,8 @@ static void mlx5_ib_event(struct mlx5_core_dev *dev, void *context,
 		ibev.event = IB_EVENT_CLIENT_REREGISTER;
 		port = (u8)param;
 		break;
+	default:
+		return;
 	}

 	ibev.device = &ibdev->ib_dev;
--
2.7.4
[PATCH for-next V2 00/15][PULL request] Mellanox mlx5 core driver updates 2016-10-25
Hi Dave and Doug,

This series contains some updates and fixes of the mlx5 core and IB drivers, with the addition of two features that demand new low-level commands and infrastructure updates:
- SRIOV VF max rate limit support
- mlx5e tc support for FWD rules with counter

Needed for both net and rdma subsystems.

Updates and fixes:

From Saeed Mahameed (2):
- mlx5 IB: Skip handling unknown mlx5 events
- Add ConnectX-5 PCIe 4.0 VF device ID

From Artemy Kovalyov (2):
- Update struct mlx5_ifc_xrqc_bits
- Ensure SRQ physical address structure endianness

From Eugenia Emantayev (1):
- Fix length of async_event_mask

New features:

From Mohamad Haj Yahia (3): mlx5 SRIOV VF max rate limit support
- Introduce TSAR manipulation firmware commands
- Introduce E-switch QoS management
- Add SRIOV VF max rate configuration support

From Mark Bloch (7): mlx5e tc support for FWD rule with counter
- Don't unlock fte while still using it
- Use fte status to decide on firmware command
- Refactor find_flow_rule
- Group similar rules under the same fte
- Add multi dest support
- Add option to add fwd rule with counter
- mlx5e tc support for FWD rule with counter

Mark fixed two trivial issues with the flow steering core and did some refactoring in the flow steering API to support adding multi-destination rules to the same hardware flow table entry at once. The last two patches add the ability to populate a flow rule with a flow counter on the same flow entry.

V2:
- Dropped some patches that added new structures without adding any usage of them.
- Added the SRIOV VF max rate configuration support patch, which introduces the usage of the TSAR infrastructure.
- Added flow steering fixes and refactoring, in addition to mlx5 tc support for forward rules with counter.
The following changes since commit a909d3e636995ba7c349e2ca5dbb528154d4ac30 Linux 4.9-rc3 are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git tags/shared-for-4.10-1 for you to fetch changes up to e37a79e5d4cac3831fac3d4afbf2461f56b4b7bd net/mlx5e: Add tc support for FWD rule with counter Thanks, Saeed & Leon. Artemy Kovalyov (2): net/mlx5: Update struct mlx5_ifc_xrqc_bits net/mlx5: Ensure SRQ physical address structure endianness Eugenia Emantayev (1): net/mlx5: Fix length of async_event_mask Mark Bloch (7): net/mlx5: Don't unlock fte while still using it net/mlx5: Use fte status to decide on firmware command net/mlx5: Refactor find_flow_rule net/mlx5: Group similer rules under the same fte net/mlx5: Add multi dest support net/mlx5: Add option to add fwd rule with counter net/mlx5e: Add tc support for FWD rule with counter Mohamad Haj Yahia (3): net/mlx5: Introduce TSAR manipulation firmware commands net/mlx5: Introduce E-switch QoS management net/mlx5: Add SRIOV VF max rate configuration support Saeed Mahameed (2): IB/mlx5: Skip handling unknown events net/mlx5: Add ConnectX-5 PCIe 4.0 VF device ID drivers/infiniband/hw/mlx5/main.c | 16 +- drivers/infiniband/hw/mlx5/mlx5_ib.h | 2 +- drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 13 +- drivers/net/ethernet/mellanox/mlx5/core/en.h | 14 +- drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c | 38 +-- drivers/net/ethernet/mellanox/mlx5/core/en_fs.c| 49 +-- .../ethernet/mellanox/mlx5/core/en_fs_ethtool.c| 19 +- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 15 + drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 6 +- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c| 35 +- drivers/net/ethernet/mellanox/mlx5/core/eq.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 244 -- drivers/net/ethernet/mellanox/mlx5/core/eswitch.h | 36 ++- .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 60 ++-- drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 358 +++-- 
drivers/net/ethernet/mellanox/mlx5/core/fs_core.h | 5 + drivers/net/ethernet/mellanox/mlx5/core/main.c | 1 + .../net/ethernet/mellanox/mlx5/core/mlx5_core.h| 7 + drivers/net/ethernet/mellanox/mlx5/core/rl.c | 65 include/linux/mlx5/fs.h| 28 +- include/linux/mlx5/mlx5_ifc.h | 201 +++- include/linux/mlx5/srq.h | 2 +- 22 files changed, 927 insertions(+), 289 deletions(-) -- 2.7.4
Re: [PATCH net-next 0/7] qed*: Patch series
From: Yuval Mintz
Date: Sun, 30 Oct 2016 18:38:30 +0200

> Please consider applying this series to `net-next'.

Even the first patch doesn't apply cleanly, please respin.
Re: [PATCH] drivers/net/usb/r8152 fix broken rx checksums
From: Mark Lord
Date: Wed, 26 Oct 2016 18:36:57 -0400

> Patch attached (to deal with buggy mailer) and also below for review.

Please make your mailer work properly so that you can submit patches inline, just like every other developer does for the kernel.

Also please format your Subject line properly; it must be of the form:

	[PATCH net] r8152: Fix broken RX checksums.

The important parts are:

1) "[PATCH net]"

   This says that it is a patch, and that it is targeting the 'net' GIT tree specifically.

2) "r8152: "

   This indicates the "subsystem" that the patch specifically targets, in this case the r8152 driver. It must end with a colon character then a space.

3) "Fix broken RX checksums."

   Commit header lines and commit messages are proper English, therefore sentences should begin with a capitalized letter and end with a period.

Thanks.
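The three rules above can be approximated with a quick shell check. This is only an illustrative sketch of the convention (the real gatekeepers are `scripts/checkpatch.pl` and the maintainer); the regex and the tree list here are assumptions, not an official rule:

```shell
#!/bin/sh
# Sketch: validate a netdev patch subject against the three rules above.
subject='[PATCH net] r8152: Fix broken RX checksums.'

# 1) "[PATCH net]" tree tag, 2) lowercase "subsystem: " prefix,
# 3) capitalized sentence ending with a period.
if echo "$subject" | grep -Eq '^\[PATCH (net|net-next)\] [a-z0-9_]+: [A-Z].*\.$'; then
	echo OK
else
	echo BAD
fi
# prints OK
```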
[PATCH for-next V2 11/15] net/mlx5: Refactor find_flow_rule
From: Mark BlochThe way we compare between two dests will need to be used in other places in the future, so we factor out the comparison logic between two dests into a separate function. Signed-off-by: Mark Bloch Signed-off-by: Saeed Mahameed Signed-off-by: Leon Romanovsky --- drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 29 --- 1 file changed, 20 insertions(+), 9 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c index e2bab9d..fca6937 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c @@ -153,6 +153,8 @@ static void del_rule(struct fs_node *node); static void del_flow_table(struct fs_node *node); static void del_flow_group(struct fs_node *node); static void del_fte(struct fs_node *node); +static bool mlx5_flow_dests_cmp(struct mlx5_flow_destination *d1, + struct mlx5_flow_destination *d2); static void tree_init_node(struct fs_node *node, unsigned int refcount, @@ -1064,21 +1066,30 @@ static struct mlx5_flow_group *create_autogroup(struct mlx5_flow_table *ft, return fg; } +static bool mlx5_flow_dests_cmp(struct mlx5_flow_destination *d1, + struct mlx5_flow_destination *d2) +{ + if (d1->type == d2->type) { + if ((d1->type == MLX5_FLOW_DESTINATION_TYPE_VPORT && +d1->vport_num == d2->vport_num) || + (d1->type == MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE && +d1->ft == d2->ft) || + (d1->type == MLX5_FLOW_DESTINATION_TYPE_TIR && +d1->tir_num == d2->tir_num)) + return true; + } + + return false; +} + static struct mlx5_flow_rule *find_flow_rule(struct fs_fte *fte, struct mlx5_flow_destination *dest) { struct mlx5_flow_rule *rule; list_for_each_entry(rule, >node.children, node.list) { - if (rule->dest_attr.type == dest->type) { - if ((dest->type == MLX5_FLOW_DESTINATION_TYPE_VPORT && -dest->vport_num == rule->dest_attr.vport_num) || - (dest->type == MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE && -dest->ft == rule->dest_attr.ft) || - 
(dest->type == MLX5_FLOW_DESTINATION_TYPE_TIR && -dest->tir_num == rule->dest_attr.tir_num)) - return rule; - } + if (mlx5_flow_dests_cmp(>dest_attr, dest)) + return rule; } return NULL; } -- 2.7.4
[PATCH for-next V2 08/15] net/mlx5: Add SRIOV VF max rate configuration support
From: Mohamad Haj YahiaImplement the vf set rate ndo by modifying the TSAR vport rate limit. Signed-off-by: Mohamad Haj Yahia Signed-off-by: Saeed Mahameed Signed-off-by: Leon Romanovsky --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 15 ++ drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 63 +++ drivers/net/ethernet/mellanox/mlx5/core/eswitch.h | 2 + 3 files changed, 80 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 7eaf380..7f763d2 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -2945,6 +2945,20 @@ static int mlx5e_set_vf_trust(struct net_device *dev, int vf, bool setting) return mlx5_eswitch_set_vport_trust(mdev->priv.eswitch, vf + 1, setting); } + +static int mlx5e_set_vf_rate(struct net_device *dev, int vf, int min_tx_rate, +int max_tx_rate) +{ + struct mlx5e_priv *priv = netdev_priv(dev); + struct mlx5_core_dev *mdev = priv->mdev; + + if (min_tx_rate) + return -EOPNOTSUPP; + + return mlx5_eswitch_set_vport_rate(mdev->priv.eswitch, vf + 1, + max_tx_rate); +} + static int mlx5_vport_link2ifla(u8 esw_link) { switch (esw_link) { @@ -3252,6 +3266,7 @@ static const struct net_device_ops mlx5e_netdev_ops_sriov = { .ndo_set_vf_vlan = mlx5e_set_vf_vlan, .ndo_set_vf_spoofchk = mlx5e_set_vf_spoofchk, .ndo_set_vf_trust= mlx5e_set_vf_trust, + .ndo_set_vf_rate = mlx5e_set_vf_rate, .ndo_get_vf_config = mlx5e_get_vf_config, .ndo_set_vf_link_state = mlx5e_set_vf_link_state, .ndo_get_vf_stats= mlx5e_get_vf_stats, diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c index 2e11a94..9ef01d1 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c @@ -1451,6 +1451,47 @@ static void esw_vport_disable_qos(struct mlx5_eswitch *esw, int vport_num) vport->qos.enabled = false; } +static int 
esw_vport_qos_config(struct mlx5_eswitch *esw, int vport_num, + u32 max_rate) +{ + u32 sched_ctx[MLX5_ST_SZ_DW(scheduling_context)] = {0}; + struct mlx5_vport *vport = >vports[vport_num]; + struct mlx5_core_dev *dev = esw->dev; + void *vport_elem; + u32 bitmask = 0; + int err = 0; + + if (!MLX5_CAP_GEN(dev, qos) || !MLX5_CAP_QOS(dev, esw_scheduling)) + return -EOPNOTSUPP; + + if (!vport->qos.enabled) + return -EIO; + + MLX5_SET(scheduling_context, _ctx, element_type, +SCHEDULING_CONTEXT_ELEMENT_TYPE_VPORT); + vport_elem = MLX5_ADDR_OF(scheduling_context, _ctx, + element_attributes); + MLX5_SET(vport_element, vport_elem, vport_number, vport_num); + MLX5_SET(scheduling_context, _ctx, parent_element_id, +esw->qos.root_tsar_id); + MLX5_SET(scheduling_context, _ctx, max_average_bw, +max_rate); + bitmask |= MODIFY_SCHEDULING_ELEMENT_IN_MODIFY_BITMASK_MAX_AVERAGE_BW; + + err = mlx5_modify_scheduling_element_cmd(dev, +SCHEDULING_HIERARCHY_E_SWITCH, +_ctx, +vport->qos.esw_tsar_ix, +bitmask); + if (err) { + esw_warn(esw->dev, "E-Switch modify TSAR vport element failed (vport=%d,err=%d)\n", +vport_num, err); + return err; + } + + return 0; +} + static void node_guid_gen_from_mac(u64 *node_guid, u8 mac[ETH_ALEN]) { ((u8 *)node_guid)[7] = mac[0]; @@ -1888,6 +1929,7 @@ int mlx5_eswitch_get_vport_config(struct mlx5_eswitch *esw, ivi->qos = evport->info.qos; ivi->spoofchk = evport->info.spoofchk; ivi->trusted = evport->info.trusted; + ivi->max_tx_rate = evport->info.max_rate; mutex_unlock(>state_lock); return 0; @@ -1981,6 +2023,27 @@ int mlx5_eswitch_set_vport_trust(struct mlx5_eswitch *esw, return 0; } +int mlx5_eswitch_set_vport_rate(struct mlx5_eswitch *esw, + int vport, u32 max_rate) +{ + struct mlx5_vport *evport; + int err = 0; + + if (!ESW_ALLOWED(esw)) + return -EPERM; + if (!LEGAL_VPORT(esw, vport)) + return -EINVAL; + + mutex_lock(>state_lock); + evport = >vports[vport]; + err = esw_vport_qos_config(esw, vport, max_rate); + if (!err) +
[PATCH for-next V2 04/15] net/mlx5: Fix length of async_event_mask
From: Eugenia Emantayev

According to the PRM, async_event_mask has to be 64 bits long.

Signed-off-by: Eugenia Emantayev
Signed-off-by: Saeed Mahameed
Signed-off-by: Leon Romanovsky
---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index aaca090..e74a73b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -469,7 +469,7 @@ void mlx5_eq_cleanup(struct mlx5_core_dev *dev)
 int mlx5_start_eqs(struct mlx5_core_dev *dev)
 {
 	struct mlx5_eq_table *table = &dev->priv.eq_table;
-	u32 async_event_mask = MLX5_ASYNC_EVENT_MASK;
+	u64 async_event_mask = MLX5_ASYNC_EVENT_MASK;
 	int err;

 	if (MLX5_CAP_GEN(dev, pg))
--
2.7.4
[PATCH for-next V2 02/15] net/mlx5: Update struct mlx5_ifc_xrqc_bits
From: Artemy Kovalyov

Update struct mlx5_ifc_xrqc_bits according to the latest specification.

Signed-off-by: Artemy Kovalyov
Signed-off-by: Leon Romanovsky
Signed-off-by: Saeed Mahameed
---
 include/linux/mlx5/mlx5_ifc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 6045d4d..12f72e4 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -2844,7 +2844,7 @@ struct mlx5_ifc_xrqc_bits {
 	struct mlx5_ifc_tag_matching_topology_context_bits tag_matching_topology_context;

-	u8         reserved_at_180[0x200];
+	u8         reserved_at_180[0x880];

 	struct mlx5_ifc_wq_bits wq;
 };
--
2.7.4
Re: [patch net-next 00/16] mlxsw: Add Infiniband support for Mellanox switches
Sun, Oct 30, 2016 at 09:51:07PM CET, da...@davemloft.net wrote:
>From: Jiri Pirko
>Date: Fri, 28 Oct 2016 21:17:34 +0200
>
>> This patchset adds basic Infiniband support for SwitchX-2, Switch-IB
>> and Switch-IB-2 ASIC drivers.
>
>This depended upon the bug fixes which were only in 'net' until a few
>hours ago.
>
>Please state this explicitly in the future, it'll save me time.

Apologies. Will be more careful with this next time. Thanks.

>
>Series applied, thanks.
Re: [PATCH] net: bonding: use new api ethtool_{get|set}_link_ksettings
From: Philippe Reynes
Date: Tue, 25 Oct 2016 18:41:31 +0200

> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
>
> Signed-off-by: Philippe Reynes

Applied, thanks.
[PATCH for-next V2 05/15] net/mlx5: Add ConnectX-5 PCIe 4.0 VF device ID
For the mlx5 driver to support ConnectX-5 PCIe 4.0 VFs, we add the device ID 0x101a to mlx5_core_pci_table.

Signed-off-by: Saeed Mahameed
Signed-off-by: Leon Romanovsky
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index d9c3c70..197e04c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1422,6 +1422,7 @@ static const struct pci_device_id mlx5_core_pci_table[] = {
 	{ PCI_VDEVICE(MELLANOX, 0x1017) },			/* ConnectX-5, PCIe 3.0 */
 	{ PCI_VDEVICE(MELLANOX, 0x1018), MLX5_PCI_DEV_IS_VF},	/* ConnectX-5 VF */
 	{ PCI_VDEVICE(MELLANOX, 0x1019) },			/* ConnectX-5, PCIe 4.0 */
+	{ PCI_VDEVICE(MELLANOX, 0x101a), MLX5_PCI_DEV_IS_VF},	/* ConnectX-5, PCIe 4.0 VF */
 	{ 0, }
 };
--
2.7.4
[PATCH for-next V2 09/15] net/mlx5: Don't unlock fte while still using it
From: Mark BlochWhen adding a new rule to an fte, we need to hold the fte lock until we add that rule to the fte and increase the fte ref count. Fixes: 0c56b97503fd ("net/mlx5_core: Introduce flow steering API") Signed-off-by: Mark Bloch Signed-off-by: Saeed Mahameed Signed-off-by: Leon Romanovsky --- drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c index 5da2cc8..a07ff30 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c @@ -1107,9 +1107,8 @@ static struct mlx5_flow_rule *add_rule_fg(struct mlx5_flow_group *fg, return rule; } rule = add_rule_fte(fte, fg, dest); - unlock_ref_node(>node); if (IS_ERR(rule)) - goto unlock_fg; + goto unlock_fte; else goto add_rule; } @@ -1127,6 +1126,7 @@ static struct mlx5_flow_rule *add_rule_fg(struct mlx5_flow_group *fg, goto unlock_fg; } tree_init_node(>node, 0, del_fte); + nested_lock_ref_node(>node, FS_MUTEX_CHILD); rule = add_rule_fte(fte, fg, dest); if (IS_ERR(rule)) { kfree(fte); @@ -1139,6 +1139,8 @@ static struct mlx5_flow_rule *add_rule_fg(struct mlx5_flow_group *fg, list_add(>node.list, prev); add_rule: tree_add_node(>node, >node); +unlock_fte: + unlock_ref_node(>node); unlock_fg: unlock_ref_node(>node); return rule; -- 2.7.4
[PATCH for-next V2 03/15] net/mlx5: Ensure SRQ physical address structure endianness
From: Artemy Kovalyov

The SRQ physical address structure field should be in big-endian format.

Signed-off-by: Artemy Kovalyov
Signed-off-by: Leon Romanovsky
Signed-off-by: Saeed Mahameed
---
 include/linux/mlx5/srq.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/mlx5/srq.h b/include/linux/mlx5/srq.h
index 33c97dc..1cde0fd 100644
--- a/include/linux/mlx5/srq.h
+++ b/include/linux/mlx5/srq.h
@@ -55,7 +55,7 @@ struct mlx5_srq_attr {
 	u32 lwm;
 	u32 user_index;
 	u64 db_record;
-	u64 *pas;
+	__be64 *pas;
 };

 struct mlx5_core_dev;
--
2.7.4
[PATCH for-next V2 15/15] net/mlx5e: Add tc support for FWD rule with counter
From: Mark BlochWhen creating a FWD rule using tc create also a HW counter for this rule. Signed-off-by: Mark Bloch Signed-off-by: Saeed Mahameed Signed-off-by: Leon Romanovsky --- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 3 ++- .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 20 +++- 2 files changed, 13 insertions(+), 10 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index 5d9ac0d..c2e4728 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -419,7 +419,8 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv, struct tcf_exts *exts, return -EINVAL; } - attr->action |= MLX5_FLOW_CONTEXT_ACTION_FWD_DEST; + attr->action |= MLX5_FLOW_CONTEXT_ACTION_FWD_DEST | + MLX5_FLOW_CONTEXT_ACTION_COUNT; out_priv = netdev_priv(out_dev); attr->out_rep = out_priv->ppriv; continue; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c index 8b2a383..53d9d6c 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c @@ -48,11 +48,12 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw, struct mlx5_flow_spec *spec, struct mlx5_esw_flow_attr *attr) { - struct mlx5_flow_destination dest = { 0 }; + struct mlx5_flow_destination dest[2] = {}; struct mlx5_fc *counter = NULL; struct mlx5_flow_handle *rule; void *misc; int action; + int i = 0; if (esw->mode != SRIOV_OFFLOADS) return ERR_PTR(-EOPNOTSUPP); @@ -60,15 +61,17 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw, action = attr->action; if (action & MLX5_FLOW_CONTEXT_ACTION_FWD_DEST) { - dest.type = MLX5_FLOW_DESTINATION_TYPE_VPORT; - dest.vport_num = attr->out_rep->vport; - action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST; - } else if (action & MLX5_FLOW_CONTEXT_ACTION_COUNT) { + dest[i].type = 
MLX5_FLOW_DESTINATION_TYPE_VPORT; + dest[i].vport_num = attr->out_rep->vport; + i++; + } + if (action & MLX5_FLOW_CONTEXT_ACTION_COUNT) { counter = mlx5_fc_create(esw->dev, true); if (IS_ERR(counter)) return ERR_CAST(counter); - dest.type = MLX5_FLOW_DESTINATION_TYPE_COUNTER; - dest.counter = counter; + dest[i].type = MLX5_FLOW_DESTINATION_TYPE_COUNTER; + dest[i].counter = counter; + i++; } misc = MLX5_ADDR_OF(fte_match_param, spec->match_value, misc_parameters); @@ -81,8 +84,7 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw, MLX5_MATCH_MISC_PARAMETERS; rule = mlx5_add_flow_rules((struct mlx5_flow_table *)esw->fdb_table.fdb, - spec, action, 0, , 1); - + spec, action, 0, dest, i); if (IS_ERR(rule)) mlx5_fc_destroy(esw->dev, counter); -- 2.7.4
[PATCH for-next V2 07/15] net/mlx5: Introduce E-switch QoS management
From: Mohamad Haj YahiaAdd TSAR to the eswitch which will act as the vports rate limiter. Create/Destroy TSAR on Enable/Dsiable SRIOV. Attach/Detach vport to eswitch TSAR on Enable/Disable vport. Signed-off-by: Mohamad Haj Yahia Signed-off-by: Saeed Mahameed Signed-off-by: Leon Romanovsky --- drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 113 +- drivers/net/ethernet/mellanox/mlx5/core/eswitch.h | 12 +++ 2 files changed, 124 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c index abbf2c3..2e11a94 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c @@ -1351,6 +1351,106 @@ static int esw_vport_egress_config(struct mlx5_eswitch *esw, return err; } +/* Vport QoS management */ +static int esw_create_tsar(struct mlx5_eswitch *esw) +{ + u32 tsar_ctx[MLX5_ST_SZ_DW(scheduling_context)] = {0}; + struct mlx5_core_dev *dev = esw->dev; + int err; + + if (!MLX5_CAP_GEN(dev, qos) || !MLX5_CAP_QOS(dev, esw_scheduling)) + return 0; + + if (esw->qos.enabled) + return -EEXIST; + + err = mlx5_create_scheduling_element_cmd(dev, +SCHEDULING_HIERARCHY_E_SWITCH, +_ctx, +>qos.root_tsar_id); + if (err) { + esw_warn(esw->dev, "E-Switch create TSAR failed (%d)\n", err); + return err; + } + + esw->qos.enabled = true; + return 0; +} + +static void esw_destroy_tsar(struct mlx5_eswitch *esw) +{ + int err; + + if (!esw->qos.enabled) + return; + + err = mlx5_destroy_scheduling_element_cmd(esw->dev, + SCHEDULING_HIERARCHY_E_SWITCH, + esw->qos.root_tsar_id); + if (err) + esw_warn(esw->dev, "E-Switch destroy TSAR failed (%d)\n", err); + + esw->qos.enabled = false; +} + +static int esw_vport_enable_qos(struct mlx5_eswitch *esw, int vport_num, + u32 initial_max_rate) +{ + u32 sched_ctx[MLX5_ST_SZ_DW(scheduling_context)] = {0}; + struct mlx5_vport *vport = >vports[vport_num]; + struct mlx5_core_dev *dev = esw->dev; + void *vport_elem; + int 
err = 0; + + if (!esw->qos.enabled || !MLX5_CAP_GEN(dev, qos) || + !MLX5_CAP_QOS(dev, esw_scheduling)) + return 0; + + if (vport->qos.enabled) + return -EEXIST; + + MLX5_SET(scheduling_context, _ctx, element_type, +SCHEDULING_CONTEXT_ELEMENT_TYPE_VPORT); + vport_elem = MLX5_ADDR_OF(scheduling_context, _ctx, + element_attributes); + MLX5_SET(vport_element, vport_elem, vport_number, vport_num); + MLX5_SET(scheduling_context, _ctx, parent_element_id, +esw->qos.root_tsar_id); + MLX5_SET(scheduling_context, _ctx, max_average_bw, +initial_max_rate); + + err = mlx5_create_scheduling_element_cmd(dev, +SCHEDULING_HIERARCHY_E_SWITCH, +_ctx, +>qos.esw_tsar_ix); + if (err) { + esw_warn(esw->dev, "E-Switch create TSAR vport element failed (vport=%d,err=%d)\n", +vport_num, err); + return err; + } + + vport->qos.enabled = true; + return 0; +} + +static void esw_vport_disable_qos(struct mlx5_eswitch *esw, int vport_num) +{ + struct mlx5_vport *vport = >vports[vport_num]; + int err = 0; + + if (!vport->qos.enabled) + return; + + err = mlx5_destroy_scheduling_element_cmd(esw->dev, + SCHEDULING_HIERARCHY_E_SWITCH, + vport->qos.esw_tsar_ix); + if (err) + esw_warn(esw->dev, "E-Switch destroy TSAR vport element failed (vport=%d,err=%d)\n", +vport_num, err); + + vport->qos.enabled = false; +} + static void node_guid_gen_from_mac(u64 *node_guid, u8 mac[ETH_ALEN]) { ((u8 *)node_guid)[7] = mac[0]; @@ -1386,6 +1486,7 @@ static void esw_apply_vport_conf(struct mlx5_eswitch *esw, esw_vport_egress_config(esw, vport); } } + static void esw_enable_vport(struct mlx5_eswitch *esw, int vport_num, int enable_events) { @@ -1399,6 +1500,10 @@ static void esw_enable_vport(struct mlx5_eswitch *esw, int vport_num, /* Restore old vport configuration */
Re: Let's do P4
On 16-10-30 12:56 PM, Jiri Pirko wrote:
> Sun, Oct 30, 2016 at 07:44:43PM CET, kubak...@wp.pl wrote:
>> On Sun, 30 Oct 2016 19:01:03 +0100, Jiri Pirko wrote:
>>> Sun, Oct 30, 2016 at 06:45:26PM CET, kubak...@wp.pl wrote:
>>>> On Sun, 30 Oct 2016 17:38:36 +0100, Jiri Pirko wrote:
>>>>> Sun, Oct 30, 2016 at 11:26:49AM CET, tg...@suug.ch wrote:
>>>>> [...]
>>>>
>>>> Agreed. Just to clarify, my intention here was not to suggest the use
>>>> of eBPF as the IR. I was merely cautioning against bundling the new
>>>> API with P4, for multiple reasons. As John mentioned, the P4 spec was
>>>> evolving in the past. The spec is designed for HW more capable than
>>>> the switch ASICs we have today. As vendors move to provide more
>>>> configurability, we may need to extend the API beyond P4. We may want
>>>> to extend this API for SW hand-offs (as suggested by Thomas) which are
>>>> not part of the P4 spec. Also, John showed examples of the matchd
>>>> software, which already uses P4 at the frontend today and translates
>>>> it to different targets (eBPF, u32, HW). It may just be about the
>>>> naming, but I feel like calling the new API something more generic,
>>>> switch AST or some such, may help to avoid unnecessary ties and
>>>> confusion.
>>>
>>> Well, that basically means to create "something" that could be used
>>> to translate p4 source to. Not sure how exactly this "something" should
>>> look like and how different it would be from p4. I thought it might
>>> be good to benefit from the p4 definition and use it directly. Not sure.
>>
>> We have to translate the P4 into "something" already, that something
>> is the AST we will load into the kernel. Or were you planning to use
>> some official P4 AST? I'm not suggesting we add our own high level
>
> I'm not aware of the existence of some official P4 AST. We have to
> figure it out.

The compilers at p4.org have an AST, so you could claim those are in some sense "official". Also, the BNF published in the p4 spec lends itself to what the AST should look like.
Also FWIW, the AST is not necessarily the same as the IR.

>> language. I agree that P4 is a good starting point, and perhaps a good
>> high level language. I'm just cautious of creating an equivalency
>> between high level language (P4) and the kernel ABI.
>
> Understood. Definitely good to be very cautious when defining a kernel
> API.

And another point that came up (trying to unify threads a bit):

"I wonder why p4 does not handle the HW capabilities. At least I did not find it. It would be certainly nice to have it."

One of the points of P4 is that the hardware should be configurable. So given a P4 definition of a parse graph, table layout, etc., the hardware should configure itself to support that "program". The reason you don't see any HW capabilities is that the "program" is exactly what the hardware is expected to run. Also, the P4 spec does not provide a definition of a "runtime" API; this will at some point be defined in another spec.

So a clarifying point: are you expecting hardware to reconfigure itself to match the P4 program, or are you simply using this to configure TCAM slices and building a runtime API? For example, if a P4 program gives a new parse graph that is not supported by the hardware, should it be rejected?

From the flow-api you will see a handful of get_* operations but no set_* operations, because the set_* path has to come down to the hardware in ucode/low-level firmware updates. It's unlikely that vendors will want to expose ucode/etc. The set_flow/get_flow bits could be mapped onto a cls_p4 or a cls_switch, as I think was hinted above.

Thanks,
John
Re: Let's do P4
[...]
>
> Yeah, I was also thinking about something similar to your Flow-API,
> but we need something more generic I believe.

I've heard this in a couple of other forums as well, but please elaborate: exactly what needs to be more generic? That API is sufficient to express both the init-time piece of the original P4 draft and the runtime component. I guess we are trying to strike a balance here between the ability to actually write an IR that a sufficiently large subset of hardware can support "easily" and something that can support all possible hardware features. IMO this leads to something like the Flow-API in the first case, or to something like eBPF for all possible features.

>>
>> We also have an emulated path auto-generated from compiler tools
>> that creates eBPF code from the IR, so this would give you the software
>> fall-back.
>
> Btw, Flow-API was rejected because it was a clean kernel-bypass. In case
> of p4, if we do what Thomas is suggesting, having x.bpf for SW and
> x.p4ast for HW, that would be the very same kernel-bypass. Therefore I
> strongly believe there should be a single kernel API for p4 SW+HW - for
> both p4 program insertion and runtime configuration.

Another area of push-back came from creating yet another infrastructure.
Re: [patch net-next 00/16] mlxsw: Add Infiniband support for Mellanox switches
From: Jiri Pirko
Date: Fri, 28 Oct 2016 21:17:34 +0200

> This patchset adds basic Infiniband support for SwitchX-2, Switch-IB
> and Switch-IB-2 ASIC drivers.

This depended upon the bug fixes which were only in 'net' until a few hours ago.

Please state this explicitly in the future, it'll save me time.

Series applied, thanks.
Re: [RFC PATCH 01/13] pinctrl: meson: Add GXL pinctrl definitions
On Fri, Oct 21, 2016 at 04:40:26PM +0200, Neil Armstrong wrote: > Add support for the Amlogic Meson GXL SoC, this is a partially complete > definition only based on the Amlogic Vendor tree. > > This definition differs a lot from the GXBB and needs a separate entry. > > Signed-off-by: Neil Armstrong> --- > .../devicetree/bindings/pinctrl/meson,pinctrl.txt | 2 + Acked-by: Rob Herring > drivers/pinctrl/meson/Makefile | 3 +- > drivers/pinctrl/meson/pinctrl-meson-gxl.c | 589 > + > drivers/pinctrl/meson/pinctrl-meson.c | 8 + > drivers/pinctrl/meson/pinctrl-meson.h | 2 + > include/dt-bindings/gpio/meson-gxl-gpio.h | 131 + > 6 files changed, 734 insertions(+), 1 deletion(-) > create mode 100644 drivers/pinctrl/meson/pinctrl-meson-gxl.c > create mode 100644 include/dt-bindings/gpio/meson-gxl-gpio.h
Re: [PATCH] net: stmmac: Add OXNAS Glue Driver
On Fri, Oct 21, 2016 at 10:44:45AM +0200, Neil Armstrong wrote: > Add Synopsys Designware MAC Glue layer for the Oxford Semiconductor OX820. > > Signed-off-by: Neil Armstrong> --- > .../devicetree/bindings/net/oxnas-dwmac.txt| 44 + It's preferred that bindings are a separate patch. > drivers/net/ethernet/stmicro/stmmac/Kconfig| 11 ++ > drivers/net/ethernet/stmicro/stmmac/Makefile | 1 + > drivers/net/ethernet/stmicro/stmmac/dwmac-oxnas.c | 219 > + > 4 files changed, 275 insertions(+) > create mode 100644 Documentation/devicetree/bindings/net/oxnas-dwmac.txt > create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwmac-oxnas.c > > Changes since RFC at https://patchwork.kernel.org/patch/9387257 : > - Drop init/exit callbacks > - Implement proper remove and PM callback > - Call init from probe > - Disable/Unprepare clock if stmmac probe fails > > diff --git a/Documentation/devicetree/bindings/net/oxnas-dwmac.txt > b/Documentation/devicetree/bindings/net/oxnas-dwmac.txt > new file mode 100644 > index 000..5d2696c > --- /dev/null > +++ b/Documentation/devicetree/bindings/net/oxnas-dwmac.txt > @@ -0,0 +1,44 @@ > +* Oxford Semiconductor OXNAS DWMAC Ethernet controller > + > +The device inherits all the properties of the dwmac/stmmac devices > +described in the file stmmac.txt in the current directory with the > +following changes. > + > +Required properties on all platforms: > + > +- compatible:Depending on the platform this should be one of: > + - "oxsemi,ox820-dwmac" > + Additionally "snps,dwmac" and any applicable more > + detailed version number described in net/stmmac.txt > + should be used. You should be explicit what version applies to ox820. "snps,dwmac" should probably be deprecated IMO. There are so many variations of DW h/w. > + > +- reg: The first register range should be the one of the DWMAC > + controller. This is worded like there's a 2nd range? 
> + > +- clocks: Should contain phandles to the following clocks > +- clock-names: Should contain the following: > + - "stmmaceth" - see stmmac.txt > + - "gmac" - peripheral gate clock > + > +- oxsemi,sys-ctrl: a phandle to the system controller syscon node > + > +Example : > + > +etha: ethernet@4040 { > + compatible = "oxsemi,ox820-dwmac", "snps,dwmac"; > + reg = <0x4040 0x2000>; > + interrupts = , > + ; > + interrupt-names = "macirq", "eth_wake_irq"; > + mac-address = []; /* Filled in by U-Boot */ > + phy-mode = "rgmii"; > + > + clocks = < CLK_820_ETHA>, <>; > + clock-names = "gmac", "stmmaceth"; > + resets = < RESET_MAC>; > + > + /* Regmap for sys registers */ > + oxsemi,sys-ctrl = <>; > + > + status = "disabled"; > +};
Re: [PATCH net-next 3/4] bpf: BPF for lightweight tunnel encapsulation
On Sun, Oct 30, 2016 at 4:58 AM, Thomas Graf wrote:
> Register two new BPF prog types BPF_PROG_TYPE_LWT_IN and
> BPF_PROG_TYPE_LWT_OUT which are invoked if a route contains a
> LWT redirection of type LWTUNNEL_ENCAP_BPF.
>
> The separate program types are required because manipulation of
> packet data is only allowed on the output and transmit path as
> the subsequent dst_input() call path assumes an IP header
> validated by ip_rcv(). The BPF programs will be handed an skb
> with the L3 header attached and may return one of the following
> return codes:
>
> BPF_OK - Continue routing as per nexthop
> BPF_DROP - Drop skb and return EPERM
> BPF_REDIRECT - Redirect skb to device as per redirect() helper.
>                (Only valid on lwtunnel_xmit() hook)
>
> The return codes are binary compatible with their TC_ACT_
> relatives to ease compatibility.
>
> A new helper bpf_skb_push() is added which allows to prepend an
> L2 header in front of the skb, extend the existing L3 header, or
> both. This allows to address a wide range of issues:
> - Optimize L2 header construction when L2 information is always
>   static to avoid ARP/NDisc lookup.
> - Extend IP header to add additional IP options.
> - Perform simple encapsulation where offload is of no concern.
>   (The existing functionality to attach a tunnel key to the skb
>   and redirect to a tunnel net_device to allow for offload
>   continues to work obviously).
> > Signed-off-by: Thomas Graf > --- > include/linux/filter.h| 2 +- > include/uapi/linux/bpf.h | 31 +++- > include/uapi/linux/lwtunnel.h | 21 +++ > kernel/bpf/verifier.c | 16 +- > net/core/Makefile | 2 +- > net/core/filter.c | 148 - > net/core/lwt_bpf.c| 365 > ++ > net/core/lwtunnel.c | 1 + > 8 files changed, 579 insertions(+), 7 deletions(-) > create mode 100644 net/core/lwt_bpf.c > > diff --git a/include/linux/filter.h b/include/linux/filter.h > index 1f09c52..aad7f81 100644 > --- a/include/linux/filter.h > +++ b/include/linux/filter.h > @@ -438,7 +438,7 @@ struct xdp_buff { > }; > > /* compute the linear packet data range [data, data_end) which > - * will be accessed by cls_bpf and act_bpf programs > + * will be accessed by cls_bpf, act_bpf and lwt programs > */ > static inline void bpf_compute_data_end(struct sk_buff *skb) > { > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index e2f38e0..2ebaa3c 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -96,6 +96,9 @@ enum bpf_prog_type { > BPF_PROG_TYPE_TRACEPOINT, > BPF_PROG_TYPE_XDP, > BPF_PROG_TYPE_PERF_EVENT, > + BPF_PROG_TYPE_LWT_IN, > + BPF_PROG_TYPE_LWT_OUT, > + BPF_PROG_TYPE_LWT_XMIT, > }; > > #define BPF_PSEUDO_MAP_FD 1 > @@ -383,6 +386,16 @@ union bpf_attr { > * > * int bpf_get_numa_node_id() > * Return: Id of current NUMA node. > + * > + * int bpf_skb_push() > + * Add room to beginning of skb and adjusts MAC header offset > accordingly. > + * Extends/reallocaes for needed skb headeroom automatically. > + * May change skb data pointer and will thus invalidate any check done > + * for direct packet access. 
> + * @skb: pointer to skb > + * @len: length of header to be pushed in front > + * @flags: Flags (unused for now) > + * Return: 0 on success or negative error > */ > #define __BPF_FUNC_MAPPER(FN) \ > FN(unspec), \ > @@ -427,7 +440,8 @@ union bpf_attr { > FN(skb_pull_data), \ > FN(csum_update),\ > FN(set_hash_invalid), \ > - FN(get_numa_node_id), > + FN(get_numa_node_id), \ > + FN(skb_push), > > /* integer value in 'imm' field of BPF_CALL instruction selects which helper > * function eBPF program intends to call > @@ -511,6 +525,21 @@ struct bpf_tunnel_key { > __u32 tunnel_label; > }; > > +/* Generic BPF return codes which all BPF program types may support. > + * The values are binary compatible with their TC_ACT_* counter-part to > + * provide backwards compatibility with existing SCHED_CLS and SCHED_ACT > + * programs. > + * > + * XDP is handled seprately, see XDP_*. > + */ > +enum bpf_ret_code { > + BPF_OK = 0, > + /* 1 reserved */ > + BPF_DROP = 2, > + /* 3-6 reserved */ > + BPF_REDIRECT = 7, > +}; > + > /* User return codes for XDP prog type. > * A valid XDP program must return one of these defined values. All other > * return codes are reserved for future use. Unknown return codes will result > diff --git a/include/uapi/linux/lwtunnel.h b/include/uapi/linux/lwtunnel.h > index a478fe8..9354d997 100644 > --- a/include/uapi/linux/lwtunnel.h > +++ b/include/uapi/linux/lwtunnel.h > @@ -9,6 +9,7 @@ enum lwtunnel_encap_types { >
[net-next PATCH 5/7] stmmac: dwmac-sti: move clk_prepare_enable out of init and add error handling
Add clock error handling to probe and in the process move clock
enabling out of sti_dwmac_init() to make this easier.

Signed-off-by: Joachim Eastwood
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
index e814b68..aa75a27 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
@@ -237,8 +237,6 @@ static int sti_dwmac_init(struct platform_device *pdev, void *priv)
 	u32 reg = dwmac->ctrl_reg;
 	u32 val;
 
-	clk_prepare_enable(dwmac->clk);
-
 	if (dwmac->gmac_en)
 		regmap_update_bits(regmap, reg, EN_MASK, EN);
 
@@ -348,11 +346,23 @@ static int sti_dwmac_probe(struct platform_device *pdev)
 	plat_dat->bsp_priv = dwmac;
 	plat_dat->fix_mac_speed = data->fix_retime_src;
 
-	ret = sti_dwmac_init(pdev, plat_dat->bsp_priv);
+	ret = clk_prepare_enable(dwmac->clk);
 	if (ret)
 		return ret;
 
-	return stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
+	ret = sti_dwmac_init(pdev, plat_dat->bsp_priv);
+	if (ret)
+		goto disable_clk;
+
+	ret = stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
+	if (ret)
+		goto disable_clk;
+
+	return 0;
+
+disable_clk:
+	clk_disable_unprepare(dwmac->clk);
+	return ret;
 }
 
 static int sti_dwmac_remove(struct platform_device *pdev)
@@ -381,6 +391,7 @@ static int sti_dwmac_resume(struct device *dev)
 	struct sti_dwmac *dwmac = get_stmmac_bsp_priv(dev);
 	struct platform_device *pdev = to_platform_device(dev);
 
+	clk_prepare_enable(dwmac->clk);
 	sti_dwmac_init(pdev, dwmac);
 
 	return stmmac_resume(dev);
-- 
2.10.1
[net-next PATCH 7/7] stmmac: dwmac-sti: remove unused priv dev member
The dev member of struct sti_dwmac is not used anywhere in the driver,
so let's just remove it.

Signed-off-by: Joachim Eastwood
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
index 0af3faa..f51fb16 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
@@ -126,7 +126,6 @@ struct sti_dwmac {
 	struct clk *clk;	/* PHY clock */
 	u32 ctrl_reg;		/* GMAC glue-logic control register */
 	int clk_sel_reg;	/* GMAC ext clk selection register */
-	struct device *dev;
 	struct regmap *regmap;
 	bool gmac_en;
 	u32 speed;
@@ -274,7 +273,6 @@ static int sti_dwmac_parse_data(struct sti_dwmac *dwmac,
 		return err;
 	}
 
-	dwmac->dev = dev;
 	dwmac->interface = of_get_phy_mode(np);
 	dwmac->regmap = regmap;
 	dwmac->gmac_en = of_property_read_bool(np, "st,gmac_en");
-- 
2.10.1
[net-next PATCH 4/7] stmmac: dwmac-sti: move st,gmac_en parsing to sti_dwmac_parse_data
The sti_dwmac_init() function is called both from probe and resume.
Since DT properties don't change between suspend/resume cycles, move
parsing of this parameter into sti_dwmac_parse_data() where it belongs.

Signed-off-by: Joachim Eastwood
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
index 09dd2be..e814b68 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
@@ -128,6 +128,7 @@ struct sti_dwmac {
 	int clk_sel_reg;	/* GMAC ext clk selection register */
 	struct device *dev;
 	struct regmap *regmap;
+	bool gmac_en;
 	u32 speed;
 	void (*fix_retime_src)(void *priv, unsigned int speed);
 };
@@ -233,14 +234,12 @@ static int sti_dwmac_init(struct platform_device *pdev, void *priv)
 	struct sti_dwmac *dwmac = priv;
 	struct regmap *regmap = dwmac->regmap;
 	int iface = dwmac->interface;
-	struct device *dev = dwmac->dev;
-	struct device_node *np = dev->of_node;
 	u32 reg = dwmac->ctrl_reg;
 	u32 val;
 
 	clk_prepare_enable(dwmac->clk);
 
-	if (of_property_read_bool(np, "st,gmac_en"))
+	if (dwmac->gmac_en)
 		regmap_update_bits(regmap, reg, EN_MASK, EN);
 
 	regmap_update_bits(regmap, reg, MII_PHY_SEL_MASK, phy_intf_sels[iface]);
@@ -281,6 +280,7 @@ static int sti_dwmac_parse_data(struct sti_dwmac *dwmac,
 	dwmac->dev = dev;
 	dwmac->interface = of_get_phy_mode(np);
 	dwmac->regmap = regmap;
+	dwmac->gmac_en = of_property_read_bool(np, "st,gmac_en");
 	dwmac->ext_phyclk = of_property_read_bool(np, "st,ext-phyclk");
 	dwmac->tx_retime_src = TX_RETIME_SRC_NA;
 	dwmac->speed = SPEED_100;
-- 
2.10.1
[net-next PATCH 6/7] stmmac: dwmac-sti: clean up and rename sti_dwmac_init
Rename sti_dwmac_init() to sti_dwmac_set_phy_mode(), which is a better
description of what it really does.

Signed-off-by: Joachim Eastwood
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
index aa75a27..0af3faa 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
@@ -229,9 +229,8 @@ static void stid127_fix_retime_src(void *priv, u32 spd)
 	regmap_update_bits(dwmac->regmap, reg, STID127_RETIME_SRC_MASK, val);
 }
 
-static int sti_dwmac_init(struct platform_device *pdev, void *priv)
+static int sti_dwmac_set_phy_mode(struct sti_dwmac *dwmac)
 {
-	struct sti_dwmac *dwmac = priv;
 	struct regmap *regmap = dwmac->regmap;
 	int iface = dwmac->interface;
 	u32 reg = dwmac->ctrl_reg;
@@ -245,7 +244,7 @@ static int sti_dwmac_init(struct platform_device *pdev, void *priv)
 	val = (iface == PHY_INTERFACE_MODE_REVMII) ? 0 : ENMII;
 	regmap_update_bits(regmap, reg, ENMII_MASK, val);
 
-	dwmac->fix_retime_src(priv, dwmac->speed);
+	dwmac->fix_retime_src(dwmac, dwmac->speed);
 
 	return 0;
 }
@@ -350,7 +349,7 @@ static int sti_dwmac_probe(struct platform_device *pdev)
 	if (ret)
 		return ret;
 
-	ret = sti_dwmac_init(pdev, plat_dat->bsp_priv);
+	ret = sti_dwmac_set_phy_mode(dwmac);
 	if (ret)
 		goto disable_clk;
 
@@ -389,10 +388,9 @@ static int sti_dwmac_suspend(struct device *dev)
 static int sti_dwmac_resume(struct device *dev)
 {
 	struct sti_dwmac *dwmac = get_stmmac_bsp_priv(dev);
-	struct platform_device *pdev = to_platform_device(dev);
 
 	clk_prepare_enable(dwmac->clk);
-	sti_dwmac_init(pdev, dwmac);
+	sti_dwmac_set_phy_mode(dwmac);
 
 	return stmmac_resume(dev);
 }
-- 
2.10.1
[net-next PATCH 3/7] stmmac: dwmac-sti: add PM ops and resume function
Implement PM callbacks and driver remove in the driver instead of
relying on the init/exit hooks in stmmac_platform. This gives the
driver more flexibility in how the code is organized.

Eventually the init/exit callbacks will be deprecated in favor of the
standard PM callbacks and driver remove function.

Signed-off-by: Joachim Eastwood
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c | 46 ++++++++++++++++++++++++++++++++++++------------
 1 file changed, 36 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
index f009bf4..09dd2be 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
@@ -253,12 +253,6 @@ static int sti_dwmac_init(struct platform_device *pdev, void *priv)
 	return 0;
 }
 
-static void sti_dwmac_exit(struct platform_device *pdev, void *priv)
-{
-	struct sti_dwmac *dwmac = priv;
-
-	clk_disable_unprepare(dwmac->clk);
-}
 static int sti_dwmac_parse_data(struct sti_dwmac *dwmac,
 				struct platform_device *pdev)
 {
@@ -352,8 +346,6 @@ static int sti_dwmac_probe(struct platform_device *pdev)
 	dwmac->fix_retime_src = data->fix_retime_src;
 
 	plat_dat->bsp_priv = dwmac;
-	plat_dat->init = sti_dwmac_init;
-	plat_dat->exit = sti_dwmac_exit;
 	plat_dat->fix_mac_speed = data->fix_retime_src;
 
 	ret = sti_dwmac_init(pdev, plat_dat->bsp_priv);
@@ -363,6 +355,40 @@ static int sti_dwmac_probe(struct platform_device *pdev)
 	return stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
 }
 
+static int sti_dwmac_remove(struct platform_device *pdev)
+{
+	struct sti_dwmac *dwmac = get_stmmac_bsp_priv(&pdev->dev);
+	int ret = stmmac_dvr_remove(&pdev->dev);
+
+	clk_disable_unprepare(dwmac->clk);
+
+	return ret;
+}
+
+#ifdef CONFIG_PM_SLEEP
+static int sti_dwmac_suspend(struct device *dev)
+{
+	struct sti_dwmac *dwmac = get_stmmac_bsp_priv(dev);
+	int ret = stmmac_suspend(dev);
+
+	clk_disable_unprepare(dwmac->clk);
+
+	return ret;
+}
+
+static int sti_dwmac_resume(struct device *dev)
+{
+	struct sti_dwmac *dwmac = get_stmmac_bsp_priv(dev);
+	struct platform_device *pdev = to_platform_device(dev);
+
+	sti_dwmac_init(pdev, dwmac);
+
+	return stmmac_resume(dev);
+}
+#endif /* CONFIG_PM_SLEEP */
+
+SIMPLE_DEV_PM_OPS(sti_dwmac_pm_ops, sti_dwmac_suspend, sti_dwmac_resume);
+
 static const struct sti_dwmac_of_data stih4xx_dwmac_data = {
 	.fix_retime_src = stih4xx_fix_retime_src,
 };
@@ -382,10 +408,10 @@ MODULE_DEVICE_TABLE(of, sti_dwmac_match);
 
 static struct platform_driver sti_dwmac_driver = {
 	.probe  = sti_dwmac_probe,
-	.remove = stmmac_pltfr_remove,
+	.remove = sti_dwmac_remove,
 	.driver = {
 		.name           = "sti-dwmac",
		.pm		= &sti_dwmac_pm_ops,
 		.of_match_table = sti_dwmac_match,
 	},
 };
-- 
2.10.1
[net-next PATCH 2/7] stmmac: dwmac-sti: remove clk NULL checks
Since sti_dwmac_parse_data() sets dwmac->clk to NULL if no clock was
provided in DT, and NULL is a valid clock, there is no need to check
for NULL before using this clock.

Signed-off-by: Joachim Eastwood
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
index 075ed42..f009bf4 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
@@ -191,7 +191,7 @@ static void stih4xx_fix_retime_src(void *priv, u32 spd)
 		}
 	}
 
-	if (src == TX_RETIME_SRC_CLKGEN && dwmac->clk && freq)
+	if (src == TX_RETIME_SRC_CLKGEN && freq)
 		clk_set_rate(dwmac->clk, freq);
 
 	regmap_update_bits(dwmac->regmap, reg, STIH4XX_RETIME_SRC_MASK,
@@ -222,7 +222,7 @@ static void stid127_fix_retime_src(void *priv, u32 spd)
 			freq = DWMAC_2_5MHZ;
 	}
 
-	if (dwmac->clk && freq)
+	if (freq)
 		clk_set_rate(dwmac->clk, freq);
 
 	regmap_update_bits(dwmac->regmap, reg, STID127_RETIME_SRC_MASK, val);
@@ -238,8 +238,7 @@ static int sti_dwmac_init(struct platform_device *pdev, void *priv)
 	u32 reg = dwmac->ctrl_reg;
 	u32 val;
 
-	if (dwmac->clk)
-		clk_prepare_enable(dwmac->clk);
+	clk_prepare_enable(dwmac->clk);
 
 	if (of_property_read_bool(np, "st,gmac_en"))
 		regmap_update_bits(regmap, reg, EN_MASK, EN);
@@ -258,8 +257,7 @@ static void sti_dwmac_exit(struct platform_device *pdev, void *priv)
 {
 	struct sti_dwmac *dwmac = priv;
 
-	if (dwmac->clk)
-		clk_disable_unprepare(dwmac->clk);
+	clk_disable_unprepare(dwmac->clk);
 }
 
 static int sti_dwmac_parse_data(struct sti_dwmac *dwmac,
 				struct platform_device *pdev)
-- 
2.10.1
[net-next PATCH 1/7] stmmac: dwmac-sti: remove useless of_node check
Since dwmac-sti is a DT-only driver, checking for the OF node is not
necessary.

Signed-off-by: Joachim Eastwood
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
index 58c05ac..075ed42 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
@@ -270,9 +270,6 @@ static int sti_dwmac_parse_data(struct sti_dwmac *dwmac,
 	struct regmap *regmap;
 	int err;
 
-	if (!np)
-		return -EINVAL;
-
 	/* clk selection from extra syscfg register */
 	dwmac->clk_sel_reg = -ENXIO;
 	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "sti-clkconf");
-- 
2.10.1
[net-next PATCH 0/7] stmmac: dwmac-sti refactor+cleanup
This patch set aims to remove the init/exit callbacks from the
dwmac-sti driver and instead use standard PM callbacks. Doing this
will also allow us to clean up the driver.

Eventually the init/exit callbacks will be deprecated and removed from
all dwmac-* drivers except for dwmac-generic. Drivers will be
refactored to use standard PM and remove callbacks.

Note that this patch set has only been compile-tested and no
functional change is intended.

Joachim Eastwood (7):
  stmmac: dwmac-sti: remove useless of_node check
  stmmac: dwmac-sti: remove clk NULL checks
  stmmac: dwmac-sti: add PM ops and resume function
  stmmac: dwmac-sti: move st,gmac_en parsing to sti_dwmac_parse_data
  stmmac: dwmac-sti: move clk_prepare_enable out of init and add error
    handling
  stmmac: dwmac-sti: clean up and rename sti_dwmac_init
  stmmac: dwmac-sti: remove unused priv dev member

 drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c | 86 ++++++++++++++++---------
 1 file changed, 57 insertions(+), 29 deletions(-)

-- 
2.10.1
Re: Let's do P4
Sun, Oct 30, 2016 at 07:44:43PM CET, kubak...@wp.pl wrote:
>On Sun, 30 Oct 2016 19:01:03 +0100, Jiri Pirko wrote:
>> Sun, Oct 30, 2016 at 06:45:26PM CET, kubak...@wp.pl wrote:
>> >On Sun, 30 Oct 2016 17:38:36 +0100, Jiri Pirko wrote:
>> >> Sun, Oct 30, 2016 at 11:26:49AM CET, tg...@suug.ch wrote:
>> [...]
>> [...]
>> >> [...]
>> >> [...]
>> >> [...]
>> >> [...]
>> [...]
>> >>
>> >> Agreed.
>> >
>> >Just to clarify, my intention here was not to suggest the use of eBPF as
>> >the IR. I was merely cautioning against bundling the new API with P4,
>> >for multiple reasons. As John mentioned, the P4 spec was evolving in the
>> >past. The spec is designed for HW more capable than the switch ASICs we
>> >have today. As vendors move to provide more configurability we may need
>> >to extend the API beyond P4. We may want to extend this API for SW
>> >hand-offs (as suggested by Thomas) which are not part of the P4 spec. Also
>> >John showed examples of matchd software which already uses P4 at the
>> >frontend today and translates it to different targets (eBPF, u32, HW).
>> >It may just be about the naming, but I feel like calling the new API
>> >something more generic, switch AST or some such, may help to avoid
>> >unnecessary ties and confusion.
>>
>> Well, that basically means to create "something" that could be used
>> to translate p4 source to. Not sure how exactly this "something" should
>> look like and how different it would be from p4. I thought it might
>> be good to benefit from the p4 definition and use it directly. Not sure.
>
>We have to translate the P4 into "something" already, that something
>is the AST we will load into the kernel. Or were you planning to use
>some official P4 AST? I'm not suggesting we add our own high level

I'm not aware of the existence of an official P4 AST. We have to figure
it out.

>language. I agree that P4 is a good starting point, and perhaps a good
>high level language. I'm just cautious of creating an equivalency
>between high level language (P4) and the kernel ABI.

Understood. Definitely good to be very cautious when defining a kernel
API.

>
>Perhaps I'm just wasting everyone's time with this.
>
>> >>
>> >> Exactly. Following drawing shows p4 pipeline setup for SW and HW:
>> >>
>> >>                            |
>> >>                            |      +--> ebpf engine
>> >>                            |      |
>> >>                            |      |
>> >>                            |  compilerB
>> >>                            |      ^
>> >>                            |      |
>> >> p4src --> compilerA --> p4ast --TCNL--> cls_p4 --+-> driver -> compilerC -> HW
>> >>                            |
>> >>       userspace            |              kernel
>> >>                            |
>> >>
>> >> Now please consider runtime API for rule insertion/removal/stats/etc.
>> >> Also, the single API is cls_p4 here:
>> >>
>> >>                  |
>> >>                  |
>> >>                  |
>> >>                  |
>> >>                  |      ebpf map fillup
>> >>                  |             ^
>> >>                  |             |
>> >> p4 rule --TCNL--> cls_p4 --+-> driver -> HW table fillup
>> >>                  |
>> >>     userspace    |           kernel
>> >
>> >My understanding was that the main purpose of SW eBPF translation would
>> >be to piggy back on eBPF userspace map API. This seems not to be the
>> >case here? Is "P4 rule" being added via some new API? From performance
>>
>> cls_p4 TC classifier.
>
>Oh, so the cls_p4 is just a proxy forwarding the requests to drivers
>or eBPF backend. Got it. Sorry for being slow. And the requests
>come down via change() op or something new? I wonder how such a scheme
>compares to eBPF maps performance-wise (updates/sec).

I have no numbers at this time. I guess Jamal and Alexei did some
measurements in this area in the past.

>
>> >perspective the SW AST implementation would probably not be any slower
>> >than u32, so I don't think we need eBPF for performance. I must be
>> >misreading this, if we want eBPF fallback we must extend eBPF with all
>> >the map types anyway... so we could just use eBPF map API? I believe
>> >John has already done some work in this space (see his GitHub :))
>>
>> I don't think you can use existing BPF maps kernel API. You would still
>> have to have another API just for the offloaded datapath. And that is
>> a bypass. I strongly believe we need a single kernel API for both
>> SW and HW datapath setup and runtime configuration.
>
>Agreed, single API is a must. What is the HW characteristic which
>doesn't fit with eBPF map API, though? For eBPF offload I was planning
>on adding offload hooks
Re: Let's do P4
On Sun, 30 Oct 2016 19:01:03 +0100, Jiri Pirko wrote:
> Sun, Oct 30, 2016 at 06:45:26PM CET, kubak...@wp.pl wrote:
> >On Sun, 30 Oct 2016 17:38:36 +0100, Jiri Pirko wrote:
> >> Sun, Oct 30, 2016 at 11:26:49AM CET, tg...@suug.ch wrote:
> [...]
> [...]
> >> [...]
> >> [...]
> >> [...]
> >> [...]
> [...]
> >>
> >> Agreed.
> >
> >Just to clarify, my intention here was not to suggest the use of eBPF as
> >the IR. I was merely cautioning against bundling the new API with P4,
> >for multiple reasons. As John mentioned, the P4 spec was evolving in the
> >past. The spec is designed for HW more capable than the switch ASICs we
> >have today. As vendors move to provide more configurability we may need
> >to extend the API beyond P4. We may want to extend this API for SW
> >hand-offs (as suggested by Thomas) which are not part of the P4 spec. Also
> >John showed examples of matchd software which already uses P4 at the
> >frontend today and translates it to different targets (eBPF, u32, HW).
> >It may just be about the naming, but I feel like calling the new API
> >something more generic, switch AST or some such, may help to avoid
> >unnecessary ties and confusion.
>
> Well, that basically means to create "something" that could be used
> to translate p4 source to. Not sure how exactly this "something" should
> look like and how different it would be from p4. I thought it might
> be good to benefit from the p4 definition and use it directly. Not sure.

We have to translate the P4 into "something" already, that something
is the AST we will load into the kernel. Or were you planning to use
some official P4 AST? I'm not suggesting we add our own high level
language. I agree that P4 is a good starting point, and perhaps a good
high level language. I'm just cautious of creating an equivalency
between high level language (P4) and the kernel ABI.

Perhaps I'm just wasting everyone's time with this.

> >>
> >> Exactly. Following drawing shows p4 pipeline setup for SW and HW:
> >>
> >>                            |
> >>                            |      +--> ebpf engine
> >>                            |      |
> >>                            |      |
> >>                            |  compilerB
> >>                            |      ^
> >>                            |      |
> >> p4src --> compilerA --> p4ast --TCNL--> cls_p4 --+-> driver -> compilerC -> HW
> >>                            |
> >>       userspace            |              kernel
> >>                            |
> >>
> >> Now please consider runtime API for rule insertion/removal/stats/etc.
> >> Also, the single API is cls_p4 here:
> >>
> >>                  |
> >>                  |
> >>                  |
> >>                  |
> >>                  |      ebpf map fillup
> >>                  |             ^
> >>                  |             |
> >> p4 rule --TCNL--> cls_p4 --+-> driver -> HW table fillup
> >>                  |
> >>     userspace    |           kernel
> >
> >My understanding was that the main purpose of SW eBPF translation would
> >be to piggy back on eBPF userspace map API. This seems not to be the
> >case here? Is "P4 rule" being added via some new API? From performance
>
> cls_p4 TC classifier.

Oh, so the cls_p4 is just a proxy forwarding the requests to drivers
or eBPF backend. Got it. Sorry for being slow. And the requests
come down via change() op or something new? I wonder how such a scheme
compares to eBPF maps performance-wise (updates/sec).

> >perspective the SW AST implementation would probably not be any slower
> >than u32, so I don't think we need eBPF for performance. I must be
> >misreading this, if we want eBPF fallback we must extend eBPF with all
> >the map types anyway... so we could just use eBPF map API? I believe
> >John has already done some work in this space (see his GitHub :))
>
> I don't think you can use existing BPF maps kernel API. You would still
> have to have another API just for the offloaded datapath. And that is
> a bypass. I strongly believe we need a single kernel API for both
> SW and HW datapath setup and runtime configuration.

Agreed, single API is a must. What is the HW characteristic which
doesn't fit with eBPF map API, though? For eBPF offload I was planning
on adding offload hooks on eBPF map lookup/update paths and a way of
associating the map with a netdev. This should be enough to forward
updates to the driver and intercept reads to return the right
statistics.
Re: Let's do P4
Sun, Oct 30, 2016 at 06:45:26PM CET, kubak...@wp.pl wrote:
>On Sun, 30 Oct 2016 17:38:36 +0100, Jiri Pirko wrote:
>> Sun, Oct 30, 2016 at 11:26:49AM CET, tg...@suug.ch wrote:
>> >On 10/30/16 at 08:44am, Jiri Pirko wrote:
>> >> Sat, Oct 29, 2016 at 06:46:21PM CEST, john.fastab...@gmail.com wrote:
>> [...]
>> [...]
>> [...]
>> [...]
>> >
>> >My assumption was that a new IR is defined which is easier to parse than
>> >eBPF which is targeted at execution on a CPU and not intended for pattern
>> >matching. Just looking at how llvm creates different patterns and reorders
>> >instructions, I'm not seeing how eBPF can serve as a general purpose IR
>> >if the objective is to allow fairly flexible generation of the bytecode.
>> >Hence the alternative IR serving as additional metadata complementing the
>> >eBPF program.
>>
>> Agreed.
>
>Just to clarify, my intention here was not to suggest the use of eBPF as
>the IR. I was merely cautioning against bundling the new API with P4,
>for multiple reasons. As John mentioned, the P4 spec was evolving in the
>past. The spec is designed for HW more capable than the switch ASICs we
>have today. As vendors move to provide more configurability we may need
>to extend the API beyond P4. We may want to extend this API for SW
>hand-offs (as suggested by Thomas) which are not part of the P4 spec.
>Also, John showed examples of matchd software which already uses P4 at
>the frontend today and translates it to different targets (eBPF, u32,
>HW). It may just be about the naming, but I feel like calling the new
>API something more generic, switch AST or some such, may help to avoid
>unnecessary ties and confusion.

Well, that basically means to create "something" that could be used to
translate p4 source to. Not sure how exactly this "something" should
look like and how different it would be from p4. I thought it might be
good to benefit from the p4 definition and use it directly. Not sure.

>
>> >I understand what you mean with two APIs now. You want a single IR
>> >block and divide the SW/HW part in the kernel rather than let llvm or
>> >something else do it.
>>
>> Exactly. The following drawing shows the p4 pipeline setup for SW and HW:
>>
>>                                         |
>>                                         |            +--> ebpf engine
>>                                         |            |
>>                                         |            |
>>                                         |        compilerB
>>                                         |            ^
>>                                         |            |
>> p4src --> compilerA --> p4ast --TCNL--> cls_p4 ------+-> driver -> compilerC -> HW
>>                                         |
>>              userspace                  |                 kernel
>>                                         |
>>
>> Now please consider runtime API for rule insertion/removal/stats/etc.
>> Also, the single API is cls_p4 here:
>>
>>                           |
>>                           |
>>                           |
>>                           |
>>                           |    ebpf map fillup
>>                           |           ^
>>                           |           |
>> p4 rule --TCNL--> cls_p4 --+-> driver -> HW table fillup
>>                           |
>>        userspace          |        kernel
>>
>
>My understanding was that the main purpose of SW eBPF translation would
>be to piggy back on eBPF userspace map API. This seems not to be the
>case here? Is "P4 rule" being added via some new API?

cls_p4 TC classifier.

>From a performance
>perspective the SW AST implementation would probably not be any slower
>than u32, so I don't think we need eBPF for performance. I must be
>misreading this, if we want eBPF fallback we must extend eBPF with all
>the map types anyway... so we could just use eBPF map API? I believe
>John has already done some work in this space (see his GitHub :))

I don't think you can use the existing BPF maps kernel API. You would
still have to have another API just for the offloaded datapath. And
that is a bypass. I strongly believe we need a single kernel API for
both SW and HW datapath setup and runtime configuration.

>
>As for AST -> eBPF translator in the kernel, IMHO it could be very
>useful. Since all the drivers will have to implement translators
>anyway, the eBPF translator may help to build a good shared
>infrastructure. I mean - it could be a starting place for sharing code
>between drivers if done properly.

Agreed.

>
>> >> Well for hw offload, every driver has to parse the IR (whatever will it
>> >> be in) and program HW accordingly. Similar parsing and translation would
>> >> be needed for SW path, to translate into eBPF. I don't think it would be
>> >> more complex than in the drivers. Should be fine.
>> >
>> >I'm not sure I see why anyone would ever want to use an IR for SW
>> >purposes which is restricted to the lowest common denominator of
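To make the "single IR, two consumers" idea above concrete, here is a minimal sketch of what a table-based match/action entry could look like, together with the SW fallback path that walks it. This is illustrative only - it is not the proposed p4ast format nor any real cls_p4 code; all names and the entry layout are invented for the example. The point is that the same data structure could either be interpreted in SW or handed to a driver for HW translation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IR: match a value at a fixed header offset, run an action.
 * A driver could translate an array of these to TCAM entries; the kernel
 * could fall back to interpreting them as below.
 */
enum ma_action { ACT_PASS, ACT_DROP };

struct ma_entry {
	uint16_t off;		/* byte offset into the packet */
	uint16_t len;		/* bytes to compare: 1 or 2 here */
	uint32_t value;		/* value to match */
	enum ma_action act;
};

/* SW fallback: walk entries, first match wins, default is pass. */
static enum ma_action ma_classify(const struct ma_entry *tbl, size_t n,
				  const uint8_t *pkt, size_t pkt_len)
{
	for (size_t i = 0; i < n; i++) {
		uint32_t v;

		if ((size_t)tbl[i].off + tbl[i].len > pkt_len)
			continue;
		v = pkt[tbl[i].off];
		if (tbl[i].len == 2)
			v = (v << 8) | pkt[tbl[i].off + 1];
		if (v == tbl[i].value)
			return tbl[i].act;
	}
	return ACT_PASS;
}
```

A rule like "drop ARP" would then be one entry matching the ethertype bytes at offset 12, and runtime rule insertion/removal would manipulate the same table regardless of which backend executes it.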
Re: Let's do P4
On Sun, 30 Oct 2016 17:38:36 +0100, Jiri Pirko wrote:
> Sun, Oct 30, 2016 at 11:26:49AM CET, tg...@suug.ch wrote:
> >On 10/30/16 at 08:44am, Jiri Pirko wrote:
> >> Sat, Oct 29, 2016 at 06:46:21PM CEST, john.fastab...@gmail.com wrote:
> [...]
> [...]
> [...]
> [...]
> >
> >My assumption was that a new IR is defined which is easier to parse than
> >eBPF which is targeted at execution on a CPU and not intended for pattern
> >matching. Just looking at how llvm creates different patterns and reorders
> >instructions, I'm not seeing how eBPF can serve as a general purpose IR
> >if the objective is to allow fairly flexible generation of the bytecode.
> >Hence the alternative IR serving as additional metadata complementing the
> >eBPF program.
>
> Agreed.

Just to clarify, my intention here was not to suggest the use of eBPF as
the IR. I was merely cautioning against bundling the new API with P4,
for multiple reasons. As John mentioned, the P4 spec was evolving in the
past. The spec is designed for HW more capable than the switch ASICs we
have today. As vendors move to provide more configurability we may need
to extend the API beyond P4. We may want to extend this API for SW
hand-offs (as suggested by Thomas) which are not part of the P4 spec.
Also, John showed examples of matchd software which already uses P4 at
the frontend today and translates it to different targets (eBPF, u32,
HW). It may just be about the naming, but I feel like calling the new
API something more generic, switch AST or some such, may help to avoid
unnecessary ties and confusion.

> >I understand what you mean with two APIs now. You want a single IR
> >block and divide the SW/HW part in the kernel rather than let llvm or
> >something else do it.
>
> Exactly. The following drawing shows the p4 pipeline setup for SW and HW:
>
>                                         |
>                                         |            +--> ebpf engine
>                                         |            |
>                                         |            |
>                                         |        compilerB
>                                         |            ^
>                                         |            |
> p4src --> compilerA --> p4ast --TCNL--> cls_p4 ------+-> driver -> compilerC -> HW
>                                         |
>              userspace                  |                 kernel
>                                         |
>
> Now please consider runtime API for rule insertion/removal/stats/etc.
> Also, the single API is cls_p4 here:
>
>                           |
>                           |
>                           |
>                           |
>                           |    ebpf map fillup
>                           |           ^
>                           |           |
> p4 rule --TCNL--> cls_p4 --+-> driver -> HW table fillup
>                           |
>        userspace          |        kernel
>

My understanding was that the main purpose of SW eBPF translation would
be to piggy back on eBPF userspace map API. This seems not to be the
case here? Is "P4 rule" being added via some new API? From a performance
perspective the SW AST implementation would probably not be any slower
than u32, so I don't think we need eBPF for performance. I must be
misreading this, if we want eBPF fallback we must extend eBPF with all
the map types anyway... so we could just use eBPF map API? I believe
John has already done some work in this space (see his GitHub :))

As for AST -> eBPF translator in the kernel, IMHO it could be very
useful. Since all the drivers will have to implement translators
anyway, the eBPF translator may help to build a good shared
infrastructure. I mean - it could be a starting place for sharing code
between drivers if done properly.

> >> Well for hw offload, every driver has to parse the IR (whatever will it
> >> be in) and program HW accordingly. Similar parsing and translation would
> >> be needed for SW path, to translate into eBPF. I don't think it would be
> >> more complex than in the drivers. Should be fine.
> >
> >I'm not sure I see why anyone would ever want to use an IR for SW
> >purposes which is restricted to the lowest common denominator of HW.
> >A good example here is OpenFlow and how some of its SW consumers
> >have evolved with extensions which cannot be mapped to HW easily.
> >The same seems to happen with P4 as it introduces the concept of
> >state and other concepts which are hard to map for dumb HW. P4 doesn't
> >magically solve this problem, the fundamental difference in
> >capabilities between HW and SW remain.
> >
> [...]
> [...]
> [...]
> >>
> >> Yeah, I was also thinking about something similar to your Flow-API,
> >> but we need something more generic I believe.
> >>
> [...]
> >>
> >> Btw, Flow-API was rejected because it was a clean kernel-bypass. In case
> >> of p4, if we do what Thomas is suggesting, having x.bpf for SW and
Re: [PATCH net-next 0/2] mlx4 XDP TX refactor
From: Tariq Toukan
Date: Sun, 30 Oct 2016 18:21:26 +0200

> Hi Dave,
>
> This series makes Brenden's fix unneeded:
> 958b3d396d7f ("net/mlx4_en: fixup xdp tx irq to match rx")
>
> The fix got into net, but yet to be in net-next.
>
> Should I wait with this series and send a re-spin, with a revert of
> the fix, once it gets into net-next?

I'm working on merging net into net-next right now, once that is pushed
out you can respin.
[PATCHv2 net] sctp: return back transport in __sctp_rcv_init_lookup
Prior to this patch, it used a local variable to save the transport that
is looked up by __sctp_lookup_association(), and didn't return it back.
But in sctp_rcv, it is used to initialize chunk->transport. So when
hitting this, even if it found the transport, it was still initializing
chunk->transport with null instead.

This patch is to return the transport back through the transport pointer
that is from __sctp_rcv_lookup_harder().

Signed-off-by: Xin Long
---
 net/sctp/input.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/sctp/input.c b/net/sctp/input.c
index a2ea1d1..8e0bc58 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -1021,7 +1021,6 @@ static struct sctp_association *__sctp_rcv_init_lookup(struct net *net,
 	struct sctphdr *sh = sctp_hdr(skb);
 	union sctp_params params;
 	sctp_init_chunk_t *init;
-	struct sctp_transport *transport;
 	struct sctp_af *af;
 
 	/*
@@ -1052,7 +1051,7 @@ static struct sctp_association *__sctp_rcv_init_lookup(struct net *net,
 
 		af->from_addr_param(&paddr, params.addr, sh->source, 0);
 
-		asoc = __sctp_lookup_association(net, &laddr, &paddr, &transport);
+		asoc = __sctp_lookup_association(net, &laddr, &paddr, transportp);
 		if (asoc)
 			return asoc;
 	}
-- 
2.1.0
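The fix above is an instance of a common bug class: a helper fills a caller-provided out-pointer, but the wrapper passes its own local variable and never copies the result back to the caller. A minimal standalone sketch of the pattern (hypothetical names, not the actual sctp code):

```c
#include <assert.h>
#include <stddef.h>

struct transport { int id; };

static struct transport table[2] = { { 1 }, { 2 } };

/* Helper that reports the found transport through *tp. */
static int lookup(int key, struct transport **tp)
{
	if (key < 0 || key >= 2)
		return -1;
	*tp = &table[key];
	return 0;
}

/* Buggy wrapper: the result lands in a local and never reaches *tpp. */
static int lookup_buggy(int key, struct transport **tpp)
{
	struct transport *local;

	(void)tpp;			/* never written - this is the bug */
	return lookup(key, &local);
}

/* Fixed wrapper: forward the caller's out-pointer directly. */
static int lookup_fixed(int key, struct transport **tpp)
{
	return lookup(key, tpp);
}
```

In the sctp case the caller (sctp_rcv) then dereferenced the still-NULL pointer's target when initializing chunk->transport, which is why forwarding `transportp` matters even though the association itself was found.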
[PATCH net-next 2/7] qed: Add nvram selftest
Signed-off-by: Yuval Mintz--- drivers/net/ethernet/qlogic/qed/qed_hsi.h | 4 + drivers/net/ethernet/qlogic/qed/qed_main.c | 1 + drivers/net/ethernet/qlogic/qed/qed_mcp.c | 94 ++ drivers/net/ethernet/qlogic/qed/qed_mcp.h | 41 ++ drivers/net/ethernet/qlogic/qed/qed_selftest.c | 101 drivers/net/ethernet/qlogic/qed/qed_selftest.h | 10 +++ drivers/net/ethernet/qlogic/qede/qede_ethtool.c | 7 ++ include/linux/qed/qed_if.h | 9 +++ 8 files changed, 267 insertions(+) diff --git a/drivers/net/ethernet/qlogic/qed/qed_hsi.h b/drivers/net/ethernet/qlogic/qed/qed_hsi.h index 36de87a..f7dfa2e 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_hsi.h +++ b/drivers/net/ethernet/qlogic/qed/qed_hsi.h @@ -8666,6 +8666,8 @@ struct public_drv_mb { #define DRV_MB_PARAM_BIST_REGISTER_TEST1 #define DRV_MB_PARAM_BIST_CLOCK_TEST 2 +#define DRV_MB_PARAM_BIST_NVM_TEST_NUM_IMAGES 3 +#define DRV_MB_PARAM_BIST_NVM_TEST_IMAGE_BY_INDEX 4 #define DRV_MB_PARAM_BIST_RC_UNKNOWN 0 #define DRV_MB_PARAM_BIST_RC_PASSED1 @@ -8674,6 +8676,8 @@ struct public_drv_mb { #define DRV_MB_PARAM_BIST_TEST_INDEX_SHIFT 0 #define DRV_MB_PARAM_BIST_TEST_INDEX_MASK 0x00FF +#define DRV_MB_PARAM_BIST_TEST_IMAGE_INDEX_SHIFT 8 +#define DRV_MB_PARAM_BIST_TEST_IMAGE_INDEX_MASK0xFF00 u32 fw_mb_header; #define FW_MSG_CODE_MASK 0x diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c index 937968b..612c094 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_main.c +++ b/drivers/net/ethernet/qlogic/qed/qed_main.c @@ -1509,6 +1509,7 @@ struct qed_selftest_ops qed_selftest_ops_pass = { .selftest_interrupt = _selftest_interrupt, .selftest_register = _selftest_register, .selftest_clock = _selftest_clock, + .selftest_nvram = _selftest_nvram, }; const struct qed_common_ops qed_common_ops_pass = { diff --git a/drivers/net/ethernet/qlogic/qed/qed_mcp.c b/drivers/net/ethernet/qlogic/qed/qed_mcp.c index 98dc913..8be6157 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_mcp.c +++ 
b/drivers/net/ethernet/qlogic/qed/qed_mcp.c @@ -1434,6 +1434,52 @@ int qed_mcp_mask_parities(struct qed_hwfn *p_hwfn, return rc; } +int qed_mcp_nvm_read(struct qed_dev *cdev, u32 addr, u8 *p_buf, u32 len) +{ + u32 bytes_left = len, offset = 0, bytes_to_copy, read_len = 0; + struct qed_hwfn *p_hwfn = QED_LEADING_HWFN(cdev); + u32 resp = 0, resp_param = 0; + struct qed_ptt *p_ptt; + int rc = 0; + + p_ptt = qed_ptt_acquire(p_hwfn); + if (!p_ptt) + return -EBUSY; + + while (bytes_left > 0) { + bytes_to_copy = min_t(u32, bytes_left, MCP_DRV_NVM_BUF_LEN); + + rc = qed_mcp_nvm_rd_cmd(p_hwfn, p_ptt, + DRV_MSG_CODE_NVM_READ_NVRAM, + addr + offset + + (bytes_to_copy << +DRV_MB_PARAM_NVM_LEN_SHIFT), + , _param, + _len, + (u32 *)(p_buf + offset)); + + if (rc || (resp != FW_MSG_CODE_NVM_OK)) { + DP_NOTICE(cdev, "MCP command rc = %d\n", rc); + break; + } + + /* This can be a lengthy process, and it's possible scheduler +* isn't preemptable. Sleep a bit to prevent CPU hogging. +*/ + if (bytes_left % 0x1000 < + (bytes_left - read_len) % 0x1000) + usleep_range(1000, 2000); + + offset += read_len; + bytes_left -= read_len; + } + + cdev->mcp_nvm_resp = resp; + qed_ptt_release(p_hwfn, p_ptt); + + return rc; +} + int qed_mcp_bist_register_test(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt) { u32 drv_mb_param = 0, rsp, param; @@ -1475,3 +1521,51 @@ int qed_mcp_bist_clock_test(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt) return rc; } + +int qed_mcp_bist_nvm_test_get_num_images(struct qed_hwfn *p_hwfn, +struct qed_ptt *p_ptt, +u32 *num_images) +{ + u32 drv_mb_param = 0, rsp; + int rc = 0; + + drv_mb_param = (DRV_MB_PARAM_BIST_NVM_TEST_NUM_IMAGES << + DRV_MB_PARAM_BIST_TEST_INDEX_SHIFT); + + rc = qed_mcp_cmd(p_hwfn, p_ptt, DRV_MSG_CODE_BIST_TEST, +drv_mb_param, , num_images); + if (rc) + return rc; + + if (((rsp & FW_MSG_CODE_MASK) != FW_MSG_CODE_OK)) + rc = -EINVAL; + +
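The qed_mcp_nvm_read() loop above reads NVRAM in MCP_DRV_NVM_BUF_LEN-sized chunks, advancing by however many bytes the firmware reports back. Stripped of the mailbox details, the chunking pattern looks like this sketch (all names and the 32-byte chunk size are stand-ins, not the real qed constants):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_LEN 32u	/* stand-in for MCP_DRV_NVM_BUF_LEN */

/* Stand-in for the firmware mailbox call: copies up to req_len bytes
 * from a backing store and reports how much was actually read.
 */
static int fw_read(const uint8_t *store, uint32_t addr,
		   uint32_t req_len, uint8_t *dst, uint32_t *read_len)
{
	memcpy(dst, store + addr, req_len);
	*read_len = req_len;
	return 0;
}

/* Chunked read: clamp each request to the transport's buffer size and
 * advance offset/bytes_left by the reported length, as the driver does.
 */
static int nvm_read(const uint8_t *store, uint32_t addr,
		    uint8_t *buf, uint32_t len)
{
	uint32_t bytes_left = len, offset = 0;

	while (bytes_left > 0) {
		uint32_t to_copy = bytes_left < CHUNK_LEN ? bytes_left
							  : CHUNK_LEN;
		uint32_t read_len = 0;
		int rc = fw_read(store, addr + offset, to_copy,
				 buf + offset, &read_len);

		if (rc)
			return rc;
		offset += read_len;
		bytes_left -= read_len;
	}
	return 0;
}
```

The real driver additionally sleeps periodically inside the loop, since reading a large NVRAM image through the mailbox can otherwise hog the CPU.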
[PATCH net-next 4/7] qede: Decouple ethtool caps from qed
While the qed_lm_map array is closely tied to the QED_LM_* defines, when
iterating over the array use its actual size instead of the qed define,
to prevent possible future issues.

Signed-off-by: Yuval Mintz
---
 drivers/net/ethernet/qlogic/qede/qede_ethtool.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
index 42d9739..d230742 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
@@ -320,7 +320,7 @@ struct qede_link_mode_mapping {
 {								\
 	int i;							\
 								\
-	for (i = 0; i < QED_LM_COUNT; i++) {			\
+	for (i = 0; i < ARRAY_SIZE(qed_lm_map); i++) {		\
 		if ((caps) & (qed_lm_map[i].qed_link_mode))	\
 			__set_bit(qed_lm_map[i].ethtool_link_mode,\
 				  lk_ksettings->link_modes.name); \
@@ -331,7 +331,7 @@ struct qede_link_mode_mapping {
 {								\
 	int i;							\
 								\
-	for (i = 0; i < QED_LM_COUNT; i++) {			\
+	for (i = 0; i < ARRAY_SIZE(qed_lm_map); i++) {		\
 		if (test_bit(qed_lm_map[i].ethtool_link_mode,	\
 			     lk_ksettings->link_modes.name))	\
 			caps |= qed_lm_map[i].qed_link_mode;	\
-- 
1.9.3
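The point of the patch - sizing the loop from the table itself rather than from an external count define - is the standard ARRAY_SIZE idiom. A self-contained sketch with invented link-mode values (not the real QED_LM_* bits):

```c
#include <assert.h>
#include <stddef.h>

/* Same definition the kernel uses (minus the type-checking magic). */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

struct link_mode_mapping {
	unsigned int drv_mode;		/* driver-side capability bit */
	unsigned int ethtool_bit;	/* ethtool link-mode bit number */
};

static const struct link_mode_mapping lm_map[] = {
	{ 1u << 0, 0 },
	{ 1u << 1, 1 },
	{ 1u << 2, 2 },
};

/* Translate a driver capability mask to an ethtool bitmask by walking
 * the table itself; adding or removing entries can never drift out of
 * sync with a separately maintained count define.
 */
static unsigned long caps_to_ethtool(unsigned int caps)
{
	unsigned long out = 0;
	size_t i;

	for (i = 0; i < ARRAY_SIZE(lm_map); i++)
		if (caps & lm_map[i].drv_mode)
			out |= 1ul << lm_map[i].ethtool_bit;
	return out;
}
```

With the original `QED_LM_COUNT` bound, growing the qed define without growing the qede table would have walked past the end of `lm_map`; `ARRAY_SIZE` makes that impossible.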
[PATCH net-next 6/7] qed: Use VF-queue feature
Driver sets several restrictions on the number of supported VFs
according to available HW/FW resources. This creates a problem, as there
are constellations which can't be supported [as the limitations don't
accurately describe the resources], as well as holes where enabling IOV
would fail due to supposed lack of resources.

This introduces a new internal feature - vf-queues, which would be used
to lift some of the restrictions and accurately enumerate the queues
that can be used by a given PF's VFs.

Signed-off-by: Yuval Mintz
---
 drivers/net/ethernet/qlogic/qed/qed.h       |  1 +
 drivers/net/ethernet/qlogic/qed/qed_dev.c   | 20 ++++++++----
 drivers/net/ethernet/qlogic/qed/qed_int.c   | 32 ++++++++++++++++++-
 drivers/net/ethernet/qlogic/qed/qed_sriov.c | 17 +++++++---
 4 files changed, 54 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h
index 8828ffa..6d3013f 100644
--- a/drivers/net/ethernet/qlogic/qed/qed.h
+++ b/drivers/net/ethernet/qlogic/qed/qed.h
@@ -174,6 +174,7 @@ enum QED_FEATURE {
 	QED_PF_L2_QUE,
 	QED_VF,
 	QED_RDMA_CNQ,
+	QED_VF_L2_QUE,
 	QED_MAX_FEATURES,
 };
 
diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c
index 13833a5..b59da1a 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dev.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c
@@ -1475,6 +1475,7 @@ static void get_function_id(struct qed_hwfn *p_hwfn)
 static void qed_hw_set_feat(struct qed_hwfn *p_hwfn)
 {
 	u32 *feat_num = p_hwfn->hw_info.feat_num;
+	struct qed_sb_cnt_info sb_cnt_info;
 	int num_features = 1;
 
 #if IS_ENABLED(CONFIG_INFINIBAND_QEDR)
@@ -1493,10 +1494,21 @@ static void qed_hw_set_feat(struct qed_hwfn *p_hwfn)
 	feat_num[QED_PF_L2_QUE] = min_t(u32,
 					RESC_NUM(p_hwfn, QED_SB) / num_features,
 					RESC_NUM(p_hwfn, QED_L2_QUEUE));
-	DP_VERBOSE(p_hwfn, NETIF_MSG_PROBE,
-		   "#PF_L2_QUEUES=%d #SBS=%d num_features=%d\n",
-		   feat_num[QED_PF_L2_QUE], RESC_NUM(p_hwfn, QED_SB),
-		   num_features);
+
+	memset(&sb_cnt_info, 0, sizeof(sb_cnt_info));
+	qed_int_get_num_sbs(p_hwfn, &sb_cnt_info);
+	feat_num[QED_VF_L2_QUE] =
+	    min_t(u32,
+		  RESC_NUM(p_hwfn, QED_L2_QUEUE) -
+		  FEAT_NUM(p_hwfn, QED_PF_L2_QUE), sb_cnt_info.sb_iov_cnt);
+
+	DP_VERBOSE(p_hwfn,
+		   NETIF_MSG_PROBE,
+		   "#PF_L2_QUEUES=%d VF_L2_QUEUES=%d #ROCE_CNQ=%d #SBS=%d num_features=%d\n",
+		   (int)FEAT_NUM(p_hwfn, QED_PF_L2_QUE),
+		   (int)FEAT_NUM(p_hwfn, QED_VF_L2_QUE),
+		   (int)FEAT_NUM(p_hwfn, QED_RDMA_CNQ),
+		   RESC_NUM(p_hwfn, QED_SB), num_features);
 }
 
 static int qed_hw_get_resc(struct qed_hwfn *p_hwfn)
diff --git a/drivers/net/ethernet/qlogic/qed/qed_int.c b/drivers/net/ethernet/qlogic/qed/qed_int.c
index 2adedc6..bb74e1c 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_int.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_int.c
@@ -3030,6 +3030,31 @@ int qed_int_igu_read_cam(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt)
 			}
 		}
 	}
+
+	/* There's a possibility the igu_sb_cnt_iov doesn't properly reflect
+	 * the number of VF SBs [especially for first VF on engine, as we can't
+	 * differentiate between empty entries and its entries].
+	 * Since we don't really support more SBs than VFs today, prevent any
+	 * such configuration by sanitizing the number of SBs to equal the
+	 * number of VFs.
+	 */
+	if (IS_PF_SRIOV(p_hwfn)) {
+		u16 total_vfs = p_hwfn->cdev->p_iov_info->total_vfs;
+
+		if (total_vfs < p_igu_info->free_blks) {
+			DP_VERBOSE(p_hwfn,
+				   (NETIF_MSG_INTR | QED_MSG_IOV),
+				   "Limiting number of SBs for IOV - %04x --> %04x\n",
+				   p_igu_info->free_blks,
+				   p_hwfn->cdev->p_iov_info->total_vfs);
+			p_igu_info->free_blks = total_vfs;
+		} else if (total_vfs > p_igu_info->free_blks) {
+			DP_NOTICE(p_hwfn,
+				  "IGU has only %04x SBs for VFs while the device has %04x VFs\n",
+				  p_igu_info->free_blks, total_vfs);
+			return -EINVAL;
+		}
+	}
 	p_igu_info->igu_sb_cnt_iov = p_igu_info->free_blks;
 
 	DP_VERBOSE(
@@ -3163,7 +3188,12 @@ u16 qed_int_queue_id_from_sb_id(struct qed_hwfn *p_hwfn, u16 sb_id)
 		return sb_id - p_info->igu_base_sb;
 	} else if ((sb_id >= p_info->igu_base_sb_iov) &&
 		   (sb_id <
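The IGU fixup in the patch clamps the advertised VF status-block count down to the VF count (excess entries can't be told apart from empty ones) and fails hard only when there are more VFs than status blocks. The shape of that sanitization, with simplified types (the names are stand-ins for the driver fields):

```c
#include <assert.h>

/* Sanitize the number of free IGU status blocks against the VF count:
 * excess SBs are trimmed, while a shortfall is a hard error - mirroring
 * the two branches of the fixup above.
 */
static int sanitize_vf_sbs(unsigned int total_vfs, unsigned int *free_blks)
{
	if (total_vfs < *free_blks)
		*free_blks = total_vfs;	/* trim to the VF count */
	else if (total_vfs > *free_blks)
		return -1;		/* not enough SBs for the VFs */
	return 0;
}
```

Trimming rather than failing on the "too many SBs" side is the safer default: it only gives up capacity the driver couldn't attribute reliably anyway.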
[PATCH net-next 3/7] qed*: Add support for WoL
Signed-off-by: Yuval Mintz--- drivers/net/ethernet/qlogic/qed/qed.h | 11 - drivers/net/ethernet/qlogic/qed/qed_dev.c | 19 - drivers/net/ethernet/qlogic/qed/qed_hsi.h | 4 ++ drivers/net/ethernet/qlogic/qed/qed_main.c | 29 + drivers/net/ethernet/qlogic/qed/qed_mcp.c | 56 - drivers/net/ethernet/qlogic/qede/qede.h | 2 + drivers/net/ethernet/qlogic/qede/qede_ethtool.c | 41 ++ drivers/net/ethernet/qlogic/qede/qede_main.c| 9 include/linux/qed/qed_if.h | 10 + 9 files changed, 176 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h index f20243c..8828ffa 100644 --- a/drivers/net/ethernet/qlogic/qed/qed.h +++ b/drivers/net/ethernet/qlogic/qed/qed.h @@ -195,6 +195,11 @@ enum qed_dev_cap { QED_DEV_CAP_ROCE, }; +enum qed_wol_support { + QED_WOL_SUPPORT_NONE, + QED_WOL_SUPPORT_PME, +}; + struct qed_hw_info { /* PCI personality */ enum qed_pci_personalitypersonality; @@ -227,6 +232,8 @@ struct qed_hw_info { u32 hw_mode; unsigned long device_capabilities; u16 mtu; + + enum qed_wol_support b_wol_support; }; struct qed_hw_cid_data { @@ -539,7 +546,9 @@ struct qed_dev { u8 mcp_rev; u8 boot_mode; - u8 wol; + /* WoL related configurations */ + u8 wol_config; + u8 wol_mac[ETH_ALEN]; u32 int_mode; enum qed_coalescing_modeint_coalescing_mode; diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c index 9ef6dfd..13833a5 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_dev.c +++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c @@ -1363,8 +1363,24 @@ int qed_hw_reset(struct qed_dev *cdev) { int rc = 0; u32 unload_resp, unload_param; + u32 wol_param; int i; + switch (cdev->wol_config) { + case QED_OV_WOL_DISABLED: + wol_param = DRV_MB_PARAM_UNLOAD_WOL_DISABLED; + break; + case QED_OV_WOL_ENABLED: + wol_param = DRV_MB_PARAM_UNLOAD_WOL_ENABLED; + break; + default: + DP_NOTICE(cdev, + "Unknown WoL configuration %02x\n", cdev->wol_config); + /* Fallthrough */ + case QED_OV_WOL_DEFAULT: + 
wol_param = DRV_MB_PARAM_UNLOAD_WOL_MCP; + } + for_each_hwfn(cdev, i) { struct qed_hwfn *p_hwfn = >hwfns[i]; @@ -1393,8 +1409,7 @@ int qed_hw_reset(struct qed_dev *cdev) /* Send unload command to MCP */ rc = qed_mcp_cmd(p_hwfn, p_hwfn->p_main_ptt, -DRV_MSG_CODE_UNLOAD_REQ, -DRV_MB_PARAM_UNLOAD_WOL_MCP, +DRV_MSG_CODE_UNLOAD_REQ, wol_param, _resp, _param); if (rc) { DP_NOTICE(p_hwfn, "qed_hw_reset: UNLOAD_REQ failed\n"); diff --git a/drivers/net/ethernet/qlogic/qed/qed_hsi.h b/drivers/net/ethernet/qlogic/qed/qed_hsi.h index f7dfa2e..fdb7a09 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_hsi.h +++ b/drivers/net/ethernet/qlogic/qed/qed_hsi.h @@ -8601,6 +8601,7 @@ struct public_drv_mb { #define DRV_MSG_CODE_BIST_TEST 0x001e #define DRV_MSG_CODE_SET_LED_MODE 0x0020 +#define DRV_MSG_CODE_OS_WOL0x002e #define DRV_MSG_SEQ_NUMBER_MASK0x @@ -8697,6 +8698,9 @@ struct public_drv_mb { #define FW_MSG_CODE_NVM_OK 0x0001 #define FW_MSG_CODE_OK 0x0016 +#define FW_MSG_CODE_OS_WOL_SUPPORTED0x0080 +#define FW_MSG_CODE_OS_WOL_NOT_SUPPORTED0x0081 + #define FW_MSG_SEQ_NUMBER_MASK 0x u32 fw_mb_param; diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c index 612c094..a95a1af 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_main.c +++ b/drivers/net/ethernet/qlogic/qed/qed_main.c @@ -223,6 +223,10 @@ int qed_fill_dev_info(struct qed_dev *cdev, dev_info->fw_eng = FW_ENGINEERING_VERSION; dev_info->mf_mode = cdev->mf_mode; dev_info->tx_switching = true; + + if (QED_LEADING_HWFN(cdev)->hw_info.b_wol_support == + QED_WOL_SUPPORT_PME) + dev_info->wol_support = true; } else { qed_vf_get_fw_version(>hwfns[0], _info->fw_major,
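The qed_hw_reset() hunk above maps the driver's WoL configuration onto an unload mailbox parameter, with unknown values logging a notice and deliberately falling through to the "let management firmware decide" default. A compilable sketch of that switch shape (enum values are illustrative stand-ins, not the real DRV_MB_PARAM_UNLOAD_WOL_* encodings):

```c
#include <assert.h>

/* Stand-ins for the QED_OV_WOL_* configs and the unload mailbox params. */
enum wol_config { WOL_DEFAULT, WOL_DISABLED, WOL_ENABLED };
enum wol_param {
	PARAM_WOL_MCP = 0x10,		/* management FW decides */
	PARAM_WOL_DISABLED = 0x20,
	PARAM_WOL_ENABLED = 0x30,
};

static enum wol_param wol_unload_param(int cfg)
{
	switch (cfg) {
	case WOL_DISABLED:
		return PARAM_WOL_DISABLED;
	case WOL_ENABLED:
		return PARAM_WOL_ENABLED;
	default:
		/* Unknown value: the real driver logs a notice here,
		 * then falls through to the MCP default.
		 */
		/* fallthrough */
	case WOL_DEFAULT:
		return PARAM_WOL_MCP;
	}
}
```

Placing `default:` immediately before `case WOL_DEFAULT:` is valid C and makes the fallthrough explicit: any unrecognized configuration degrades to the firmware-chosen behavior instead of an arbitrary one.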
[PATCH net-next 5/7] qed: Learn of RDMA capabilities per-device
Today, RDMA capabilities are learned from management firmware which
provides a single indication for all interfaces. Newer management
firmware is capable of providing a per-device indication [would later be
extended to either RoCE/iWARP]. Try using this newer learning mechanism,
but fall back in case management firmware is too old, to retain current
functionality.

Signed-off-by: Yuval Mintz
---
 drivers/net/ethernet/qlogic/qed/qed_hsi.h |  7 +++
 drivers/net/ethernet/qlogic/qed/qed_mcp.c | 78 +++++++++++++++++++++---
 2 files changed, 77 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_hsi.h b/drivers/net/ethernet/qlogic/qed/qed_hsi.h
index fdb7a09..1d113ce 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_hsi.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_hsi.h
@@ -8601,6 +8601,7 @@ struct public_drv_mb {
 #define DRV_MSG_CODE_BIST_TEST			0x001e
+#define DRV_MSG_CODE_GET_PF_RDMA_PROTOCOL	0x002b
 #define DRV_MSG_CODE_SET_LED_MODE		0x0020
 #define DRV_MSG_CODE_OS_WOL			0x002e
 
 #define DRV_MSG_SEQ_NUMBER_MASK			0x
@@ -8706,6 +8707,12 @@ struct public_drv_mb {
 
 	u32 fw_mb_param;
 
+	/* get pf rdma protocol command responce */
+#define FW_MB_PARAM_GET_PF_RDMA_NONE		0x0
+#define FW_MB_PARAM_GET_PF_RDMA_ROCE		0x1
+#define FW_MB_PARAM_GET_PF_RDMA_IWARP		0x2
+#define FW_MB_PARAM_GET_PF_RDMA_BOTH		0x3
+
 	u32 drv_pulse_mb;
 #define DRV_PULSE_SEQ_MASK			0x7fff
 #define DRV_PULSE_SYSTEM_TIME_MASK		0x
diff --git a/drivers/net/ethernet/qlogic/qed/qed_mcp.c b/drivers/net/ethernet/qlogic/qed/qed_mcp.c
index 768b35b..0927488 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_mcp.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_mcp.c
@@ -1024,28 +1024,89 @@ int qed_mcp_get_media_type(struct qed_dev *cdev, u32 *p_media_type)
 	return 0;
 }
 
+/* Old MFW has a global configuration for all PFs regarding RDMA support */
+static void
+qed_mcp_get_shmem_proto_legacy(struct qed_hwfn *p_hwfn,
+			       enum qed_pci_personality *p_proto)
+{
+	/* There wasn't ever a legacy MFW that published iwarp.
+* So at this point, this is either plain l2 or RoCE. +*/ + if (test_bit(QED_DEV_CAP_ROCE, _hwfn->hw_info.device_capabilities)) + *p_proto = QED_PCI_ETH_ROCE; + else + *p_proto = QED_PCI_ETH; + + DP_VERBOSE(p_hwfn, NETIF_MSG_IFUP, + "According to Legacy capabilities, L2 personality is %08x\n", + (u32) *p_proto); +} + +static int +qed_mcp_get_shmem_proto_mfw(struct qed_hwfn *p_hwfn, + struct qed_ptt *p_ptt, + enum qed_pci_personality *p_proto) +{ + u32 resp = 0, param = 0; + int rc; + + rc = qed_mcp_cmd(p_hwfn, p_ptt, +DRV_MSG_CODE_GET_PF_RDMA_PROTOCOL, 0, , ); + if (rc) + return rc; + if (resp != FW_MSG_CODE_OK) { + DP_VERBOSE(p_hwfn, NETIF_MSG_IFUP, + "MFW lacks support for command; Returns %08x\n", + resp); + return -EINVAL; + } + + switch (param) { + case FW_MB_PARAM_GET_PF_RDMA_NONE: + *p_proto = QED_PCI_ETH; + break; + case FW_MB_PARAM_GET_PF_RDMA_ROCE: + *p_proto = QED_PCI_ETH_ROCE; + break; + case FW_MB_PARAM_GET_PF_RDMA_BOTH: + DP_NOTICE(p_hwfn, + "Current day drivers don't support RoCE & iWARP. Default to RoCE-only\n"); + *p_proto = QED_PCI_ETH_ROCE; + break; + case FW_MB_PARAM_GET_PF_RDMA_IWARP: + default: + DP_NOTICE(p_hwfn, + "MFW answers GET_PF_RDMA_PROTOCOL but param is %08x\n", + param); + return -EINVAL; + } + + DP_VERBOSE(p_hwfn, + NETIF_MSG_IFUP, + "According to capabilities, L2 personality is %08x [resp %08x param %08x]\n", + (u32) *p_proto, resp, param); + return 0; +} + static int qed_mcp_get_shmem_proto(struct qed_hwfn *p_hwfn, struct public_func *p_info, + struct qed_ptt *p_ptt, enum qed_pci_personality *p_proto) { int rc = 0; switch (p_info->config & FUNC_MF_CFG_PROTOCOL_MASK) { case FUNC_MF_CFG_PROTOCOL_ETHERNET: - if (test_bit(QED_DEV_CAP_ROCE, -_hwfn->hw_info.device_capabilities)) - *p_proto = QED_PCI_ETH_ROCE; - else - *p_proto = QED_PCI_ETH; + if (qed_mcp_get_shmem_proto_mfw(p_hwfn, p_ptt, p_proto)) +
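The control flow of the patch - try the new per-device MFW query first, and only fall back to the legacy global capability bit when the firmware doesn't understand the command - is a reusable pattern. A small sketch with stand-in inputs instead of real mailbox plumbing (none of these names are the actual qed API):

```c
#include <assert.h>

enum personality { PERS_ETH, PERS_ETH_ROCE };

/* Stand-in for the new MFW query: fills *pers and returns 0 only when
 * the firmware supports the command.
 */
static int query_new_mfw(int fw_supports, int fw_says_roce,
			 enum personality *pers)
{
	if (!fw_supports)
		return -1;	/* old firmware: command unknown */
	*pers = fw_says_roce ? PERS_ETH_ROCE : PERS_ETH;
	return 0;
}

/* Legacy path: derive the answer from the global capability bit. */
static enum personality query_legacy(int cap_roce)
{
	return cap_roce ? PERS_ETH_ROCE : PERS_ETH;
}

/* Preferred-then-fallback, matching the shape of
 * qed_mcp_get_shmem_proto() in the patch above.
 */
static enum personality get_personality(int fw_supports, int fw_says_roce,
					int legacy_cap_roce)
{
	enum personality pers;

	if (query_new_mfw(fw_supports, fw_says_roce, &pers) == 0)
		return pers;
	return query_legacy(legacy_cap_roce);
}
```

The key property is that old firmware keeps exactly the behavior it had before the patch, while new firmware gets the finer-grained answer.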
[PATCH net-next 7/7] qed: Learn resources from management firmware
From: Tomer Tayar

Currently, each interface assumes it receives an equal portion of the
HW/FW resources, but this is wasteful - different partitions [and
specifically, partitions exposing different protocol support] might
require different resources.

Implement a new resource learning scheme where the information is
received directly from the management firmware [which has knowledge of
all of the functions and can serve as arbiter].

Signed-off-by: Tomer Tayar
Signed-off-by: Yuval Mintz
---
 drivers/net/ethernet/qlogic/qed/qed.h     |   6 +-
 drivers/net/ethernet/qlogic/qed/qed_dev.c | 291 ++++++++++++++++++-----
 drivers/net/ethernet/qlogic/qed/qed_hsi.h |  46 ++++
 drivers/net/ethernet/qlogic/qed/qed_l2.c  |   2 +-
 drivers/net/ethernet/qlogic/qed/qed_mcp.c |  42 ++++
 drivers/net/ethernet/qlogic/qed/qed_mcp.h |  15 ++
 include/linux/qed/qed_eth_if.h            |   2 +-
 7 files changed, 341 insertions(+), 63 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h
index 6d3013f..50b8a01 100644
--- a/drivers/net/ethernet/qlogic/qed/qed.h
+++ b/drivers/net/ethernet/qlogic/qed/qed.h
@@ -154,7 +154,10 @@ struct qed_qm_iids {
 	u32 tids;
 };
 
-enum QED_RESOURCES {
+/* HW / FW resources, output of features supported below, most information
+ * is received from MFW.
+ */ +enum qed_resources { QED_SB, QED_L2_QUEUE, QED_VPORT, @@ -166,6 +169,7 @@ enum QED_RESOURCES { QED_RDMA_CNQ_RAM, QED_ILT, QED_LL2_QUEUE, + QED_CMDQS_CQS, QED_RDMA_STATS_QUEUE, QED_MAX_RESC, }; diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c index b59da1a..edd9ad0 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_dev.c +++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c @@ -1511,47 +1511,240 @@ static void qed_hw_set_feat(struct qed_hwfn *p_hwfn) RESC_NUM(p_hwfn, QED_SB), num_features); } -static int qed_hw_get_resc(struct qed_hwfn *p_hwfn) +static enum resource_id_enum qed_hw_get_mfw_res_id(enum qed_resources res_id) +{ + enum resource_id_enum mfw_res_id = RESOURCE_NUM_INVALID; + + switch (res_id) { + case QED_SB: + mfw_res_id = RESOURCE_NUM_SB_E; + break; + case QED_L2_QUEUE: + mfw_res_id = RESOURCE_NUM_L2_QUEUE_E; + break; + case QED_VPORT: + mfw_res_id = RESOURCE_NUM_VPORT_E; + break; + case QED_RSS_ENG: + mfw_res_id = RESOURCE_NUM_RSS_ENGINES_E; + break; + case QED_PQ: + mfw_res_id = RESOURCE_NUM_PQ_E; + break; + case QED_RL: + mfw_res_id = RESOURCE_NUM_RL_E; + break; + case QED_MAC: + case QED_VLAN: + /* Each VFC resource can accommodate both a MAC and a VLAN */ + mfw_res_id = RESOURCE_VFC_FILTER_E; + break; + case QED_ILT: + mfw_res_id = RESOURCE_ILT_E; + break; + case QED_LL2_QUEUE: + mfw_res_id = RESOURCE_LL2_QUEUE_E; + break; + case QED_RDMA_CNQ_RAM: + case QED_CMDQS_CQS: + /* CNQ/CMDQS are the same resource */ + mfw_res_id = RESOURCE_CQS_E; + break; + case QED_RDMA_STATS_QUEUE: + mfw_res_id = RESOURCE_RDMA_STATS_QUEUE_E; + break; + default: + break; + } + + return mfw_res_id; +} + +static u32 qed_hw_get_dflt_resc_num(struct qed_hwfn *p_hwfn, + enum qed_resources res_id) { - u8 enabled_func_idx = p_hwfn->enabled_func_idx; - u32 *resc_start = p_hwfn->hw_info.resc_start; u8 num_funcs = p_hwfn->num_funcs_on_engine; - u32 *resc_num = p_hwfn->hw_info.resc_num; struct qed_sb_cnt_info sb_cnt_info; - int i, 
max_vf_vlan_filters; + u32 dflt_resc_num = 0; - memset(_cnt_info, 0, sizeof(sb_cnt_info)); + switch (res_id) { + case QED_SB: + memset(_cnt_info, 0, sizeof(sb_cnt_info)); + qed_int_get_num_sbs(p_hwfn, _cnt_info); + dflt_resc_num = sb_cnt_info.sb_cnt; + break; + case QED_L2_QUEUE: + dflt_resc_num = MAX_NUM_L2_QUEUES_BB / num_funcs; + break; + case QED_VPORT: + dflt_resc_num = MAX_NUM_VPORTS_BB / num_funcs; + break; + case QED_RSS_ENG: + dflt_resc_num = ETH_RSS_ENGINE_NUM_BB / num_funcs; + break; + case QED_PQ: + /* The granularity of the PQs is 8 */ + dflt_resc_num = MAX_QM_TX_QUEUES_BB / num_funcs; + dflt_resc_num &= ~0x7; + break; + case QED_RL: + dflt_resc_num =
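qed_hw_get_dflt_resc_num() above computes each default allocation as an equal share of the global pool per function, with per-resource quirks such as rounding PQ counts down to their granularity of 8 (`dflt_resc_num &= ~0x7`). A minimal sketch of that split (pool sizes are illustrative stand-ins for the MAX_NUM_* constants, not the real values):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative pool sizes standing in for the MAX_NUM_* constants. */
#define POOL_L2_QUEUES	256u
#define POOL_TX_PQS	448u

enum resc { RESC_L2_QUEUE, RESC_PQ };

/* Default split: an equal share per function; PQs are additionally
 * rounded down to a multiple of 8, as in the patch above.
 */
static uint32_t dflt_resc_num(enum resc id, uint32_t num_funcs)
{
	uint32_t n = 0;

	switch (id) {
	case RESC_L2_QUEUE:
		n = POOL_L2_QUEUES / num_funcs;
		break;
	case RESC_PQ:
		n = POOL_TX_PQS / num_funcs;
		n &= ~0x7u;	/* PQ granularity is 8 */
		break;
	}
	return n;
}
```

This equal-share computation is exactly what the series demotes to a fallback: when the management firmware answers the resource query, its per-function numbers are used instead of these defaults.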
[PATCH net-next 0/7] qed*: Patch series
This series does several things. The bigger changes:

 - Add new notification APIs [& Defaults] for various fields. The series
   then utilizes some of those qed <-> qede APIs to base WoL support
   upon.

 - Change the resource allocation scheme to receive the values from
   management firmware, instead of equally sharing resources between
   functions [that might not need those]. That would, e.g., allow us to
   configure additional filters to network interfaces in the presence of
   storage [PCI] functions from the same adapter.

Dave,

Please consider applying this series to `net-next'.

Thanks,
Yuval

Sudarsana Kalluru (1):
  qed*: Management firmware - notifications and defaults

Tomer Tayar (1):
  qed: Learn resources from management firmware

Yuval Mintz (5):
  qed: Add nvram selftest
  qed*: Add support for WoL
  qede: Decouple ethtool caps from qed
  qed: Learn of RDMA capabilities per-device
  qed: Use VF-queue feature

 drivers/net/ethernet/qlogic/qed/qed.h           |  19 +-
 drivers/net/ethernet/qlogic/qed/qed_dev.c       | 382 ++++++++++++++++-
 drivers/net/ethernet/qlogic/qed/qed_hsi.h       | 120 ++++-
 drivers/net/ethernet/qlogic/qed/qed_int.c       |  32 +-
 drivers/net/ethernet/qlogic/qed/qed_l2.c        |   2 +-
 drivers/net/ethernet/qlogic/qed/qed_main.c      | 105 +++++
 drivers/net/ethernet/qlogic/qed/qed_mcp.c       | 433 +++++++++++++++++-
 drivers/net/ethernet/qlogic/qed/qed_mcp.h       | 158 +++++++
 drivers/net/ethernet/qlogic/qed/qed_selftest.c  | 101 +++++
 drivers/net/ethernet/qlogic/qed/qed_selftest.h  |  10 +
 drivers/net/ethernet/qlogic/qed/qed_sriov.c     |  17 +-
 drivers/net/ethernet/qlogic/qede/qede.h         |   2 +
 drivers/net/ethernet/qlogic/qede/qede_ethtool.c |  54 ++-
 drivers/net/ethernet/qlogic/qede/qede_main.c    |  17 +
 include/linux/qed/qed_eth_if.h                  |   2 +-
 include/linux/qed/qed_if.h                      |  47 +++
 16 files changed, 1404 insertions(+), 97 deletions(-)

-- 
1.9.3
[PATCH net-next 1/7] qed*: Management firmware - notifications and defaults
From: Sudarsana Kalluru

Management firmware is interested in various tidbits about the driver -
including the driver state & several configuration related fields [MTU,
primary MAC, etc.].

This adds the necessary logic to update MFW with such configurations,
some of which are passed directly via qed, while for others APIs are
provided so that qede would be able to later configure them if needed.

This also introduces a new default configuration for MTU, which would
replace the default inherited by being an ethernet device.

Signed-off-by: Sudarsana Kalluru
Signed-off-by: Yuval Mintz
---
 drivers/net/ethernet/qlogic/qed/qed.h           |   1 +
 drivers/net/ethernet/qlogic/qed/qed_dev.c       |  52 +++++-
 drivers/net/ethernet/qlogic/qed/qed_hsi.h       |  59 +++++-
 drivers/net/ethernet/qlogic/qed/qed_main.c      |  75 ++++++++
 drivers/net/ethernet/qlogic/qed/qed_mcp.c       | 163 ++++++++++++++++
 drivers/net/ethernet/qlogic/qed/qed_mcp.h       | 102 ++++++++++
 drivers/net/ethernet/qlogic/qede/qede_ethtool.c |   2 +
 drivers/net/ethernet/qlogic/qede/qede_main.c    |   8 ++
 include/linux/qed/qed_if.h                      |  28 +++
 9 files changed, 487 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h
index 653bb57..f20243c 100644
--- a/drivers/net/ethernet/qlogic/qed/qed.h
+++ b/drivers/net/ethernet/qlogic/qed/qed.h
@@ -226,6 +226,7 @@ struct qed_hw_info {
 	u32 port_mode;
 	u32 hw_mode;
 	unsigned long device_capabilities;
+	u16 mtu;
 };
 
 struct qed_hw_cid_data {
diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c
index 754f6a9..9ef6dfd 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dev.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c
@@ -1056,8 +1056,10 @@ int qed_hw_init(struct qed_dev *cdev,
 		bool allow_npar_tx_switch,
 		const u8 *bin_fw_data)
 {
-	u32 load_code, param;
-	int rc, mfw_rc, i;
+	u32 load_code, param, drv_mb_param;
+	bool b_default_mtu = true;
+	struct qed_hwfn *p_hwfn;
+	int rc = 0, mfw_rc, i;
 
 	if ((int_mode == QED_INT_MODE_MSI) && (cdev->num_hwfns > 1)) {
DP_NOTICE(cdev, "MSI mode is not supported for CMT devices\n"); @@ -1073,6 +1075,12 @@ int qed_hw_init(struct qed_dev *cdev, for_each_hwfn(cdev, i) { struct qed_hwfn *p_hwfn = >hwfns[i]; + /* If management didn't provide a default, set one of our own */ + if (!p_hwfn->hw_info.mtu) { + p_hwfn->hw_info.mtu = 1500; + b_default_mtu = false; + } + if (IS_VF(cdev)) { p_hwfn->b_int_enabled = 1; continue; @@ -1156,6 +1164,38 @@ int qed_hw_init(struct qed_dev *cdev, p_hwfn->hw_init_done = true; } + if (IS_PF(cdev)) { + p_hwfn = QED_LEADING_HWFN(cdev); + drv_mb_param = (FW_MAJOR_VERSION << 24) | + (FW_MINOR_VERSION << 16) | + (FW_REVISION_VERSION << 8) | + (FW_ENGINEERING_VERSION); + rc = qed_mcp_cmd(p_hwfn, p_hwfn->p_main_ptt, +DRV_MSG_CODE_OV_UPDATE_STORM_FW_VER, +drv_mb_param, _code, ); + if (rc) + DP_INFO(p_hwfn, "Failed to update firmware version\n"); + + if (!b_default_mtu) { + rc = qed_mcp_ov_update_mtu(p_hwfn, p_hwfn->p_main_ptt, + p_hwfn->hw_info.mtu); + if (rc) + DP_INFO(p_hwfn, + "Failed to update default mtu\n"); + } + + rc = qed_mcp_ov_update_driver_state(p_hwfn, + p_hwfn->p_main_ptt, + QED_OV_DRIVER_STATE_DISABLED); + if (rc) + DP_INFO(p_hwfn, "Failed to update driver state\n"); + + rc = qed_mcp_ov_update_eswitch(p_hwfn, p_hwfn->p_main_ptt, + QED_OV_ESWITCH_VEB); + if (rc) + DP_INFO(p_hwfn, "Failed to update eswitch mode\n"); + } + return 0; } @@ -1800,6 +1840,9 @@ static void qed_get_num_funcs(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt) qed_get_num_funcs(p_hwfn, p_ptt); + if (qed_mcp_is_init(p_hwfn)) + p_hwfn->hw_info.mtu = p_hwfn->mcp_info->func_info.mtu; + return
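The init path above reports the storm firmware version to the MFW packed one byte per field into a single mailbox word (`drv_mb_param = (FW_MAJOR_VERSION << 24) | ...`). The packing itself is trivial but easy to get wrong, so here is a standalone sketch of it (function name is invented for the example):

```c
#include <assert.h>
#include <stdint.h>

/* Pack major/minor/revision/engineering into one mailbox word, one byte
 * each, as done for DRV_MSG_CODE_OV_UPDATE_STORM_FW_VER above.
 */
static uint32_t pack_fw_ver(uint8_t major, uint8_t minor,
			    uint8_t rev, uint8_t eng)
{
	return ((uint32_t)major << 24) | ((uint32_t)minor << 16) |
	       ((uint32_t)rev << 8) | eng;
}
```

The casts to `uint32_t` before shifting matter: shifting a promoted `int` by 24 is fine for byte values, but the explicit widening keeps the expression well-defined even if the field types ever grow.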
Re: Let's do P4
Sun, Oct 30, 2016 at 11:26:49AM CET, tg...@suug.ch wrote: >On 10/30/16 at 08:44am, Jiri Pirko wrote: >> Sat, Oct 29, 2016 at 06:46:21PM CEST, john.fastab...@gmail.com wrote: >> >On 16-10-29 07:49 AM, Jakub Kicinski wrote: >> >> On Sat, 29 Oct 2016 09:53:28 +0200, Jiri Pirko wrote: >> >>> Hi all. >> >>> >> >>> The network world is divided into 2 general types of hw: >> >>> 1) network ASICs - network specific silicon, containing things like TCAM >> >>>These ASICs are suitable to be programmed by P4. >> >>> 2) network processors - basically general purpose CPUs >> >>>These processors are suitable to be programmed by eBPF. >> >>> >> >>> I believe that by now, most people have come to the conclusion that it is >> >>> very difficult to handle both types by either P4 or eBPF. And since >> >>> eBPF is part of the kernel, I would like to introduce P4 into the kernel >> >>> as well. Here's a plan: >> >>> >> >>> 1) Define P4 intermediate representation >> >>>I cannot imagine loading a P4 program (c-like syntax text file) into the >> >>>kernel as is. That means that as the first step, we need to find some >> >>>intermediate representation. I can imagine something in a form of AST, >> >>>call it "p4ast". I don't really know how to do this exactly though, >> >>>it's just an idea. >> >>> >> >>>In the end there would be a userspace precompiler for this: >> >>>$ makep4ast example.p4 example.ast >> >> >> >> Maybe stating the obvious, but IMHO defining the IR is the hardest part. >> >> eBPF *is* the IR, we can compile C, P4 or even JIT Lua to eBPF. The >> >> AST/IR for switch pipelines should allow for similar flexibility. >> >> Looser coupling would also protect us from changes in spec of the high >> >> level language. > >My assumption was that a new IR is defined which is easier to parse than >eBPF which is targeted at execution on a CPU and not intended for pattern >matching.
Just looking at how llvm creates different patterns and reorders >instructions, I'm not seeing how eBPF can serve as a general purpose IR >if the objective is to allow fairly flexible generation of the bytecode. >Hence the alternative IR serving as additional metadata complementing the >eBPF program. Agreed. [...] >> >... And merging threads here with Jiri's email ... >> > >> >> If you do p4->ebpf in userspace, you have 2 apis: >> >> 1) to setup sw (in-kernel) p4 datapath, you push bpf.o to kernel >> >> 2) to setup hw p4 datapath, you push program.p4ast to kernel >> >> >> >> Those are 2 apis. Both wrapped up by TC, but still 2 apis. >> >> >> >> What I believe is correct is to have one api: >> >> 1) to setup sw (in-kernel) p4 datapath, you push program.p4ast to kernel >> >> 2) to setup hw p4 datapath, you push program.p4ast to kernel > >I understand what you mean with two APIs now. You want a single IR >block and divide the SW/HW part in the kernel rather than let llvm or >something else do it. Exactly. The following drawing shows the p4 pipeline setup for SW and HW:

                                                 +--> ebpf engine
                                                 |
                                             compilerB
                                                 ^
                                                 |
p4src --> compilerA --> p4ast --TCNL--> cls_p4 --+-> driver -> compilerC -> HW
                                  |
                        userspace | kernel

Now please consider runtime API for rule insertion/removal/stats/etc. Also, the single API is cls_p4 here:

                           ebpf map fillup
                                 ^
                                 |
p4 rule --TCNL--> cls_p4 --------+-> driver -> HW table fillup
               |
     userspace | kernel

> >> >Couple comments around this, first adding yet another IR in the kernel >> >and another JIT engine to map that IR on to eBPF or hardware vendor X >> >doesn't get me excited. Its really much easier to write these as backend >> >objects in LLVM. Not saying it can't be done just saying it is easier >> >in LLVM. Also we already have the LLVM code for P4 to LLVM-IR to eBPF. >> >In the end this would be a reasonably complex bit of code in >> >the kernel only for hardware offload. I have doubts that folks would >> >ever use it for software only cases.
I'm happy to admit I'm wrong here >> >though. >> >> Well for hw offload, every driver has to parse the IR (whatever will it >> be in) and program HW accordingly. Similar parsing and translation would >> be needed for SW path, to translate into eBPF. I don't think it would be >> more complex than in the drivers. Should be fine. > >I'm not sure
Re: [PATCH net-next 0/2] mlx4 XDP TX refactor
Hi Dave, This series makes Brenden's fix unneeded: 958b3d396d7f ("net/mlx4_en: fixup xdp tx irq to match rx") The fix got into net, but yet to be in net-next. Should I wait with this series and send a re-spin, with a revert of the fix, once it gets into net-next? Regards, Tariq On 27/10/2016 5:52 PM, Tariq Toukan wrote: Hi Dave, This patchset refactors the XDP forwarding case, so that its dedicated transmit queues are managed in a complete separation from the other regular ones. Series generated against net-next commit: 6edf10173a1f "devlink: Prevent port_type_set() callback when it's not needed" Thanks, Tariq. Tariq Toukan (2): net/mlx4_en: Add TX_XDP for CQ types net/mlx4_en: Refactor the XDP forwarding rings scheme drivers/net/ethernet/mellanox/mlx4/en_cq.c | 18 +- drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 76 +++-- drivers/net/ethernet/mellanox/mlx4/en_main.c| 2 +- drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 378 ++-- drivers/net/ethernet/mellanox/mlx4/en_port.c| 4 +- drivers/net/ethernet/mellanox/mlx4/en_rx.c | 8 +- drivers/net/ethernet/mellanox/mlx4/en_tx.c | 9 +- drivers/net/ethernet/mellanox/mlx4/mlx4_en.h| 18 +- 8 files changed, 305 insertions(+), 208 deletions(-)
Re: [PATCH net-next 2/2] net/mlx4_en: Refactor the XDP forwarding rings scheme
On 28/10/2016 4:07 AM, Alexei Starovoitov wrote: On Thu, Oct 27, 2016 at 05:52:04PM +0300, Tariq Toukan wrote: Separately manage the two types of TX rings: regular ones, and XDP. Upon an XDP set, do not borrow regular TX rings and convert them into XDP ones, but allocate new ones, unless we hit the max number of rings. Which means that in systems with fewer cores we will not consume the current TX rings for XDP, while we are still within the num TX limit. The commit log is too scarce for details... So questions: - Did you test with changing the number of channels after xdp prog is loaded? That was the recent bug that Brenden fixed. Bug no longer exists, as the indices of the XDP TX rings now start from 0, each is identical to its respective RX ring. Brenden's fix didn't get to net-next yet, and it shouldn't once the series is applied. I need to take this up with Dave. - does it still have 256 tx queue limit or xdp tx rings can go over? It still has the limit of 256 TX queues. - Any performance implications ? I didn't see any performance implications. Note that the XDP TX rings are no longer shown in ethtool -S. Brenden, could you please review this patch? Regards, Tariq
Re: [PATCH for-next 00/14][PULL request] Mellanox mlx5 core driver updates 2016-10-25
From: Saeed Mahameed
Date: Sun, Oct 30, 2016 11:59:57 +0200 > On Fri, Oct 28, 2016 at 7:53 PM, David Miller wrote: >> >> I really dislike pull requests of this form. >> >> You add lots of datastructures and helper functions but no actual >> users of these facilities to the driver. >> >> Do this instead: >> >> 1) Add TSAR infrastructure >> 2) Add use of TSAR facilities to the driver >> >> That's one pull request. >> >> I don't care if this is hard, or if there are entanglements with >> Infiniband or whatever, you must submit changes in this manner. >> > > It is not hard, it is just not right, we have lots of IB and ETH > features that we would like to submit in the same kernel cycle, > with your suggestion I will have to almost submit every feature (core > infrastructure and netdev/RDMA usage) > to you and Doug. Nobody can properly review an API addition without seeing how that API is _USED_. This is a simple fundamental fact. And I'm not pulling in code that can't be reviewed properly. Also, so many times people have added new junk to drivers and months later never added the users of that new code and interfaces. Forcing you to provide the use with the API addition makes sure that it is absolutely impossible for that to happen. Whatever issues you think prevent this are your issues, not mine. I want high quality submissions that can be properly reviewed, and you have to find a way to satisfy that requirement.
Re: [PATCH 2/2] rtl8xxxu: Fix for bogus data used to determine macpower
Thanks for your reply. The code was tested on a Cube i9 which has an internal rtl8723bu. No other devices were tested. I am happy to accept that, in an ideal context, hard coding macpower is undesirable, the comment is undesirable and it is wrong to assume the issue is not unique to the rtl8723bu. Your reply is idealistic. What can I do now? I should of course have factored out other untested devices in my patches. The apparent concern you have with process over outcome is a useful lesson. We are not in an ideal situation. The comment is of course relevant and useful for starting a process to fix a real bug that I do not have sufficient information to refine any further, but others do. In the circumstances nothing really more can be expected. My patch cover letter, [PATCH 0/2], provides evidence of a mess with regard to determining macpower for the rtl8723bu and what is subsequently required. This is important. The kernel driver code is very poorly documented and there is not a single source reference to device documentation. For example macpower is nothing more than a setting that is true or false according to whether a read of a particular register returns 0xef or not. Such a value was never obtained so a full init sequence was never performed. It would be helpful if you could provide a link to device references. As it is, how am I supposed to revise the patch without relevant information? My patch code works with the Cube i9, as is, despite a lack of adequate information. Before it did not. That is a powerful statement. Have a nice day. John Heenan On 30 October 2016 at 22:00, Jes Sorensen wrote: > John Heenan writes: >> Code tests show data returned by rtl8xxxu_read8(priv, REG_CR), used to set >> macpower, is never 0xea. It is only ever 0x01 (first time after modprobe) >> using wpa_supplicant and 0x00 thereafter using wpa_supplicant. These results >> occur with 'Fix for authentication failure' [PATCH 1/2] in place.
>> >> Whatever was returned, code tests always showed that at least >> rtl8xxxu_init_queue_reserved_page(priv); >> is always required. Not called if macpower set to true. >> >> Please see cover letter, [PATCH 0/2], for more information from tests. >> > > Sorry but this patch is neither serious nor acceptable. First of all, > hardcoding macpower like this right after an if statement is plain > wrong, second your comments violate all kernel rules. > > Second, you argue this was tested using code test - on which device? Did > you test it on all rtl8xxxu based devices or just rtl8723bu? > > NACK > > Jes
Re: net/dccp: warning in dccp_feat_clone_sp_val/__might_sleep
On Sun, 2016-10-30 at 05:41 +0100, Andrey Konovalov wrote: > Sorry, the warning is still there. > > I'm not sure adding sched_annotate_sleep() does anything, since it's > defined as (in case CONFIG_DEBUG_ATOMIC_SLEEP is not set): > # define sched_annotate_sleep() do { } while (0) Thanks again for testing. But you do have CONFIG_DEBUG_ATOMIC_SLEEP set, which triggers a check in __might_sleep() : WARN_ONCE(current->state != TASK_RUNNING && current->task_state_change, Relevant commit is 00845eb968ead28007338b2bb852b8beef816583 ("sched: don't cause task state changes in nested sleep debugging") Another relevant commit was 26cabd31259ba43f68026ce3f62b78094124333f ("sched, net: Clean up sk_wait_event() vs. might_sleep()") Before release_sock() could process the backlog in process context, only lock_sock() could trigger the issue, so my fix at that time was commit cb7cf8a33ff73cf638481d1edf883d8968f934f8 ("inet: Clean up inet_csk_wait_for_connect() vs. might_sleep()") I guess we need something else now, because the following : static int dccp_wait_for_ccid(struct sock *sk, unsigned long delay) { DEFINE_WAIT(wait); long remaining; prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE); sk->sk_write_pending++; release_sock(sk); ... can now process the socket backlog in process context from release_sock(), so all GFP_KERNEL allocations might barf because of TASK_INTERRUPTIBLE being used at that point. sk_wait_event() probably also needs a fix. Peter, any idea how this can be done ? Thanks !
Re: [PATCH net-next 5/5] ipv6: Compute multipath hash for forwarded ICMP errors from offending packet
On Fri, Oct 28, 2016 at 02:25 PM GMT, Tom Herbert wrote: > On Fri, Oct 28, 2016 at 1:32 AM, Jakub Sitnicki wrote: >> On Thu, Oct 27, 2016 at 10:35 PM GMT, Tom Herbert wrote: >>> On Mon, Oct 24, 2016 at 2:28 AM, Jakub Sitnicki wrote: Same as for the transmit path, let's do our best to ensure that received ICMP errors that may be subject to forwarding will be routed the same path as flow that triggered the error, if it was going in the opposite direction. >>> Unfortunately our ability to do this is generally quite limited. This >>> patch will select the route for multipath, but I don't believe sets >>> the same link in LAG and definitely can't help switches doing ECMP to >>> route the ICMP packet in the same way as the flow would be. Did you >>> see a problem that warrants solving this case? >> >> The motivation here is to bring IPv6 ECMP routing on par with IPv4 to >> enable its wider use, targeting anycast services. Forwarding ICMP errors >> back to the source host, at the L3 layer, is what we thought would be a >> step forward. >> >> Similar to change in IPv4 routing introduced in commit 79a131592dbb >> ("ipv4: ICMP packet inspection for multipath", [1]) we do our best at >> L3, leaving any potential problems with LAG at lower layer (L2) >> unaddressed. >> > ICMP will almost certainly take a different path in the network than > TCP or UDP due to ECMP. If we ever get proper flow label support for > ECMP then that could solve the problem if all the devices do a hash > just on . Sorry for my late reply, I have been traveling. I think that either I am missing something here, or the proposed changes address just the problem that you have described. Yes, if we compute the hash that drives the route choice over the IP header of the ICMP error, then there is no guarantee it will travel back to the sender of the offending packet that triggered the error. That is why we look at the offending packet carried by an ICMP error and hash over its fields, instead.
We need, however, to take care of two things: 1) swap the source with the destination address, because we are forwarding the ICMP error in the opposite direction than the offending packet was going (see icmpv6_multipath_hash() introduced in patch 4/5); and 2) ensure the flow labels used in both directions are the same (either reflected by one side, or fixed, e.g. not used and set to 0), so that the 4-tuple we hash over when forwarding, , is the same both ways, modulo the order of addresses. > If this patch is being done to be compatible with IPv4 I guess that's > okay, but it would be false advertisement to say this makes ICMP > follow the same path as the flow being targeted in an error. > Fortunately, I doubt anyone can have a dependency on this for ICMP. I wouldn't want to propose anything that would be useless. If you think that this is the case here, I would very much like to understand what and why cannot work in practice. Thanks for reviewing this series, Jakub
Re: [PATCH 2/2] rtl8xxxu: Fix for bogus data used to determine macpower
John Heenan writes: > Code tests show data returned by rtl8xxxu_read8(priv, REG_CR), used to set > macpower, is never 0xea. It is only ever 0x01 (first time after modprobe) > using wpa_supplicant and 0x00 thereafter using wpa_supplicant. These results > occur with 'Fix for authentication failure' [PATCH 1/2] in place. > > Whatever was returned, code tests always showed that at least > rtl8xxxu_init_queue_reserved_page(priv); > is always required. Not called if macpower set to true. > > Please see cover letter, [PATCH 0/2], for more information from tests. > > For rtl8xxxu-devel branch of > git.kernel.org/pub/scm/linux/kernel/git/jes/linux.git > > Signed-off-by: John Heenan > --- > drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c > b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c > index f25b4df..aae05f3 100644 > --- a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c > +++ b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c > @@ -3904,6 +3904,7 @@ static int rtl8xxxu_init_device(struct ieee80211_hw *hw) > macpower = false; > else > macpower = true; > + macpower = false; // Code testing shows macpower must always be set to > false to avoid failure > > ret = fops->power_on(priv); > if (ret < 0) { Sorry but this patch is neither serious nor acceptable. First of all, hardcoding macpower like this right after an if statement is plain wrong, second your comments violate all kernel rules. Second, you argue this was tested using code test - on which device? Did you test it on all rtl8xxxu based devices or just rtl8723bu? NACK Jes
[PATCH net-next 1/4] route: Set orig_output when redirecting to lwt on locally generated traffic
orig_output for IPv4 was only set for dsts which hit an input route. Set it consistently for locally generated traffic as well to allow lwt to continue the dst_output() path as configured by the nexthop. Fixes: 2536862311d ("lwt: Add support to redirect dst.input") Signed-off-by: Thomas Graf
--- net/ipv4/route.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 62d4d90..7da886e 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -2138,8 +2138,10 @@ static struct rtable *__mkroute_output(const struct fib_result *res, } rt_set_nexthop(rth, fl4->daddr, res, fnhe, fi, type, 0); - if (lwtunnel_output_redirect(rth->dst.lwtstate)) + if (lwtunnel_output_redirect(rth->dst.lwtstate)) { + rth->dst.lwtstate->orig_output = rth->dst.output; + rth->dst.output = lwtunnel_output; + } return rth; } -- 2.7.4
[PATCH net-next 2/4] route: Set lwtstate for local traffic and cached input dsts
A route on the output path hitting a RTN_LOCAL route will keep the dst associated on its way through the loopback device. On the receive path, the dst_input() call will thus invoke the input handler of the route created in the output path. Thus, lwt redirection for input must be done for dsts allocated in the output path as well. Also, if a route is cached in the input path, the allocated dst should respect lwtunnel configuration on the nexthop as well. Signed-off-by: Thomas Graf
--- net/ipv4/route.c | 39 ++- 1 file changed, 26 insertions(+), 13 deletions(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 7da886e..44f5403 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -1596,6 +1596,19 @@ static void ip_del_fnhe(struct fib_nh *nh, __be32 daddr) spin_unlock_bh(&fnhe_lock); } +static void set_lwt_redirect(struct rtable *rth) +{ + if (lwtunnel_output_redirect(rth->dst.lwtstate)) { + rth->dst.lwtstate->orig_output = rth->dst.output; + rth->dst.output = lwtunnel_output; + } + + if (lwtunnel_input_redirect(rth->dst.lwtstate)) { + rth->dst.lwtstate->orig_input = rth->dst.input; + rth->dst.input = lwtunnel_input; + } +} + /* called in rcu_read_lock() section */ static int __mkroute_input(struct sk_buff *skb, const struct fib_result *res, @@ -1685,14 +1698,7 @@ static int __mkroute_input(struct sk_buff *skb, rth->dst.input = ip_forward; rt_set_nexthop(rth, daddr, res, fnhe, res->fi, res->type, itag); - if (lwtunnel_output_redirect(rth->dst.lwtstate)) { - rth->dst.lwtstate->orig_output = rth->dst.output; - rth->dst.output = lwtunnel_output; - } - if (lwtunnel_input_redirect(rth->dst.lwtstate)) { - rth->dst.lwtstate->orig_input = rth->dst.input; - rth->dst.input = lwtunnel_input; - } + set_lwt_redirect(rth); skb_dst_set(skb, &rth->dst); out: err = 0; @@ -1919,8 +1925,18 @@ out: return err; rth->dst.error = -err; rth->rt_flags &= ~RTCF_LOCAL; } + if (do_cache) { - if (unlikely(!rt_cache_route(&FIB_RES_NH(*res), rth))) { + struct fib_nh *nh = &FIB_RES_NH(*res); + + rth->dst.lwtstate =
lwtstate_get(nh->nh_lwtstate); + if (lwtunnel_input_redirect(rth->dst.lwtstate)) { + WARN_ON(rth->dst.input == lwtunnel_input); + rth->dst.lwtstate->orig_input = rth->dst.input; + rth->dst.input = lwtunnel_input; + } + + if (unlikely(!rt_cache_route(nh, rth))) { rth->dst.flags |= DST_NOCACHE; rt_add_uncached_list(rth); } @@ -2138,10 +2154,7 @@ static struct rtable *__mkroute_output(const struct fib_result *res, } rt_set_nexthop(rth, fl4->daddr, res, fnhe, fi, type, 0); - if (lwtunnel_output_redirect(rth->dst.lwtstate)) { - rth->dst.lwtstate->orig_output = rth->dst.output; - rth->dst.output = lwtunnel_output; - } + set_lwt_redirect(rth); return rth; } -- 2.7.4
[PATCH net-next 4/4] bpf: Add samples for LWT-BPF
This adds a set of samples demonstrating the use of lwt-bpf combined with a shell script which allows running the samples in the form of a basic selftest. The samples include: - Allowing all packets - Dropping all packets - Printing context information - Access packet data - IPv4 daddr rewrite in dst_output() - L2 MAC header push + redirect in lwt xmit Signed-off-by: Thomas Graf--- samples/bpf/bpf_helpers.h | 4 + samples/bpf/lwt_bpf.c | 210 +++ samples/bpf/test_lwt_bpf.sh | 337 3 files changed, 551 insertions(+) create mode 100644 samples/bpf/lwt_bpf.c create mode 100755 samples/bpf/test_lwt_bpf.sh diff --git a/samples/bpf/bpf_helpers.h b/samples/bpf/bpf_helpers.h index 90f44bd..f34e417 100644 --- a/samples/bpf/bpf_helpers.h +++ b/samples/bpf/bpf_helpers.h @@ -80,6 +80,8 @@ struct bpf_map_def { unsigned int map_flags; }; +static int (*bpf_skb_load_bytes)(void *ctx, int off, void *to, int len) = + (void *) BPF_FUNC_skb_load_bytes; static int (*bpf_skb_store_bytes)(void *ctx, int off, void *from, int len, int flags) = (void *) BPF_FUNC_skb_store_bytes; static int (*bpf_l3_csum_replace)(void *ctx, int off, int from, int to, int flags) = @@ -88,6 +90,8 @@ static int (*bpf_l4_csum_replace)(void *ctx, int off, int from, int to, int flag (void *) BPF_FUNC_l4_csum_replace; static int (*bpf_skb_under_cgroup)(void *ctx, void *map, int index) = (void *) BPF_FUNC_skb_under_cgroup; +static int (*bpf_skb_push)(void *, int len, int flags) = + (void *) BPF_FUNC_skb_push; #if defined(__x86_64__) diff --git a/samples/bpf/lwt_bpf.c b/samples/bpf/lwt_bpf.c new file mode 100644 index 000..05be6ac --- /dev/null +++ b/samples/bpf/lwt_bpf.c @@ -0,0 +1,210 @@ +/* Copyright (c) 2016 Thomas Graf + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of version 2 of the GNU General Public + * License as published by the Free Software Foundation. 
+ * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "bpf_helpers.h" +#include + +# define printk(fmt, ...) \ + ({ \ + char fmt[] = fmt; \ + bpf_trace_printk(fmt, sizeof(fmt), \ +##__VA_ARGS__);\ + }) + +#define CB_MAGIC 1234 + +/* Let all packets pass */ +SEC("nop") +int do_nop(struct __sk_buff *skb) +{ + return BPF_OK; +} + +/* Print some context information per packet to tracing buffer. + */ +SEC("ctx_test") +int do_ctx_test(struct __sk_buff *skb) +{ + skb->cb[0] = CB_MAGIC; + printk("len %d hash %d protocol %d\n", skb->len, skb->hash, + skb->protocol); + printk("cb %d ingress_ifindex %d ifindex %d\n", skb->cb[0], + skb->ingress_ifindex, skb->ifindex); + + return BPF_OK; +} + +/* Print content of skb->cb[] to tracing buffer */ +SEC("print_cb") +int do_print_cb(struct __sk_buff *skb) +{ + printk("cb0: %x cb1: %x cb2: %x\n", skb->cb[0], skb->cb[1], + skb->cb[2]); + printk("cb3: %x cb4: %x\n", skb->cb[3], skb->cb[4]); + + return BPF_OK; +} + +/* Print source and destination IPv4 address to tracing buffer */ +SEC("data_test") +int do_data_test(struct __sk_buff *skb) +{ + void *data = (void *)(long)skb->data; + void *data_end = (void *)(long)skb->data_end; + struct iphdr *iph = data; + + if (data + sizeof(*iph) > data_end) { + printk("packet truncated\n"); + return BPF_DROP; + } + + printk("src: %x dst: %x\n", iph->saddr, iph->daddr); + + return BPF_OK; +} + +#define IP_CSUM_OFF offsetof(struct iphdr, check) +#define IP_DST_OFF offsetof(struct iphdr, daddr) +#define IP_SRC_OFF offsetof(struct iphdr, saddr) +#define IP_PROTO_OFF offsetof(struct iphdr, protocol) +#define TCP_CSUM_OFF offsetof(struct tcphdr, check) +#define UDP_CSUM_OFF offsetof(struct 
udphdr, check) +#define IS_PSEUDO 0x10 + +static inline int rewrite(struct __sk_buff *skb, uint32_t old_ip, + uint32_t new_ip, int rw_daddr) +{ + int ret, off = 0, flags = IS_PSEUDO; + uint8_t proto; + + ret = bpf_skb_load_bytes(skb, IP_PROTO_OFF, &proto, 1); + if (ret < 0) { + printk("bpf_l4_csum_replace failed: %d\n", ret); + return BPF_DROP; + } + + switch (proto) { + case IPPROTO_TCP: +
[PATCH net-next 3/4] bpf: BPF for lightweight tunnel encapsulation
Register two new BPF prog types BPF_PROG_TYPE_LWT_IN and BPF_PROG_TYPE_LWT_OUT which are invoked if a route contains a LWT redirection of type LWTUNNEL_ENCAP_BPF. The separate program types are required because manipulation of packet data is only allowed on the output and transmit path as the subsequent dst_input() call path assumes an IP header validated by ip_rcv(). The BPF programs will be handed an skb with the L3 header attached and may return one of the following return codes: BPF_OK - Continue routing as per nexthop BPF_DROP - Drop skb and return EPERM BPF_REDIRECT - Redirect skb to device as per redirect() helper. (Only valid on lwtunnel_xmit() hook) The return codes are binary compatible with their TC_ACT_ relatives to ease compatibility. A new helper bpf_skb_push() is added which allows to prepend an L2 header in front of the skb, extend the existing L3 header, or both. This allows to address a wide range of issues: - Optimize L2 header construction when L2 information is always static to avoid ARP/NDisc lookup. - Extend IP header to add additional IP options. - Perform simple encapsulation where offload is of no concern. (The existing functionality to attach a tunnel key to the skb and redirect to a tunnel net_device to allow for offload continues to work obviously).
Signed-off-by: Thomas Graf--- include/linux/filter.h| 2 +- include/uapi/linux/bpf.h | 31 +++- include/uapi/linux/lwtunnel.h | 21 +++ kernel/bpf/verifier.c | 16 +- net/core/Makefile | 2 +- net/core/filter.c | 148 - net/core/lwt_bpf.c| 365 ++ net/core/lwtunnel.c | 1 + 8 files changed, 579 insertions(+), 7 deletions(-) create mode 100644 net/core/lwt_bpf.c diff --git a/include/linux/filter.h b/include/linux/filter.h index 1f09c52..aad7f81 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -438,7 +438,7 @@ struct xdp_buff { }; /* compute the linear packet data range [data, data_end) which - * will be accessed by cls_bpf and act_bpf programs + * will be accessed by cls_bpf, act_bpf and lwt programs */ static inline void bpf_compute_data_end(struct sk_buff *skb) { diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index e2f38e0..2ebaa3c 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -96,6 +96,9 @@ enum bpf_prog_type { BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_XDP, BPF_PROG_TYPE_PERF_EVENT, + BPF_PROG_TYPE_LWT_IN, + BPF_PROG_TYPE_LWT_OUT, + BPF_PROG_TYPE_LWT_XMIT, }; #define BPF_PSEUDO_MAP_FD 1 @@ -383,6 +386,16 @@ union bpf_attr { * * int bpf_get_numa_node_id() * Return: Id of current NUMA node. + * + * int bpf_skb_push() + * Add room to beginning of skb and adjusts MAC header offset accordingly. + * Extends/reallocaes for needed skb headeroom automatically. + * May change skb data pointer and will thus invalidate any check done + * for direct packet access. 
+ * @skb: pointer to skb + * @len: length of header to be pushed in front + * @flags: Flags (unused for now) + * Return: 0 on success or negative error */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -427,7 +440,8 @@ union bpf_attr { FN(skb_pull_data), \ FN(csum_update),\ FN(set_hash_invalid), \ - FN(get_numa_node_id), + FN(get_numa_node_id), \ + FN(skb_push), /* integer value in 'imm' field of BPF_CALL instruction selects which helper * function eBPF program intends to call @@ -511,6 +525,21 @@ struct bpf_tunnel_key { __u32 tunnel_label; }; +/* Generic BPF return codes which all BPF program types may support. + * The values are binary compatible with their TC_ACT_* counter-part to + * provide backwards compatibility with existing SCHED_CLS and SCHED_ACT + * programs. + * + * XDP is handled seprately, see XDP_*. + */ +enum bpf_ret_code { + BPF_OK = 0, + /* 1 reserved */ + BPF_DROP = 2, + /* 3-6 reserved */ + BPF_REDIRECT = 7, +}; + /* User return codes for XDP prog type. * A valid XDP program must return one of these defined values. All other * return codes are reserved for future use. Unknown return codes will result diff --git a/include/uapi/linux/lwtunnel.h b/include/uapi/linux/lwtunnel.h index a478fe8..9354d997 100644 --- a/include/uapi/linux/lwtunnel.h +++ b/include/uapi/linux/lwtunnel.h @@ -9,6 +9,7 @@ enum lwtunnel_encap_types { LWTUNNEL_ENCAP_IP, LWTUNNEL_ENCAP_ILA, LWTUNNEL_ENCAP_IP6, + LWTUNNEL_ENCAP_BPF, __LWTUNNEL_ENCAP_MAX, }; @@ -42,4 +43,24 @@ enum lwtunnel_ip6_t { #define LWTUNNEL_IP6_MAX (__LWTUNNEL_IP6_MAX - 1) +enum { + LWT_BPF_PROG_UNSPEC, + LWT_BPF_PROG_FD, +
[PATCH net-next 0/4] BPF for lightweight tunnel encapsulation
This series implements BPF program invocation from dst entries via the lightweight tunnels infrastructure. The BPF program can be attached to lwtunnel_input(), lwtunnel_output() or lwtunnel_xmit() and sees an L3 skb as context. input is read-only, output can write, xmit can write, push headers, and redirect. Motivation for this work: - Restricting outgoing routes beyond what the route tuple supports - Per route accounting beyond realms - Fast attachment of L2 headers where header does not require resolving L2 addresses - ILA-like use cases where L3 addresses are resolved and then routed in an async manner - Fast encapsulation + redirect. For now limited to use cases where not setting inner and outer offset/protocol is OK. A couple of samples on how to use it can be found in patch 04. Thomas Graf (4): route: Set orig_output when redirecting to lwt on locally generated traffic route: Set lwtstate for local traffic and cached input dsts bpf: BPF for lightweight tunnel encapsulation bpf: Add samples for LWT-BPF include/linux/filter.h| 2 +- include/uapi/linux/bpf.h | 31 +++- include/uapi/linux/lwtunnel.h | 21 +++ kernel/bpf/verifier.c | 16 +- net/core/Makefile | 2 +- net/core/filter.c | 148 - net/core/lwt_bpf.c| 365 ++ net/core/lwtunnel.c | 1 + net/ipv4/route.c | 37 +++-- samples/bpf/bpf_helpers.h | 4 + samples/bpf/lwt_bpf.c | 210 samples/bpf/test_lwt_bpf.sh | 337 ++ 12 files changed, 1156 insertions(+), 18 deletions(-) create mode 100644 net/core/lwt_bpf.c create mode 100644 samples/bpf/lwt_bpf.c create mode 100755 samples/bpf/test_lwt_bpf.sh -- 2.7.4
RFC if==else in halbtc8723b1ant.c
Hi! In your commit f5b586909581 ("rtlwifi: btcoexist: Modify driver to support BT coexistence in rtl8723be") you introduced an if/else where both branches are the same, but the comment in the else branch suggests that this might be unintended. From code review only I can't say what the intent is. /drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtc8723b1ant.c:halbtc8723b1ant_action_wifi_connected_bt_acl_busy() 1838 if ((bt_rssi_state == BTC_RSSI_STATE_HIGH) || 1839 (bt_rssi_state == BTC_RSSI_STATE_STAY_HIGH)) { 1840 halbtc8723b1ant_ps_tdma(btcoexist, NORMAL_EXEC, 1841 true, 14); 1842 coex_dm->auto_tdma_adjust = false; 1843 } else { /*for low BT RSSI*/ 1844 halbtc8723b1ant_ps_tdma(btcoexist, NORMAL_EXEC, 1845 true, 14); 1846 coex_dm->auto_tdma_adjust = false; 1847 } Basically the same construct is also in halbtc8723b1ant_run_coexist_mechanism() 2213 if ((wifi_rssi_state == BTC_RSSI_STATE_HIGH) || 2214 (wifi_rssi_state == BTC_RSSI_STATE_STAY_HIGH)) { 2215 halbtc8723b1ant_limited_tx(btcoexist, 2216NORMAL_EXEC, 22171, 1, 1, 1); 2218 } else { 2219 halbtc8723b1ant_limited_tx(btcoexist, 2220NORMAL_EXEC, 22211, 1, 1, 1); } where the if condition is the same, so the else may again only apply to the low BT RSSI case - and the if and else are again the same. Whether this is intended or not is not clear. If this is intended it should have appropriate comments. thx! hofrat
Re: [bnx2] [Regression 4.8] Driver loading fails without firmware
Dear Baoquan,

On Saturday, 2016-10-29 at 10:55 +0800, Baoquan He wrote:
> On 10/27/16 at 03:21pm, Paul Menzel wrote:
> > > > Baoquan, could you please fix this regression? My suggestion is that
> > > > you add the old code back, but check if the firmware has been loaded.
> > > > If it hasn't, load it again.
> > > >
> > > > That way, people can update their Linux kernel, and it continues
> > > > working without changing the initramfs, or anything else.
> > >
> > > I saw your mail but I am also not familiar with the bnx2 driver. As the
> > > commit log says, I just tried to make the bnx2 driver reset itself
> > > earlier.
> > >
> > > So you did a git bisect and found this commit caused the regression,
> > > right? If yes, and the network developers take no action, I will look
> > > into the code and see if I have an idea how to fix it.
> >
> > Well, I looked through the commits and found that one, which would
> > explain the changed behavior.
> >
> > To be sure, and to follow your request, I took Linux 4.8.4 and reverted
> > your commit (attached). Then I deleted the firmware again from the
> > initramfs, and rebooted. The devices showed up just fine as before.
> >
> > So to summarize, the commit is indeed the culprit.
>
> Sorry for this.
>
> Could you tell me the steps to reproduce? I will find a machine with a
> bnx2 NIC and check if there are other ways.

Well, delete the bnx2 firmware files from the initramfs, and start the system.

Did you read my proposal to try to load the firmware twice, that is, to basically revert only the deleted lines of your commit and add an additional check?

Kind regards,

Paul
Re: Let's do P4
On 10/30/16 at 08:44am, Jiri Pirko wrote:
> Sat, Oct 29, 2016 at 06:46:21PM CEST, john.fastab...@gmail.com wrote:
> >On 16-10-29 07:49 AM, Jakub Kicinski wrote:
> >> On Sat, 29 Oct 2016 09:53:28 +0200, Jiri Pirko wrote:
> >>> Hi all.
> >>>
> >>> The network world is divided into 2 general types of hw:
> >>> 1) network ASICs - network-specific silicon, containing things like TCAM
> >>>    These ASICs are suitable to be programmed by P4.
> >>> 2) network processors - basically general purpose CPUs
> >>>    These processors are suitable to be programmed by eBPF.
> >>>
> >>> I believe that by now, most people have come to the conclusion that it
> >>> is very difficult to handle both types by either P4 or eBPF. And since
> >>> eBPF is part of the kernel, I would like to introduce P4 into the
> >>> kernel as well. Here's a plan:
> >>>
> >>> 1) Define a P4 intermediate representation
> >>>    I cannot imagine loading a P4 program (c-like syntax text file) into
> >>>    the kernel as is. That means that as the first step, we need to find
> >>>    some intermediate representation. I can imagine something in the
> >>>    form of an AST, call it "p4ast". I don't really know how to do this
> >>>    exactly though, it's just an idea.
> >>>
> >>>    In the end there would be a userspace precompiler for this:
> >>>    $ makep4ast example.p4 example.ast
> >>
> >> Maybe stating the obvious, but IMHO defining the IR is the hardest part.
> >> eBPF *is* the IR, we can compile C, P4 or even JIT Lua to eBPF. The
> >> AST/IR for switch pipelines should allow for similar flexibility.
> >> Looser coupling would also protect us from changes in the spec of the
> >> high-level language.

My assumption was that a new IR is defined which is easier to parse than eBPF, which is targeted at execution on a CPU and not intended for pattern matching. Just looking at how llvm creates different patterns and reorders instructions, I'm not seeing how eBPF can serve as a general purpose IR if the objective is to allow fairly flexible generation of the bytecode.
Hence the alternative IR serving as additional metadata complementing the eBPF program.

> >Jumping in the middle here. You managed to get an entire thread going
> >before I even woke up :)
> >
> >The problem with eBPF as an IR is that in the universe of eBPF IR
> >programs, the subset that can be offloaded onto standard ASIC-based
> >hardware (non NPU/FPGA/etc) is so small as to be almost meaningless IMO.
> >
> >I tried this for a while and the result is users have to write very
> >targeted eBPF that they "know" will be pattern matched and pushed into
> >an ASIC. It can work but it's very fragile. When I did this I ended up
> >with an eBPF generator for deviceX and an eBPF generator for deviceY,
> >each with a very specific pattern matching engine in the driver to
> >xlate ebpf-deviceX into its asic. Existing ASICs for example usually
> >support only one pipeline, only one parser (or require moving mountains
> >to change the parser via ucode), only one set of tables, and only one
> >deparser/serializer at the end to build the new packet. Next-gen pieces
> >may have some flexibility on the parser side.
> >
> >There is an interesting resource allocation problem we have that could
> >be solved by p4 or devlink, wherein we want to pre-allocate slices of
> >the TCAM for certain match types. I was planning on writing devlink code
> >for this because it's primarily done at initialization once.
>
> There are 2 resource allocation problems in our hw. One is the general
> division of the resources into feature-chunks. That needs to be done
> during the ASIC initialization phase. For that, I also plan to utilize
> the devlink API.
>
> The second one is runtime allocation of tables, and that would be
> handled by p4 just fine.
>
> >
> >I will note one nice thing about using eBPF however is that you have an
> >easy software emulation path via the eBPF engine in the kernel.
> >
> >... And merging threads here with Jiri's email ...
> >
> >> If you do p4>ebpf in userspace, you have 2 apis:
> >> 1) to setup sw (in-kernel) p4 datapath, you push bpf.o to kernel
> >> 2) to setup hw p4 datapath, you push program.p4ast to kernel
> >>
> >> Those are 2 apis. Both wrapped up by TC, but still 2 apis.
> >>
> >> What I believe is correct is to have one api:
> >> 1) to setup sw (in-kernel) p4 datapath, you push program.p4ast to kernel
> >> 2) to setup hw p4 datapath, you push program.p4ast to kernel

I understand what you mean with two APIs now. You want a single IR block and to divide the SW/HW part in the kernel, rather than let llvm or something else do it.

> >Couple comments around this. First, adding yet another IR in the kernel
> >and another JIT engine to map that IR onto eBPF or hardware vendor X
> >doesn't get me excited. It's really much easier to write these as
> >backend objects in LLVM. Not saying it can't be done, just saying it is
> >easier in LLVM. Also we already have the LLVM code for P4 to LLVM-IR to
> >eBPF. In the end this would be a
[PATCH 1/2] rtl8xxxu: Fix for authentication failure
This fix enables the same sequence of init behaviour as the alternative working driver for the wireless rtl8723bu IC at https://github.com/lwfinger/rtl8723bu

For example, rtl8xxxu_init_device is now called each time userspace wpa_supplicant is executed, instead of just once when modprobe is executed. Along with 'Fix for bogus data used to determine macpower', wpa_supplicant now reliably and successfully authenticates.

For the rtl8xxxu-devel branch of git.kernel.org/pub/scm/linux/kernel/git/jes/linux.git

Signed-off-by: John Heenan
---
 drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
index 04141e5..f25b4df 100644
--- a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
+++ b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
@@ -5779,6 +5779,11 @@ static int rtl8xxxu_start(struct ieee80211_hw *hw)
 	ret = 0;
 
+	ret = rtl8xxxu_init_device(hw);
+	if (ret)
+		goto error_out;
+
+
 	init_usb_anchor(&priv->rx_anchor);
 	init_usb_anchor(&priv->tx_anchor);
 	init_usb_anchor(&priv->int_anchor);
@@ -6080,10 +6085,6 @@ static int rtl8xxxu_probe(struct usb_interface *interface,
 		goto exit;
 	}
 
-	ret = rtl8xxxu_init_device(hw);
-	if (ret)
-		goto exit;
-
 	hw->wiphy->max_scan_ssids = 1;
 	hw->wiphy->max_scan_ie_len = IEEE80211_MAX_DATA_LEN;
 	hw->wiphy->interface_modes = BIT(NL80211_IFTYPE_STATION);
-- 
2.10.1
[PATCH 2/2] rtl8xxxu: Fix for bogus data used to determine macpower
Code tests show that the data returned by rtl8xxxu_read8(priv, REG_CR), used to set macpower, is never 0xea. It is only ever 0x01 (the first time after modprobe) using wpa_supplicant, and 0x00 thereafter using wpa_supplicant. These results occur with 'Fix for authentication failure' [PATCH 1/2] in place.

Whatever was returned, code tests always showed that at least rtl8xxxu_init_queue_reserved_page(priv); is always required. It is not called if macpower is set to true.

Please see the cover letter, [PATCH 0/2], for more information from the tests.

For the rtl8xxxu-devel branch of git.kernel.org/pub/scm/linux/kernel/git/jes/linux.git

Signed-off-by: John Heenan
---
 drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
index f25b4df..aae05f3 100644
--- a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
+++ b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
@@ -3904,6 +3904,7 @@ static int rtl8xxxu_init_device(struct ieee80211_hw *hw)
 		macpower = false;
 	else
 		macpower = true;
+	macpower = false; // Code testing shows macpower must always be set to false to avoid failure
 
 	ret = fops->power_on(priv);
 	if (ret < 0) {
-- 
2.10.1
[PATCH 0/2] rtl8xxxu: Fix allows wpa_supplicant to authenticate
With the current kernel release, wpa_supplicant results in an authentication failure with a Cube i9 tablet (a Surface Pro like device):

Successfully initialized wpa_supplicant
wlp0s20f0u7i2: SME: Trying to authenticate with 10:fe:ed:62:7a:78 (SSID='localre' freq=2417 MHz)
wlp0s20f0u7i2: SME: Trying to authenticate with 10:fe:ed:62:7a:78 (SSID='localre' freq=2417 MHz)
wlp0s20f0u7i2: SME: Trying to authenticate with 10:fe:ed:62:7a:78 (SSID='localre' freq=2417 MHz)
wlp0s20f0u7i2: SME: Trying to authenticate with 10:fe:ed:62:7a:78 (SSID='localre' freq=2417 MHz)
wlp0s20f0u7i2: CTRL-EVENT-SSID-TEMP-DISABLED id=0 ssid="localre" auth_failures=1 duration=10 reason=CONN_FAILED

There is a workaround that ONLY works once per invocation of wpa_supplicant:

rmmod rtl8xxxu
modprobe rtl8xxxu

The following two patches result in reliable behaviour of wpa_supplicant, without a workaround:

Successfully initialized wpa_supplicant
wlp0s20f0u7i2: SME: Trying to authenticate with 10:fe:ed:62:7a:78 (SSID='localre' freq=2417 MHz)
wlp0s20f0u7i2: Trying to associate with 10:fe:ed:62:7a:78 (SSID='localre' freq=2417 MHz)
wlp0s20f0u7i2: Associated with 10:fe:ed:62:7a:78
wlp0s20f0u7i2: CTRL-EVENT-SUBNET-STATUS-UPDATE status=0
wlp0s20f0u7i2: WPA: Key negotiation completed with 10:fe:ed:62:7a:78 [PTK=CCMP GTK=CCMP]
wlp0s20f0u7i2: CTRL-EVENT-CONNECTED - Connection to 10:fe:ed:62:7a:78 completed [id=0 id_str=]

The patches are for kernel tree:
git://git.kernel.org/pub/scm/linux/kernel/git/jes/linux.git
branch: rtl8xxxu-devel

The first patch moves init code so that each time wpa_supplicant is invoked there is similar init behaviour as in the alternative working rtl8xxxu driver https://github.com/lwfinger/rtl8723bu

The second patch enables more complete initialisation to occur.

There are three issues:

1. The value returned by "rtl8xxxu_read8(priv, REG_CR);", used to set macpower, is never 0xef. The value is either 0x01 (first time with wpa_supplicant after modprobe) or 0x00 (re-executing wpa_supplicant)
2.
Trying to use the value 0x00 or 0x01 returned to determine the macpower setting always resulted in failure
3. At the very least 'rtl8xxxu_init_queue_reserved_page(priv);' must always be invoked, even if not all of the extra init sequence arising from setting macpower to false is run.

Patched code with a suitable Makefile will be available from https://github.com/johnheenan/rtl8xxxu for testing by Cube i9 owners

John Heenan (2):
  rtl8xxxu: Fix for authentication failure
  rtl8xxxu: Fix for bogus data used to determine macpower

 drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

-- 
2.10.1
Re: [PATCH for-next 00/14][PULL request] Mellanox mlx5 core driver updates 2016-10-25
On Fri, Oct 28, 2016 at 7:53 PM, David Miller wrote:
>
> I really dislike pull requests of this form.
>
> You add lots of data structures and helper functions but no actual
> users of these facilities to the driver.
>
> Do this instead:
>
> 1) Add TSAR infrastructure
> 2) Add use of TSAR facilities to the driver
>
> That's one pull request.
>
> I don't care if this is hard, or if there are entanglements with
> Infiniband or whatever, you must submit changes in this manner.
>

It is not hard, it is just not right. We have lots of IB and ETH features that we would like to submit in the same kernel cycle; with your suggestion I will have to submit almost every feature (core infrastructure and netdev/RDMA usage) to both you and Doug. The same goes for rdma features: you will receive PULL requests for them as well, and I am sure you and the netdev list don't need such noise. Do not forget that this will slow down mlx5 progress, since netdev will block rdma and vice versa.

> I will not accept additions to a driver that don't even get really
> used.

For patches containing logic/helper functions, such as "Add TSAR infrastructure", I agree, and I can find a way to move some code around to avoid future conflicts and remove them from such pull requests. But you need to at least accept hardware-related structure infrastructure patches for shared code such as include/linux/mlx5/mlx5_ifc.h, where we have only hardware definitions and those patches are really minimal.

So bottom line, I will do my best to ensure future PULL requests contain only include/linux/mlx5/*.h hardware-related definitions or fully implemented features. Can we agree on that?

Thanks,
Saeed.
pull-request: wireless-drivers-next 2016-10-30
Hi Dave,

a few fixes for 4.9. I tagged this on the plane over a slow mosh connection while travelling to Plumbers, so I might have done something wrong; please check more carefully than usual. For example, I had to redo the signed tag because of some whitespace damage. Please let me know if there are any problems.

Kalle

The following changes since commit 67f0160fe34ec5391a428603b9832c9f99d8f3a1:

  MAINTAINERS: Update qlogic networking drivers (2016-10-26 23:29:12 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers.git tags/wireless-drivers-for-davem-2016-10-30

for you to fetch changes up to d3532ea6ce4ea501e421d130555e59edc2945f99:

  brcmfmac: avoid maybe-uninitialized warning in brcmf_cfg80211_start_ap (2016-10-27 18:04:54 +0300)

wireless-drivers fixes for 4.9

iwlwifi

* some fixes for suspend/resume with unified FW images
* a fix for a false-positive lockdep report
* a fix for multi-queue that caused an unnecessary 1 second latency
* a fix for an ACPI parsing bug that caused a misleading error message

brcmfmac

* fix a variable-uninitialised warning in brcmf_cfg80211_start_ap()

Arnd Bergmann (1):
  brcmfmac: avoid maybe-uninitialized warning in brcmf_cfg80211_start_ap

Haim Dreyfuss (1):
  iwlwifi: mvm: comply with fw_restart mod param on suspend

Johannes Berg (1):
  iwlwifi: pcie: mark command queue lock with separate lockdep class

Kalle Valo (1):
  Merge tag 'iwlwifi-for-kalle-2015-10-25' of git://git.kernel.org/.../iwlwifi/iwlwifi-fixes

Luca Coelho (4):
  iwlwifi: mvm: use ssize_t for len in iwl_debugfs_mem_read()
  iwlwifi: mvm: fix d3_test with unified D0/D3 images
  iwlwifi: pcie: fix SPLC structure parsing
  iwlwifi: mvm: fix netdetect starting/stopping for unified images

Sara Sharon (1):
  iwlwifi: mvm: wake the wait queue when the RX sync counter is zero

 .../broadcom/brcm80211/brcmfmac/cfg80211.c        |  2 +-
 drivers/net/wireless/intel/iwlwifi/mvm/d3.c       | 49 +---
 drivers/net/wireless/intel/iwlwifi/mvm/debugfs.c  |  4 +-
 drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c |  3 +-
 drivers/net/wireless/intel/iwlwifi/mvm/mvm.h      |  1 +
 drivers/net/wireless/intel/iwlwifi/mvm/ops.c      |  1 +
 drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c     |  3 +-
 drivers/net/wireless/intel/iwlwifi/mvm/scan.c     | 33 ++--
 drivers/net/wireless/intel/iwlwifi/pcie/drv.c     | 79
 drivers/net/wireless/intel/iwlwifi/pcie/tx.c      |  8 ++
 10 files changed, 129 insertions(+), 54 deletions(-)
[patch net] mlxsw: spectrum: Fix incorrect reuse of MID entries
From: Ido Schimmel

In the device, a MID entry represents a group of local ports, which can later be bound to an MDB entry. The lookup of an existing MID entry is currently done using the provided MC MAC address and VID, from the Linux bridge. However, this can result in an incorrect reuse of the same MID index in different VLAN-unaware bridges (same IP MC group and VID 0). Fix this by performing the lookup based on FID instead of VID, which is unique across different bridges.

Fixes: 3a49b4fde2a1 ("mlxsw: Adding layer 2 multicast support")
Signed-off-by: Ido Schimmel
Acked-by: Elad Raz
Signed-off-by: Jiri Pirko
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h           |  2 +-
 drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c | 14 +++---
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index 9b22863..97bbc1d 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -115,7 +115,7 @@ struct mlxsw_sp_rif {
 struct mlxsw_sp_mid {
 	struct list_head list;
 	unsigned char addr[ETH_ALEN];
-	u16 vid;
+	u16 fid;
 	u16 mid;
 	unsigned int ref_count;
 };

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 5e00c79..1e2c8ec 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -929,12 +929,12 @@ static int mlxsw_sp_port_smid_set(struct mlxsw_sp_port *mlxsw_sp_port, u16 mid,
 static struct mlxsw_sp_mid *__mlxsw_sp_mc_get(struct mlxsw_sp *mlxsw_sp,
 					      const unsigned char *addr,
-					      u16 vid)
+					      u16 fid)
 {
 	struct mlxsw_sp_mid *mid;
 
 	list_for_each_entry(mid, &mlxsw_sp->br_mids.list, list) {
-		if (ether_addr_equal(mid->addr, addr) && mid->vid == vid)
+		if (ether_addr_equal(mid->addr, addr) && mid->fid == fid)
 			return mid;
 	}
 	return NULL;
 }
@@ -942,7 +942,7 @@ static struct mlxsw_sp_mid
*__mlxsw_sp_mc_get(struct mlxsw_sp *mlxsw_sp,
 static struct mlxsw_sp_mid *__mlxsw_sp_mc_alloc(struct mlxsw_sp *mlxsw_sp,
 						const unsigned char *addr,
-						u16 vid)
+						u16 fid)
 {
 	struct mlxsw_sp_mid *mid;
 	u16 mid_idx;
@@ -958,7 +958,7 @@ static struct mlxsw_sp_mid *__mlxsw_sp_mc_alloc(struct mlxsw_sp *mlxsw_sp,
 	set_bit(mid_idx, mlxsw_sp->br_mids.mapped);
 	ether_addr_copy(mid->addr, addr);
-	mid->vid = vid;
+	mid->fid = fid;
 	mid->mid = mid_idx;
 	mid->ref_count = 0;
 	list_add_tail(&mid->list, &mlxsw_sp->br_mids.list);
@@ -991,9 +991,9 @@ static int mlxsw_sp_port_mdb_add(struct mlxsw_sp_port *mlxsw_sp_port,
 	if (switchdev_trans_ph_prepare(trans))
 		return 0;
 
-	mid = __mlxsw_sp_mc_get(mlxsw_sp, mdb->addr, mdb->vid);
+	mid = __mlxsw_sp_mc_get(mlxsw_sp, mdb->addr, fid);
 	if (!mid) {
-		mid = __mlxsw_sp_mc_alloc(mlxsw_sp, mdb->addr, mdb->vid);
+		mid = __mlxsw_sp_mc_alloc(mlxsw_sp, mdb->addr, fid);
 		if (!mid) {
 			netdev_err(dev, "Unable to allocate MC group\n");
 			return -ENOMEM;
@@ -1137,7 +1137,7 @@ static int mlxsw_sp_port_mdb_del(struct mlxsw_sp_port *mlxsw_sp_port,
 	u16 mid_idx;
 	int err = 0;
 
-	mid = __mlxsw_sp_mc_get(mlxsw_sp, mdb->addr, mdb->vid);
+	mid = __mlxsw_sp_mc_get(mlxsw_sp, mdb->addr, fid);
 	if (!mid) {
 		netdev_err(dev, "Unable to remove port from MC DB\n");
 		return -EINVAL;
-- 
2.5.5
[PATCH net] qede: Fix statistics' strings for Tx/Rx queues
When an interface is configured to use Tx/Rx-only queues, the length of the statistics would be shortened to accommodate only the statistics required for each queue, and the values would be provided accordingly. However, the strings provided would still contain both Tx and Rx strings for each one of the queues [regardless of its configuration], which might lead to out-of-bound access when filling the buffers, as well as incorrect statistics being presented.

Fixes: 9a4d7e86acf3 ("qede: Add support for Tx/Rx-only queues.")
Signed-off-by: Yuval Mintz
---
Hi Dave,

Please consider applying this to `net'.

Thanks,
Yuval
---
 drivers/net/ethernet/qlogic/qede/qede_ethtool.c | 25 -
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
index d230742..8c2bbb2 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
@@ -177,16 +177,23 @@ static void qede_get_strings_stats(struct qede_dev *edev, u8 *buf)
 	for (i = 0, k = 0; i < QEDE_QUEUE_CNT(edev); i++) {
 		int tc;
 
-		for (j = 0; j < QEDE_NUM_RQSTATS; j++)
-			sprintf(buf + (k + j) * ETH_GSTRING_LEN,
-				"%d: %s", i, qede_rqstats_arr[j].string);
-		k += QEDE_NUM_RQSTATS;
-		for (tc = 0; tc < edev->num_tc; tc++) {
-			for (j = 0; j < QEDE_NUM_TQSTATS; j++)
+		if (edev->fp_array[i].type & QEDE_FASTPATH_RX) {
+			for (j = 0; j < QEDE_NUM_RQSTATS; j++)
 				sprintf(buf + (k + j) * ETH_GSTRING_LEN,
-					"%d.%d: %s", i, tc,
-					qede_tqstats_arr[j].string);
-			k += QEDE_NUM_TQSTATS;
+					"%d: %s", i,
+					qede_rqstats_arr[j].string);
+			k += QEDE_NUM_RQSTATS;
+		}
+
+		if (edev->fp_array[i].type & QEDE_FASTPATH_TX) {
+			for (tc = 0; tc < edev->num_tc; tc++) {
+				for (j = 0; j < QEDE_NUM_TQSTATS; j++)
+					sprintf(buf + (k + j) *
+						ETH_GSTRING_LEN,
+						"%d.%d: %s", i, tc,
+						qede_tqstats_arr[j].string);
+				k += QEDE_NUM_TQSTATS;
+			}
 		}
 	}
-- 
1.9.3
Re: Let's do P4
Sat, Oct 29, 2016 at 06:46:21PM CEST, john.fastab...@gmail.com wrote:
>On 16-10-29 07:49 AM, Jakub Kicinski wrote:
>> On Sat, 29 Oct 2016 09:53:28 +0200, Jiri Pirko wrote:
>>> Hi all.
>>>
>>> The network world is divided into 2 general types of hw:
>>> 1) network ASICs - network-specific silicon, containing things like TCAM
>>>    These ASICs are suitable to be programmed by P4.
>>> 2) network processors - basically general purpose CPUs
>>>    These processors are suitable to be programmed by eBPF.
>>>
>>> I believe that by now, most people have come to the conclusion that it
>>> is very difficult to handle both types by either P4 or eBPF. And since
>>> eBPF is part of the kernel, I would like to introduce P4 into the
>>> kernel as well. Here's a plan:
>>>
>>> 1) Define a P4 intermediate representation
>>>    I cannot imagine loading a P4 program (c-like syntax text file) into
>>>    the kernel as is. That means that as the first step, we need to find
>>>    some intermediate representation. I can imagine something in the
>>>    form of an AST, call it "p4ast". I don't really know how to do this
>>>    exactly though, it's just an idea.
>>>
>>>    In the end there would be a userspace precompiler for this:
>>>    $ makep4ast example.p4 example.ast
>>
>> Maybe stating the obvious, but IMHO defining the IR is the hardest part.
>> eBPF *is* the IR, we can compile C, P4 or even JIT Lua to eBPF. The
>> AST/IR for switch pipelines should allow for similar flexibility.
>> Looser coupling would also protect us from changes in the spec of the
>> high-level language.
>>
>
>Jumping in the middle here. You managed to get an entire thread going
>before I even woke up :)
>
>The problem with eBPF as an IR is that in the universe of eBPF IR
>programs, the subset that can be offloaded onto standard ASIC-based
>hardware (non NPU/FPGA/etc) is so small as to be almost meaningless IMO.
>
>I tried this for a while and the result is users have to write very
>targeted eBPF that they "know" will be pattern matched and pushed into
>an ASIC.
It can work but it's very fragile. When I did this I ended up
>with an eBPF generator for deviceX and an eBPF generator for deviceY,
>each with a very specific pattern matching engine in the driver to
>xlate ebpf-deviceX into its asic. Existing ASICs for example usually
>support only one pipeline, only one parser (or require moving mountains
>to change the parser via ucode), only one set of tables, and only one
>deparser/serializer at the end to build the new packet. Next-gen pieces
>may have some flexibility on the parser side.
>
>There is an interesting resource allocation problem we have that could
>be solved by p4 or devlink, wherein we want to pre-allocate slices of
>the TCAM for certain match types. I was planning on writing devlink code
>for this because it's primarily done at initialization once.

There are 2 resource allocation problems in our hw. One is the general
division of the resources into feature-chunks. That needs to be done
during the ASIC initialization phase. For that, I also plan to utilize
the devlink API.

The second one is runtime allocation of tables, and that would be
handled by p4 just fine.

>
>I will note one nice thing about using eBPF however is that you have an
>easy software emulation path via the eBPF engine in the kernel.
>
>... And merging threads here with Jiri's email ...
>
>> If you do p4>ebpf in userspace, you have 2 apis:
>> 1) to setup sw (in-kernel) p4 datapath, you push bpf.o to kernel
>> 2) to setup hw p4 datapath, you push program.p4ast to kernel
>>
>> Those are 2 apis. Both wrapped up by TC, but still 2 apis.
>>
>> What I believe is correct is to have one api:
>> 1) to setup sw (in-kernel) p4 datapath, you push program.p4ast to kernel
>> 2) to setup hw p4 datapath, you push program.p4ast to kernel
>>
>
>Couple comments around this. First, adding yet another IR in the kernel
>and another JIT engine to map that IR onto eBPF or hardware vendor X
>doesn't get me excited. It's really much easier to write these as
>backend objects in LLVM.
Not saying it can't be done, just saying it is easier
>in LLVM. Also we already have the LLVM code for P4 to LLVM-IR to eBPF.
>In the end this would be a reasonably complex bit of code in
>the kernel only for hardware offload. I have doubts that folks would
>ever use it for software-only cases. I'm happy to admit I'm wrong here
>though.

Well, for hw offload, every driver has to parse the IR (whatever it will be in) and program the HW accordingly. Similar parsing and translation would be needed for the SW path, to translate into eBPF. I don't think it would be more complex than in the drivers. Should be fine.

>
>So yes, using llvm backends creates two paths, a hardware mgmt and sw
>path, but in the hardware + software case typical on the edge, the
>orchestration and management planes have started to manage the hardware
>and software as two blocks of logic for performance SLA logic. Even on
>the edge it seems in most cases folks are selling SR-IOV ports and
>can't fall back