[PATCH 1/1] xen-netfront: do not cast grant table reference to signed short
While the grant reference is of type uint32_t, xen-netfront erroneously casts it to signed short in BUG_ON(). This can lead to a Xen domU panic during boot-up or migration when the guest is attached to a large number of paravirtual devices.

Signed-off-by: Dongli Zhang
---
 drivers/net/xen-netfront.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index e17879d..189a28d 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -304,7 +304,7 @@ static void xennet_alloc_rx_buffers(struct netfront_queue *queue)
 		queue->rx_skbs[id] = skb;
 
 		ref = gnttab_claim_grant_reference(&queue->gref_rx_head);
-		BUG_ON((signed short)ref < 0);
+		WARN_ON_ONCE(IS_ERR_VALUE((unsigned long)ref));
 		queue->grant_rx_ref[id] = ref;
 
 		page = skb_frag_page(&skb_shinfo(skb)->frags[0]);
@@ -428,7 +428,7 @@ static void xennet_tx_setup_grant(unsigned long gfn, unsigned int offset,
 	id = get_id_from_freelist(&queue->tx_skb_freelist, queue->tx_skbs);
 	tx = RING_GET_REQUEST(&queue->tx, queue->tx.req_prod_pvt++);
 	ref = gnttab_claim_grant_reference(&queue->gref_tx_head);
-	BUG_ON((signed short)ref < 0);
+	WARN_ON_ONCE(IS_ERR_VALUE((unsigned long)ref));
 	gnttab_grant_foreign_access_ref(ref, queue->info->xbdev->otherend_id,
 					gfn, GNTMAP_readonly);
-- 
2.7.4
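For readers unfamiliar with the failure mode: grant references are small integers early in a domain's life, so the truncating cast only misfires once enough paravirtual devices push reference values past 0x7fff. A minimal userspace sketch of the two checks (grant_ref_t and the IS_ERR_VALUE() stand-in are simplified assumptions here, not the kernel definitions):

```c
#include <stdint.h>

typedef uint32_t grant_ref_t;

/* The removed check: truncating a 32-bit reference to signed short
 * makes any value in [0x8000, 0xffff] look negative, so BUG_ON()
 * fired on perfectly valid references once enough were claimed. */
static int buggy_ref_check_fires(grant_ref_t ref)
{
	return (signed short)ref < 0;
}

/* Userspace stand-in for the kernel's IS_ERR_VALUE(): only the
 * topmost 4095 values of unsigned long encode errors, so valid
 * large references no longer trip the check. */
static int fixed_ref_check_fires(grant_ref_t ref)
{
	return (unsigned long)ref >= (unsigned long)-4095;
}
```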
[PATCH net-next v2 3/7] qed*: Add support for WoL
Signed-off-by: Yuval Mintz--- drivers/net/ethernet/qlogic/qed/qed.h | 11 - drivers/net/ethernet/qlogic/qed/qed_dev.c | 19 - drivers/net/ethernet/qlogic/qed/qed_hsi.h | 4 ++ drivers/net/ethernet/qlogic/qed/qed_main.c | 29 + drivers/net/ethernet/qlogic/qed/qed_mcp.c | 56 - drivers/net/ethernet/qlogic/qede/qede.h | 2 + drivers/net/ethernet/qlogic/qede/qede_ethtool.c | 41 ++ drivers/net/ethernet/qlogic/qede/qede_main.c| 9 include/linux/qed/qed_if.h | 10 + 9 files changed, 176 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h index f20243c..8828ffa 100644 --- a/drivers/net/ethernet/qlogic/qed/qed.h +++ b/drivers/net/ethernet/qlogic/qed/qed.h @@ -195,6 +195,11 @@ enum qed_dev_cap { QED_DEV_CAP_ROCE, }; +enum qed_wol_support { + QED_WOL_SUPPORT_NONE, + QED_WOL_SUPPORT_PME, +}; + struct qed_hw_info { /* PCI personality */ enum qed_pci_personalitypersonality; @@ -227,6 +232,8 @@ struct qed_hw_info { u32 hw_mode; unsigned long device_capabilities; u16 mtu; + + enum qed_wol_support b_wol_support; }; struct qed_hw_cid_data { @@ -539,7 +546,9 @@ struct qed_dev { u8 mcp_rev; u8 boot_mode; - u8 wol; + /* WoL related configurations */ + u8 wol_config; + u8 wol_mac[ETH_ALEN]; u32 int_mode; enum qed_coalescing_modeint_coalescing_mode; diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c index 33fd69e..127ed5f 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_dev.c +++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c @@ -1364,8 +1364,24 @@ int qed_hw_reset(struct qed_dev *cdev) { int rc = 0; u32 unload_resp, unload_param; + u32 wol_param; int i; + switch (cdev->wol_config) { + case QED_OV_WOL_DISABLED: + wol_param = DRV_MB_PARAM_UNLOAD_WOL_DISABLED; + break; + case QED_OV_WOL_ENABLED: + wol_param = DRV_MB_PARAM_UNLOAD_WOL_ENABLED; + break; + default: + DP_NOTICE(cdev, + "Unknown WoL configuration %02x\n", cdev->wol_config); + /* Fallthrough */ + case QED_OV_WOL_DEFAULT: + 
wol_param = DRV_MB_PARAM_UNLOAD_WOL_MCP; + } + for_each_hwfn(cdev, i) { struct qed_hwfn *p_hwfn = >hwfns[i]; @@ -1394,8 +1410,7 @@ int qed_hw_reset(struct qed_dev *cdev) /* Send unload command to MCP */ rc = qed_mcp_cmd(p_hwfn, p_hwfn->p_main_ptt, -DRV_MSG_CODE_UNLOAD_REQ, -DRV_MB_PARAM_UNLOAD_WOL_MCP, +DRV_MSG_CODE_UNLOAD_REQ, wol_param, _resp, _param); if (rc) { DP_NOTICE(p_hwfn, "qed_hw_reset: UNLOAD_REQ failed\n"); diff --git a/drivers/net/ethernet/qlogic/qed/qed_hsi.h b/drivers/net/ethernet/qlogic/qed/qed_hsi.h index f7dfa2e..fdb7a09 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_hsi.h +++ b/drivers/net/ethernet/qlogic/qed/qed_hsi.h @@ -8601,6 +8601,7 @@ struct public_drv_mb { #define DRV_MSG_CODE_BIST_TEST 0x001e #define DRV_MSG_CODE_SET_LED_MODE 0x0020 +#define DRV_MSG_CODE_OS_WOL0x002e #define DRV_MSG_SEQ_NUMBER_MASK0x @@ -8697,6 +8698,9 @@ struct public_drv_mb { #define FW_MSG_CODE_NVM_OK 0x0001 #define FW_MSG_CODE_OK 0x0016 +#define FW_MSG_CODE_OS_WOL_SUPPORTED0x0080 +#define FW_MSG_CODE_OS_WOL_NOT_SUPPORTED0x0081 + #define FW_MSG_SEQ_NUMBER_MASK 0x u32 fw_mb_param; diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c index 31f8e42..b71d73a 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_main.c +++ b/drivers/net/ethernet/qlogic/qed/qed_main.c @@ -221,6 +221,10 @@ int qed_fill_dev_info(struct qed_dev *cdev, dev_info->fw_eng = FW_ENGINEERING_VERSION; dev_info->mf_mode = cdev->mf_mode; dev_info->tx_switching = true; + + if (QED_LEADING_HWFN(cdev)->hw_info.b_wol_support == + QED_WOL_SUPPORT_PME) + dev_info->wol_support = true; } else { qed_vf_get_fw_version(>hwfns[0], _info->fw_major,
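The unload-time WoL selection added above is a three-way switch with an intentional fallthrough from the unknown case into the "let the management firmware decide" default. A standalone sketch, with illustrative stand-ins for the DRV_MB_PARAM_UNLOAD_WOL_* constants (the real HSI values may differ):

```c
/* Illustrative stand-ins for the DRV_MB_PARAM_UNLOAD_WOL_* values */
#define UNLOAD_WOL_MCP      0x1
#define UNLOAD_WOL_DISABLED 0x2
#define UNLOAD_WOL_ENABLED  0x3

enum wol_cfg { WOL_DEFAULT, WOL_DISABLED, WOL_ENABLED };

/* Mirror of the patch's selection logic: explicit on/off map directly;
 * anything unrecognized is reported in the driver and then treated the
 * same as WOL_DEFAULT, i.e. defer the decision to the MCP. */
static unsigned int wol_unload_param(enum wol_cfg cfg)
{
	switch (cfg) {
	case WOL_DISABLED:
		return UNLOAD_WOL_DISABLED;
	case WOL_ENABLED:
		return UNLOAD_WOL_ENABLED;
	default:		/* unknown value: warn, then... */
	case WOL_DEFAULT:	/* ...fall through to the MCP default */
		return UNLOAD_WOL_MCP;
	}
}
```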
[PATCH net-next v2 6/7] qed: Use VF-queue feature
Driver sets several restrictions about the number of supported VFs according to available HW/FW resources. This creates a problem as there are constellations which can't be supported [as limitation don't accurately describe the resources], as well as holes where enabling IOV would fail due to supposed lack of resources. This introduces a new interal feature - vf-queues, which would be used to lift some of the restriction and accurately enumerate the queues that can be used by a given PF's VFs. Signed-off-by: Yuval Mintz--- drivers/net/ethernet/qlogic/qed/qed.h | 1 + drivers/net/ethernet/qlogic/qed/qed_dev.c | 20 ++ drivers/net/ethernet/qlogic/qed/qed_int.c | 32 - drivers/net/ethernet/qlogic/qed/qed_sriov.c | 17 ++- 4 files changed, 54 insertions(+), 16 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h index 8828ffa..6d3013f 100644 --- a/drivers/net/ethernet/qlogic/qed/qed.h +++ b/drivers/net/ethernet/qlogic/qed/qed.h @@ -174,6 +174,7 @@ enum QED_FEATURE { QED_PF_L2_QUE, QED_VF, QED_RDMA_CNQ, + QED_VF_L2_QUE, QED_MAX_FEATURES, }; diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c index 127ed5f..d996afe 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_dev.c +++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c @@ -1476,6 +1476,7 @@ static void get_function_id(struct qed_hwfn *p_hwfn) static void qed_hw_set_feat(struct qed_hwfn *p_hwfn) { u32 *feat_num = p_hwfn->hw_info.feat_num; + struct qed_sb_cnt_info sb_cnt_info; int num_features = 1; if (IS_ENABLED(CONFIG_QED_RDMA) && @@ -1494,10 +1495,21 @@ static void qed_hw_set_feat(struct qed_hwfn *p_hwfn) feat_num[QED_PF_L2_QUE] = min_t(u32, RESC_NUM(p_hwfn, QED_SB) / num_features, RESC_NUM(p_hwfn, QED_L2_QUEUE)); - DP_VERBOSE(p_hwfn, NETIF_MSG_PROBE, - "#PF_L2_QUEUES=%d #SBS=%d num_features=%d\n", - feat_num[QED_PF_L2_QUE], RESC_NUM(p_hwfn, QED_SB), - num_features); + + memset(_cnt_info, 0, sizeof(sb_cnt_info)); + 
qed_int_get_num_sbs(p_hwfn, _cnt_info); + feat_num[QED_VF_L2_QUE] = + min_t(u32, + RESC_NUM(p_hwfn, QED_L2_QUEUE) - + FEAT_NUM(p_hwfn, QED_PF_L2_QUE), sb_cnt_info.sb_iov_cnt); + + DP_VERBOSE(p_hwfn, + NETIF_MSG_PROBE, + "#PF_L2_QUEUES=%d VF_L2_QUEUES=%d #ROCE_CNQ=%d #SBS=%d num_features=%d\n", + (int)FEAT_NUM(p_hwfn, QED_PF_L2_QUE), + (int)FEAT_NUM(p_hwfn, QED_VF_L2_QUE), + (int)FEAT_NUM(p_hwfn, QED_RDMA_CNQ), + RESC_NUM(p_hwfn, QED_SB), num_features); } static int qed_hw_get_resc(struct qed_hwfn *p_hwfn) diff --git a/drivers/net/ethernet/qlogic/qed/qed_int.c b/drivers/net/ethernet/qlogic/qed/qed_int.c index 2adedc6..bb74e1c 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_int.c +++ b/drivers/net/ethernet/qlogic/qed/qed_int.c @@ -3030,6 +3030,31 @@ int qed_int_igu_read_cam(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt) } } } + + /* There's a possibility the igu_sb_cnt_iov doesn't properly reflect +* the number of VF SBs [especially for first VF on engine, as we can't +* diffrentiate between empty entries and its entries]. +* Since we don't really support more SBs than VFs today, prevent any +* such configuration by sanitizing the number of SBs to equal the +* number of VFs. +*/ + if (IS_PF_SRIOV(p_hwfn)) { + u16 total_vfs = p_hwfn->cdev->p_iov_info->total_vfs; + + if (total_vfs < p_igu_info->free_blks) { + DP_VERBOSE(p_hwfn, + (NETIF_MSG_INTR | QED_MSG_IOV), + "Limiting number of SBs for IOV - %04x --> %04x\n", + p_igu_info->free_blks, + p_hwfn->cdev->p_iov_info->total_vfs); + p_igu_info->free_blks = total_vfs; + } else if (total_vfs > p_igu_info->free_blks) { + DP_NOTICE(p_hwfn, + "IGU has only %04x SBs for VFs while the device has %04x VFs\n", + p_igu_info->free_blks, total_vfs); + return -EINVAL; + } + } p_igu_info->igu_sb_cnt_iov = p_igu_info->free_blks; DP_VERBOSE( @@ -3163,7 +3188,12 @@ u16 qed_int_queue_id_from_sb_id(struct qed_hwfn *p_hwfn, u16 sb_id) return sb_id - p_info->igu_base_sb; } else if ((sb_id >= p_info->igu_base_sb_iov) && (sb_id <
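The SB-sanitizing rule the patch adds to qed_int_igu_read_cam() can be isolated as follows. This is a hedged userspace sketch (the function name and the use of a plain -22 for -EINVAL are stand-ins), not the driver code itself: clamp the free status blocks to the VF count when there are spares, and fail when there are not enough SBs to give each VF one.

```c
/* Sanitize the number of IOV status blocks: one SB per VF is assumed
 * sufficient, and fewer SBs than VFs is an unsupportable config. */
static int sanitize_iov_sbs(unsigned int total_vfs, unsigned int *free_blks)
{
	if (total_vfs < *free_blks)
		*free_blks = total_vfs;	/* clamp the spare SBs */
	else if (total_vfs > *free_blks)
		return -22;		/* stand-in for -EINVAL */
	return 0;
}
```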
[PATCH net-next v2 7/7] qed: Learn resources from management firmware
From: Tomer TayarCurrently, each interfaces assumes it receives an equal portion of HW/FW resources, but this is wasteful - different partitions [and specifically, parititions exposing different protocol support] might require different resources. Implement a new resource learning scheme where the information is received directly from the management firmware [which has knowledge of all of the functions and can serve as arbiter]. Signed-off-by: Tomer Tayar Signed-off-by: Yuval Mintz --- drivers/net/ethernet/qlogic/qed/qed.h | 6 +- drivers/net/ethernet/qlogic/qed/qed_dev.c | 291 -- drivers/net/ethernet/qlogic/qed/qed_hsi.h | 46 + drivers/net/ethernet/qlogic/qed/qed_l2.c | 2 +- drivers/net/ethernet/qlogic/qed/qed_mcp.c | 42 + drivers/net/ethernet/qlogic/qed/qed_mcp.h | 15 ++ include/linux/qed/qed_eth_if.h| 2 +- 7 files changed, 341 insertions(+), 63 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h index 6d3013f..50b8a01 100644 --- a/drivers/net/ethernet/qlogic/qed/qed.h +++ b/drivers/net/ethernet/qlogic/qed/qed.h @@ -154,7 +154,10 @@ struct qed_qm_iids { u32 tids; }; -enum QED_RESOURCES { +/* HW / FW resources, output of features supported below, most information + * is received from MFW. 
+ */ +enum qed_resources { QED_SB, QED_L2_QUEUE, QED_VPORT, @@ -166,6 +169,7 @@ enum QED_RESOURCES { QED_RDMA_CNQ_RAM, QED_ILT, QED_LL2_QUEUE, + QED_CMDQS_CQS, QED_RDMA_STATS_QUEUE, QED_MAX_RESC, }; diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c index d996afe..5be7b8a 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_dev.c +++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c @@ -1512,47 +1512,240 @@ static void qed_hw_set_feat(struct qed_hwfn *p_hwfn) RESC_NUM(p_hwfn, QED_SB), num_features); } -static int qed_hw_get_resc(struct qed_hwfn *p_hwfn) +static enum resource_id_enum qed_hw_get_mfw_res_id(enum qed_resources res_id) +{ + enum resource_id_enum mfw_res_id = RESOURCE_NUM_INVALID; + + switch (res_id) { + case QED_SB: + mfw_res_id = RESOURCE_NUM_SB_E; + break; + case QED_L2_QUEUE: + mfw_res_id = RESOURCE_NUM_L2_QUEUE_E; + break; + case QED_VPORT: + mfw_res_id = RESOURCE_NUM_VPORT_E; + break; + case QED_RSS_ENG: + mfw_res_id = RESOURCE_NUM_RSS_ENGINES_E; + break; + case QED_PQ: + mfw_res_id = RESOURCE_NUM_PQ_E; + break; + case QED_RL: + mfw_res_id = RESOURCE_NUM_RL_E; + break; + case QED_MAC: + case QED_VLAN: + /* Each VFC resource can accommodate both a MAC and a VLAN */ + mfw_res_id = RESOURCE_VFC_FILTER_E; + break; + case QED_ILT: + mfw_res_id = RESOURCE_ILT_E; + break; + case QED_LL2_QUEUE: + mfw_res_id = RESOURCE_LL2_QUEUE_E; + break; + case QED_RDMA_CNQ_RAM: + case QED_CMDQS_CQS: + /* CNQ/CMDQS are the same resource */ + mfw_res_id = RESOURCE_CQS_E; + break; + case QED_RDMA_STATS_QUEUE: + mfw_res_id = RESOURCE_RDMA_STATS_QUEUE_E; + break; + default: + break; + } + + return mfw_res_id; +} + +static u32 qed_hw_get_dflt_resc_num(struct qed_hwfn *p_hwfn, + enum qed_resources res_id) { - u8 enabled_func_idx = p_hwfn->enabled_func_idx; - u32 *resc_start = p_hwfn->hw_info.resc_start; u8 num_funcs = p_hwfn->num_funcs_on_engine; - u32 *resc_num = p_hwfn->hw_info.resc_num; struct qed_sb_cnt_info sb_cnt_info; - int i, 
max_vf_vlan_filters; + u32 dflt_resc_num = 0; - memset(_cnt_info, 0, sizeof(sb_cnt_info)); + switch (res_id) { + case QED_SB: + memset(_cnt_info, 0, sizeof(sb_cnt_info)); + qed_int_get_num_sbs(p_hwfn, _cnt_info); + dflt_resc_num = sb_cnt_info.sb_cnt; + break; + case QED_L2_QUEUE: + dflt_resc_num = MAX_NUM_L2_QUEUES_BB / num_funcs; + break; + case QED_VPORT: + dflt_resc_num = MAX_NUM_VPORTS_BB / num_funcs; + break; + case QED_RSS_ENG: + dflt_resc_num = ETH_RSS_ENGINE_NUM_BB / num_funcs; + break; + case QED_PQ: + /* The granularity of the PQs is 8 */ + dflt_resc_num = MAX_QM_TX_QUEUES_BB / num_funcs; + dflt_resc_num &= ~0x7; + break; + case QED_RL: + dflt_resc_num =
[PATCH net-next v2 4/7] qede: Decouple ethtool caps from qed
While the qed_lm_map array is closely tied to the QED_LM_* defines, when iterating over the array use its actual size instead of the qed define, to prevent possible future mismatches.

Signed-off-by: Yuval Mintz
---
 drivers/net/ethernet/qlogic/qede/qede_ethtool.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
index 327c614..fe7e7b8 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
@@ -320,7 +320,7 @@ struct qede_link_mode_mapping {
 {								\
 	int i;							\
 								\
-	for (i = 0; i < QED_LM_COUNT; i++) {			\
+	for (i = 0; i < ARRAY_SIZE(qed_lm_map); i++) {		\
 		if ((caps) & (qed_lm_map[i].qed_link_mode))	\
 			__set_bit(qed_lm_map[i].ethtool_link_mode,\
 				  lk_ksettings->link_modes.name); \
@@ -331,7 +331,7 @@ struct qede_link_mode_mapping {
 {								\
 	int i;							\
 								\
-	for (i = 0; i < QED_LM_COUNT; i++) {			\
+	for (i = 0; i < ARRAY_SIZE(qed_lm_map); i++) {		\
 		if (test_bit(qed_lm_map[i].ethtool_link_mode,	\
 			     lk_ksettings->link_modes.name))	\
 			caps |= qed_lm_map[i].qed_link_mode;	\
-- 
1.9.3
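The point of the change is that ARRAY_SIZE() ties the loop bound to the array definition itself, so adding or removing map entries can never desynchronize the iteration count from the table it walks. A small self-contained illustration (the table contents below are made up, not the real qede map):

```c
#include <stddef.h>

#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

struct link_mode_mapping {
	unsigned int qed_link_mode;	/* driver-internal capability bit */
	unsigned int ethtool_link_mode;	/* matching ethtool bit number */
};

/* Illustrative table; the real entries live in qede_ethtool.c */
static const struct link_mode_mapping lm_map[] = {
	{ 1u << 0, 0 },
	{ 1u << 1, 1 },
	{ 1u << 2, 2 },
};

/* Count how many capability bits in 'caps' have an ethtool mapping.
 * Iterating with ARRAY_SIZE(lm_map) stays correct even if entries
 * are later added to or dropped from lm_map. */
static size_t mapped_modes(unsigned int caps)
{
	size_t i, n = 0;

	for (i = 0; i < ARRAY_SIZE(lm_map); i++)
		if (caps & lm_map[i].qed_link_mode)
			n++;
	return n;
}
```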
[PATCH net-next v2 1/7] qed*: Management firmware - notifications and defaults
From: Sudarsana KalluruManagement firmware is interested in various tidbits about the driver - including the driver state & several configuration related fields [MTU, primtary MAC, etc.]. This adds the necessray logic to update MFW with such configurations, some of which are passed directly via qed while for others APIs are provide so that qede would be able to later configure if needed. This also introduces a new default configuration for MTU which would replace the default inherited by being an ethernet device. Signed-off-by: Sudarsana Kalluru Signed-off-by: Yuval Mintz --- drivers/net/ethernet/qlogic/qed/qed.h | 1 + drivers/net/ethernet/qlogic/qed/qed_dev.c | 52 +++- drivers/net/ethernet/qlogic/qed/qed_hsi.h | 59 - drivers/net/ethernet/qlogic/qed/qed_main.c | 75 +++ drivers/net/ethernet/qlogic/qed/qed_mcp.c | 163 drivers/net/ethernet/qlogic/qed/qed_mcp.h | 102 +++ drivers/net/ethernet/qlogic/qede/qede_ethtool.c | 2 + drivers/net/ethernet/qlogic/qede/qede_main.c| 8 ++ include/linux/qed/qed_if.h | 28 9 files changed, 487 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h index 653bb57..f20243c 100644 --- a/drivers/net/ethernet/qlogic/qed/qed.h +++ b/drivers/net/ethernet/qlogic/qed/qed.h @@ -226,6 +226,7 @@ struct qed_hw_info { u32 port_mode; u32 hw_mode; unsigned long device_capabilities; + u16 mtu; }; struct qed_hw_cid_data { diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c index edae5fc..33fd69e 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_dev.c +++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c @@ -1057,8 +1057,10 @@ int qed_hw_init(struct qed_dev *cdev, bool allow_npar_tx_switch, const u8 *bin_fw_data) { - u32 load_code, param; - int rc, mfw_rc, i; + u32 load_code, param, drv_mb_param; + bool b_default_mtu = true; + struct qed_hwfn *p_hwfn; + int rc = 0, mfw_rc, i; if ((int_mode == QED_INT_MODE_MSI) && (cdev->num_hwfns > 1)) { 
DP_NOTICE(cdev, "MSI mode is not supported for CMT devices\n"); @@ -1074,6 +1076,12 @@ int qed_hw_init(struct qed_dev *cdev, for_each_hwfn(cdev, i) { struct qed_hwfn *p_hwfn = >hwfns[i]; + /* If management didn't provide a default, set one of our own */ + if (!p_hwfn->hw_info.mtu) { + p_hwfn->hw_info.mtu = 1500; + b_default_mtu = false; + } + if (IS_VF(cdev)) { p_hwfn->b_int_enabled = 1; continue; @@ -1157,6 +1165,38 @@ int qed_hw_init(struct qed_dev *cdev, p_hwfn->hw_init_done = true; } + if (IS_PF(cdev)) { + p_hwfn = QED_LEADING_HWFN(cdev); + drv_mb_param = (FW_MAJOR_VERSION << 24) | + (FW_MINOR_VERSION << 16) | + (FW_REVISION_VERSION << 8) | + (FW_ENGINEERING_VERSION); + rc = qed_mcp_cmd(p_hwfn, p_hwfn->p_main_ptt, +DRV_MSG_CODE_OV_UPDATE_STORM_FW_VER, +drv_mb_param, _code, ); + if (rc) + DP_INFO(p_hwfn, "Failed to update firmware version\n"); + + if (!b_default_mtu) { + rc = qed_mcp_ov_update_mtu(p_hwfn, p_hwfn->p_main_ptt, + p_hwfn->hw_info.mtu); + if (rc) + DP_INFO(p_hwfn, + "Failed to update default mtu\n"); + } + + rc = qed_mcp_ov_update_driver_state(p_hwfn, + p_hwfn->p_main_ptt, + QED_OV_DRIVER_STATE_DISABLED); + if (rc) + DP_INFO(p_hwfn, "Failed to update driver state\n"); + + rc = qed_mcp_ov_update_eswitch(p_hwfn, p_hwfn->p_main_ptt, + QED_OV_ESWITCH_VEB); + if (rc) + DP_INFO(p_hwfn, "Failed to update eswitch mode\n"); + } + return 0; } @@ -1801,6 +1841,9 @@ static void qed_get_num_funcs(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt) qed_get_num_funcs(p_hwfn, p_ptt); + if (qed_mcp_is_init(p_hwfn)) + p_hwfn->hw_info.mtu = p_hwfn->mcp_info->func_info.mtu; + return
[PATCH net-next v2 5/7] qed: Learn of RDMA capabilities per-device
Today, RDMA capabilities are learned from management firmware which provides a per-device indication for all interfaces. Newer management firmware is capable of providing a per-device indication [would later be extended to either RoCE/iWARP]. Try using this newer learning mechanism, but fallback in case management firmware is too old to retain current functionality. Signed-off-by: Yuval Mintz--- drivers/net/ethernet/qlogic/qed/qed_hsi.h | 7 +++ drivers/net/ethernet/qlogic/qed/qed_mcp.c | 78 +++ 2 files changed, 77 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed_hsi.h b/drivers/net/ethernet/qlogic/qed/qed_hsi.h index fdb7a09..1d113ce 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_hsi.h +++ b/drivers/net/ethernet/qlogic/qed/qed_hsi.h @@ -8601,6 +8601,7 @@ struct public_drv_mb { #define DRV_MSG_CODE_BIST_TEST 0x001e #define DRV_MSG_CODE_SET_LED_MODE 0x0020 +#define DRV_MSG_CODE_GET_PF_RDMA_PROTOCOL 0x002b #define DRV_MSG_CODE_OS_WOL0x002e #define DRV_MSG_SEQ_NUMBER_MASK0x @@ -8705,6 +8706,12 @@ struct public_drv_mb { u32 fw_mb_param; + /* get pf rdma protocol command responce */ +#define FW_MB_PARAM_GET_PF_RDMA_NONE 0x0 +#define FW_MB_PARAM_GET_PF_RDMA_ROCE 0x1 +#define FW_MB_PARAM_GET_PF_RDMA_IWARP 0x2 +#define FW_MB_PARAM_GET_PF_RDMA_BOTH 0x3 + u32 drv_pulse_mb; #define DRV_PULSE_SEQ_MASK 0x7fff #define DRV_PULSE_SYSTEM_TIME_MASK 0x diff --git a/drivers/net/ethernet/qlogic/qed/qed_mcp.c b/drivers/net/ethernet/qlogic/qed/qed_mcp.c index 768b35b..0927488 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_mcp.c +++ b/drivers/net/ethernet/qlogic/qed/qed_mcp.c @@ -1024,28 +1024,89 @@ int qed_mcp_get_media_type(struct qed_dev *cdev, u32 *p_media_type) return 0; } +/* Old MFW has a global configuration for all PFs regarding RDMA support */ +static void +qed_mcp_get_shmem_proto_legacy(struct qed_hwfn *p_hwfn, + enum qed_pci_personality *p_proto) +{ + /* There wasn't ever a legacy MFW that published iwarp. 
+* So at this point, this is either plain l2 or RoCE. +*/ + if (test_bit(QED_DEV_CAP_ROCE, _hwfn->hw_info.device_capabilities)) + *p_proto = QED_PCI_ETH_ROCE; + else + *p_proto = QED_PCI_ETH; + + DP_VERBOSE(p_hwfn, NETIF_MSG_IFUP, + "According to Legacy capabilities, L2 personality is %08x\n", + (u32) *p_proto); +} + +static int +qed_mcp_get_shmem_proto_mfw(struct qed_hwfn *p_hwfn, + struct qed_ptt *p_ptt, + enum qed_pci_personality *p_proto) +{ + u32 resp = 0, param = 0; + int rc; + + rc = qed_mcp_cmd(p_hwfn, p_ptt, +DRV_MSG_CODE_GET_PF_RDMA_PROTOCOL, 0, , ); + if (rc) + return rc; + if (resp != FW_MSG_CODE_OK) { + DP_VERBOSE(p_hwfn, NETIF_MSG_IFUP, + "MFW lacks support for command; Returns %08x\n", + resp); + return -EINVAL; + } + + switch (param) { + case FW_MB_PARAM_GET_PF_RDMA_NONE: + *p_proto = QED_PCI_ETH; + break; + case FW_MB_PARAM_GET_PF_RDMA_ROCE: + *p_proto = QED_PCI_ETH_ROCE; + break; + case FW_MB_PARAM_GET_PF_RDMA_BOTH: + DP_NOTICE(p_hwfn, + "Current day drivers don't support RoCE & iWARP. Default to RoCE-only\n"); + *p_proto = QED_PCI_ETH_ROCE; + break; + case FW_MB_PARAM_GET_PF_RDMA_IWARP: + default: + DP_NOTICE(p_hwfn, + "MFW answers GET_PF_RDMA_PROTOCOL but param is %08x\n", + param); + return -EINVAL; + } + + DP_VERBOSE(p_hwfn, + NETIF_MSG_IFUP, + "According to capabilities, L2 personality is %08x [resp %08x param %08x]\n", + (u32) *p_proto, resp, param); + return 0; +} + static int qed_mcp_get_shmem_proto(struct qed_hwfn *p_hwfn, struct public_func *p_info, + struct qed_ptt *p_ptt, enum qed_pci_personality *p_proto) { int rc = 0; switch (p_info->config & FUNC_MF_CFG_PROTOCOL_MASK) { case FUNC_MF_CFG_PROTOCOL_ETHERNET: - if (test_bit(QED_DEV_CAP_ROCE, -_hwfn->hw_info.device_capabilities)) - *p_proto = QED_PCI_ETH_ROCE; - else - *p_proto = QED_PCI_ETH; + if (qed_mcp_get_shmem_proto_mfw(p_hwfn, p_ptt, p_proto)) +
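The learning scheme described above is a classic "try the new interface, fall back to the legacy one" pattern: query the (possibly newer) management firmware first, and only if it does not understand the command derive the personality from the legacy global capability bit. A simplified sketch, with hypothetical names standing in for the MFW command and the qed personality enums:

```c
#include <stdbool.h>

enum personality { PERS_ETH, PERS_ETH_ROCE };

/* Stand-in for the GET_PF_RDMA_PROTOCOL mailbox query: an old MFW
 * that lacks the command is modeled as a failure return. */
static int mfw_query(bool mfw_supports_cmd, enum personality *out)
{
	if (!mfw_supports_cmd)
		return -22;		/* stand-in for -EINVAL */
	*out = PERS_ETH_ROCE;		/* pretend the MFW reported RoCE */
	return 0;
}

/* Prefer the per-function MFW answer; fall back to the legacy
 * device-global RoCE capability bit when the MFW is too old. */
static enum personality get_personality(bool mfw_supports_cmd,
					bool legacy_roce_cap)
{
	enum personality p;

	if (mfw_query(mfw_supports_cmd, &p) == 0)
		return p;		/* new scheme worked */
	return legacy_roce_cap ? PERS_ETH_ROCE : PERS_ETH;
}
```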
[PATCH net-next v2 2/7] qed: Add nvram selftest
Signed-off-by: Yuval Mintz--- drivers/net/ethernet/qlogic/qed/qed_hsi.h | 4 + drivers/net/ethernet/qlogic/qed/qed_main.c | 1 + drivers/net/ethernet/qlogic/qed/qed_mcp.c | 94 ++ drivers/net/ethernet/qlogic/qed/qed_mcp.h | 41 ++ drivers/net/ethernet/qlogic/qed/qed_selftest.c | 101 drivers/net/ethernet/qlogic/qed/qed_selftest.h | 10 +++ drivers/net/ethernet/qlogic/qede/qede_ethtool.c | 7 ++ include/linux/qed/qed_if.h | 9 +++ 8 files changed, 267 insertions(+) diff --git a/drivers/net/ethernet/qlogic/qed/qed_hsi.h b/drivers/net/ethernet/qlogic/qed/qed_hsi.h index 36de87a..f7dfa2e 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_hsi.h +++ b/drivers/net/ethernet/qlogic/qed/qed_hsi.h @@ -8666,6 +8666,8 @@ struct public_drv_mb { #define DRV_MB_PARAM_BIST_REGISTER_TEST1 #define DRV_MB_PARAM_BIST_CLOCK_TEST 2 +#define DRV_MB_PARAM_BIST_NVM_TEST_NUM_IMAGES 3 +#define DRV_MB_PARAM_BIST_NVM_TEST_IMAGE_BY_INDEX 4 #define DRV_MB_PARAM_BIST_RC_UNKNOWN 0 #define DRV_MB_PARAM_BIST_RC_PASSED1 @@ -8674,6 +8676,8 @@ struct public_drv_mb { #define DRV_MB_PARAM_BIST_TEST_INDEX_SHIFT 0 #define DRV_MB_PARAM_BIST_TEST_INDEX_MASK 0x00FF +#define DRV_MB_PARAM_BIST_TEST_IMAGE_INDEX_SHIFT 8 +#define DRV_MB_PARAM_BIST_TEST_IMAGE_INDEX_MASK0xFF00 u32 fw_mb_header; #define FW_MSG_CODE_MASK 0x diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c index d9fa52a..31f8e42 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_main.c +++ b/drivers/net/ethernet/qlogic/qed/qed_main.c @@ -1508,6 +1508,7 @@ static int qed_update_mtu(struct qed_dev *cdev, u16 mtu) .selftest_interrupt = _selftest_interrupt, .selftest_register = _selftest_register, .selftest_clock = _selftest_clock, + .selftest_nvram = _selftest_nvram, }; const struct qed_common_ops qed_common_ops_pass = { diff --git a/drivers/net/ethernet/qlogic/qed/qed_mcp.c b/drivers/net/ethernet/qlogic/qed/qed_mcp.c index 98dc913..8be6157 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_mcp.c +++ 
b/drivers/net/ethernet/qlogic/qed/qed_mcp.c @@ -1434,6 +1434,52 @@ int qed_mcp_mask_parities(struct qed_hwfn *p_hwfn, return rc; } +int qed_mcp_nvm_read(struct qed_dev *cdev, u32 addr, u8 *p_buf, u32 len) +{ + u32 bytes_left = len, offset = 0, bytes_to_copy, read_len = 0; + struct qed_hwfn *p_hwfn = QED_LEADING_HWFN(cdev); + u32 resp = 0, resp_param = 0; + struct qed_ptt *p_ptt; + int rc = 0; + + p_ptt = qed_ptt_acquire(p_hwfn); + if (!p_ptt) + return -EBUSY; + + while (bytes_left > 0) { + bytes_to_copy = min_t(u32, bytes_left, MCP_DRV_NVM_BUF_LEN); + + rc = qed_mcp_nvm_rd_cmd(p_hwfn, p_ptt, + DRV_MSG_CODE_NVM_READ_NVRAM, + addr + offset + + (bytes_to_copy << +DRV_MB_PARAM_NVM_LEN_SHIFT), + , _param, + _len, + (u32 *)(p_buf + offset)); + + if (rc || (resp != FW_MSG_CODE_NVM_OK)) { + DP_NOTICE(cdev, "MCP command rc = %d\n", rc); + break; + } + + /* This can be a lengthy process, and it's possible scheduler +* isn't preemptable. Sleep a bit to prevent CPU hogging. +*/ + if (bytes_left % 0x1000 < + (bytes_left - read_len) % 0x1000) + usleep_range(1000, 2000); + + offset += read_len; + bytes_left -= read_len; + } + + cdev->mcp_nvm_resp = resp; + qed_ptt_release(p_hwfn, p_ptt); + + return rc; +} + int qed_mcp_bist_register_test(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt) { u32 drv_mb_param = 0, rsp, param; @@ -1475,3 +1521,51 @@ int qed_mcp_bist_clock_test(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt) return rc; } + +int qed_mcp_bist_nvm_test_get_num_images(struct qed_hwfn *p_hwfn, +struct qed_ptt *p_ptt, +u32 *num_images) +{ + u32 drv_mb_param = 0, rsp; + int rc = 0; + + drv_mb_param = (DRV_MB_PARAM_BIST_NVM_TEST_NUM_IMAGES << + DRV_MB_PARAM_BIST_TEST_INDEX_SHIFT); + + rc = qed_mcp_cmd(p_hwfn, p_ptt, DRV_MSG_CODE_BIST_TEST, +drv_mb_param, , num_images); + if (rc) + return rc; + + if (((rsp & FW_MSG_CODE_MASK) != FW_MSG_CODE_OK)) + rc = -EINVAL; + +
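The NVM read added in qed_mcp_nvm_read() is a chunked loop: repeat fixed-size mailbox reads until the requested length has been gathered, and in the driver a usleep_range() is inserted every 4 KiB so a non-preemptible caller does not hog the CPU. A userspace sketch of the loop shape only (the device access is faked, and CHUNK stands in for MCP_DRV_NVM_BUF_LEN):

```c
#include <stdint.h>
#include <string.h>

#define CHUNK 32u	/* stand-in for MCP_DRV_NVM_BUF_LEN */

/* Fake device reader: copies 'len' bytes from a backing image and
 * reports how many bytes were produced, like the mailbox read does. */
static uint32_t fake_nvm_rd(const uint8_t *img, uint32_t addr,
			    uint8_t *dst, uint32_t len)
{
	memcpy(dst, img + addr, len);
	return len;
}

/* Chunked read mirroring the shape of qed_mcp_nvm_read(): issue
 * CHUNK-sized requests, advancing by the length each one returned,
 * until the whole buffer is filled. */
static void chunked_read(const uint8_t *img, uint32_t addr,
			 uint8_t *buf, uint32_t len)
{
	uint32_t off = 0;

	while (len > 0) {
		uint32_t n = len < CHUNK ? len : CHUNK;
		uint32_t got = fake_nvm_rd(img, addr + off, buf + off, n);

		off += got;
		len -= got;
	}
}
```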
[PATCH net-next v2 0/7] qed*: Patch series
This series does several things. The bigger changes:

- Add new notification APIs [& defaults] for various fields. The series
  then utilizes some of those qed <-> qede APIs to base WoL support upon.
- Change the resource allocation scheme to receive the values from the
  management firmware, instead of equally sharing resources between
  functions [that might not need those]. That would, e.g., allow us to
  configure additional filters for network interfaces in the presence of
  storage [PCI] functions from the same adapter.

Dave,

Please consider applying this series to `net-next'.

Thanks,
Yuval

Changes from previous version:
- V2: Rebase on top of latest net-next.

Sudarsana Kalluru (1):
  qed*: Management firmware - notifications and defaults

Tomer Tayar (1):
  qed: Learn resources from management firmware

Yuval Mintz (5):
  qed: Add nvram selftest
  qed*: Add support for WoL
  qede: Decouple ethtool caps from qed
  qed: Learn of RDMA capabilities per-device
  qed: Use VF-queue feature

 drivers/net/ethernet/qlogic/qed/qed.h           |  19 +-
 drivers/net/ethernet/qlogic/qed/qed_dev.c       | 382 +
 drivers/net/ethernet/qlogic/qed/qed_hsi.h       | 120 ++-
 drivers/net/ethernet/qlogic/qed/qed_int.c       |  32 +-
 drivers/net/ethernet/qlogic/qed/qed_l2.c        |   2 +-
 drivers/net/ethernet/qlogic/qed/qed_main.c      | 105 ++
 drivers/net/ethernet/qlogic/qed/qed_mcp.c       | 433 +++-
 drivers/net/ethernet/qlogic/qed/qed_mcp.h       | 158 +
 drivers/net/ethernet/qlogic/qed/qed_selftest.c  | 101 ++
 drivers/net/ethernet/qlogic/qed/qed_selftest.h  |  10 +
 drivers/net/ethernet/qlogic/qed/qed_sriov.c     |  17 +-
 drivers/net/ethernet/qlogic/qede/qede.h         |   2 +
 drivers/net/ethernet/qlogic/qede/qede_ethtool.c |  54 ++-
 drivers/net/ethernet/qlogic/qede/qede_main.c    |  17 +
 include/linux/qed/qed_eth_if.h                  |   2 +-
 include/linux/qed/qed_if.h                      |  47 +++
 16 files changed, 1404 insertions(+), 97 deletions(-)

-- 
1.9.3
Re: [bnx2] [Regression 4.8] Driver loading fails without firmware
Hi Paul,

On 10/30/16 at 12:05pm, Paul Menzel wrote:
> Dear Baoquan,
>
> Am Samstag, den 29.10.2016, 10:55 +0800 schrieb Baoquan He:
> > On 10/27/16 at 03:21pm, Paul Menzel wrote:
> > > > > Baoquan, could you please fix this regression. My suggestion is, that you
> > > > > add the old code back, but check if the firmware has been loaded. If it
> > > > > hasn’t, load it again.
> > > > >
> > > > > That way, people can update their Linux kernel, and it continues working
> > > > > without changing the initramfs, or anything else.
> > > >
> > > > I saw your mail but I am also not familiar with bnx2 driver. As the
> > > > commit log says I just tried to make bnx2 driver reset itself earlier.
> > > >
> > > > So you did a git bisect and found this commit caused the regression,
> > > > right? If yes, and network developers have no action, I will look into
> > > > the code and see if I have idea to fix it.
> > >
> > > Well, I looked through the commits and found that one, which would explain
> > > the changed behavior.
> > >
> > > To be sure, and to follow your request, I took Linux 4.8.4 and reverted your
> > > commit (attached). Then I deleted the firmware again from the initramfs, and
> > > rebooted. The devices showed up just fine as before.
> > >
> > > So to summarize, the commit is indeed the culprit.
> >
> > Sorry for this.
> >
> > Could you tell the steps to reproduce? I will find a machine with bnx2
> > NIC and check if there's other ways.
>
> Well, delete the bnx2 firmware files from the initramfs, and start the
> system.
>
> Did you read my proposal, to try to load the firmware twice, that means,
> basically revert only the deleted lines of your commit, and add an
> additional check?

Thanks for your information! I got an x86_64 system with a bnx2 NIC, cloned Linus's git tree onto that system, and built a new kernel, 4.9.0-rc3+, with a new initramfs.
But when I unpacked the new initramfs, I didn't find any bnx2-related firmware: there were no bnx2 files under lib/firmware of the unpacked initramfs folder, while I did see them in /lib/firmware/bnx2/bnx2-x.fw on the installed system. Could you please describe more specifically how I should reproduce the failure you encountered?

I think your proposal looks good; it just needs a test before posting.

Thanks
Baoquan
Re: [PATCH net] r8152: Fix broken RX checksums.
From: Mark Lord
Date: Sun, 30 Oct 2016 22:07:25 -0400

> On 16-10-30 08:57 PM, David Miller wrote:
>> From: Mark Lord
>> Date: Sun, 30 Oct 2016 19:28:27 -0400
>>
>>> The r8152 driver has been broken since (approx) 3.16.xx
>>> when support was added for hardware RX checksums
>>> on newer chip versions. Symptoms include random
>>> segfaults and silent data corruption over NFS.
>>>
>>> The hardware checksum logig does not work on the VER_02
>>> dongles I have here when used with a slow embedded system CPU.
>>> Google reveals others reporting similar issues on Raspberry Pi.
>>>
>>> So, disable hardware RX checksum support for VER_02, and fix
>>> an obvious coding error for IPV6 checksums in the same function.
>>>
>>> Because this bug results in silent data corruption,
>>> it is a good candidate for back-porting to -stable >= 3.16.xx.
>>>
>>> Signed-off-by: Mark Lord
>>
>> Applied and queued up for -stable, thanks.
>
> Thanks. Now that this is taken care of, I do wonder if perhaps
> RX checksums ought to be enabled at all for ANY versions of this chip?

You should really start a dialogue with the developer who has been
making the most, if not all, of the major changes to this driver over
the past few years, Hayes Wang.
Re: [PATCH net] r8152: Fix broken RX checksums.
On 16-10-30 08:57 PM, David Miller wrote:
> From: Mark Lord
> Date: Sun, 30 Oct 2016 19:28:27 -0400
>
>> The r8152 driver has been broken since (approx) 3.16.xx
>> when support was added for hardware RX checksums
>> on newer chip versions. Symptoms include random
>> segfaults and silent data corruption over NFS.
>>
>> The hardware checksum logig does not work on the VER_02
>> dongles I have here when used with a slow embedded system CPU.
>> Google reveals others reporting similar issues on Raspberry Pi.
>>
>> So, disable hardware RX checksum support for VER_02, and fix
>> an obvious coding error for IPV6 checksums in the same function.
>>
>> Because this bug results in silent data corruption,
>> it is a good candidate for back-porting to -stable >= 3.16.xx.
>>
>> Signed-off-by: Mark Lord
>
> Applied and queued up for -stable, thanks.

Thanks. Now that this is taken care of, I do wonder if perhaps
RX checksums ought to be enabled at all for ANY versions of this chip?

My theory is that the checksums probably work okay most of the time,
except when the hardware RX buffer overflows. In my case, and in the
case of the Raspberry Pi, the receiving CPU is quite a bit slower than
mainstream x86, so it can quite easily fall behind in emptying the RX
buffer on the chip. The only indication this has happened may be an
incorrect RX checksum.

This is only a theory, but I otherwise have trouble explaining why we
are seeing invalid RX checksums -- direct cable connections to a
switch, shared only with the NFS server. No reason for it to have bad
RX checksums in the first place.

Should we just blanket disable RX checksums for all versions here
unless proven otherwise/safe? Anyone out there know better?

Cheers
Mark
Re: [PATCH net-next 3/4] bpf: BPF for lightweight tunnel encapsulation
On Sun, Oct 30, 2016 at 2:47 PM, Thomas Graf wrote:
> On 10/30/16 at 01:34pm, Tom Herbert wrote:
>> On Sun, Oct 30, 2016 at 4:58 AM, Thomas Graf wrote:
>> > +	if (unlikely(!dst->lwtstate->orig_output)) {
>> > +		WARN_ONCE(1, "orig_output not set on dst for prog %s\n",
>> > +			  bpf->out.name);
>> > +		kfree_skb(skb);
>> > +		return -EINVAL;
>> > +	}
>> > +
>> > +	return dst->lwtstate->orig_output(net, sk, skb);
>>
>> The BPF program may have changed the destination address so continuing
>> with the original route in the skb may not be appropriate here. This was
>> fixed in ila_lwt by calling ip6_route_output, and we were able to use the
>> dst cache facility to cache the route to avoid the cost of looking it up
>> on every packet. Since the kernel has no insight into what the BPF
>> program does to the packet I'd suggest 1) checking if the destination
>> address was changed by BPF and if it was then call route_output to get a
>> new route 2) If the LWT destination is a host route then try to keep a
>> dst cache. This would entail checking on return that the destination
>> address is the same one as kept in the dst cache.
>
> Instead of building complex logic, we can allow the program to return
> a code to indicate when to perform another route lookup just as we do
> for the redirect case. Just because the destination address has
> changed may not require another lookup in all cases. A typical example
> would be a program rewriting addresses for the default route to other
> addresses which are always handled by the default route as well. An
> unconditional lookup would hurt performance in many cases.

Right, that's why we rely on a dst cache. Any use of LWT that
encapsulates or tunnels to a fixed destination (ILA, VXLAN, IPIP,
etc.) would want to use the dst cache optimization to avoid the second
lookup. The ILA LWT code used to call orig output and that worked as
long as we could set the default router as the gateway "via". It was
something we were able to deploy, but not a general solution.
Integrating properly with routing gives a much better solution IMO.

Note that David Lebrun's latest LWT Segment Routing patch does the
second lookup with the dst cache to try to avoid it.

Thanks,
Tom
Re: [PATCH net] r8152: Fix broken RX checksums.
From: Mark Lord
Date: Sun, 30 Oct 2016 19:28:27 -0400

> The r8152 driver has been broken since (approx) 3.16.xx
> when support was added for hardware RX checksums
> on newer chip versions.  Symptoms include random
> segfaults and silent data corruption over NFS.
>
> The hardware checksum logic does not work on the VER_02
> dongles I have here when used with a slow embedded system CPU.
> Google reveals others reporting similar issues on Raspberry Pi.
>
> So, disable hardware RX checksum support for VER_02, and fix
> an obvious coding error for IPV6 checksums in the same function.
>
> Because this bug results in silent data corruption,
> it is a good candidate for back-porting to -stable >= 3.16.xx.
>
> Signed-off-by: Mark Lord

Applied and queued up for -stable, thanks.
Re: [PATCH] drivers/net/usb/r8152 fix broken rx checksums
On Sun, 2016-10-30 at 17:22 -0400, David Miller wrote:
> 3) "Fix broken RX checksums."  Commit header lines and commit
>    messages are proper English, therefore sentences should
>    begin with a capitalized letter and end with a period.

Commit messages should be proper English.  But commit header lines
should not end with a period.  The vast majority doesn't.  Yes, I've
just checked.

How many newspaper headlines end with a period?

Thanks,

Paul Bolle
[PATCH net] r8152: Fix broken RX checksums.
The r8152 driver has been broken since (approx) 3.16.xx
when support was added for hardware RX checksums
on newer chip versions.  Symptoms include random
segfaults and silent data corruption over NFS.

The hardware checksum logic does not work on the VER_02
dongles I have here when used with a slow embedded system CPU.
Google reveals others reporting similar issues on Raspberry Pi.

So, disable hardware RX checksum support for VER_02, and fix
an obvious coding error for IPV6 checksums in the same function.

Because this bug results in silent data corruption,
it is a good candidate for back-porting to -stable >= 3.16.xx.

Signed-off-by: Mark Lord
---
--- old/drivers/net/usb/r8152.c	2016-09-30 04:20:43.0 -0400
+++ linux/drivers/net/usb/r8152.c	2016-10-26 14:15:44.932517676 -0400
@@ -1645,7 +1645,7 @@
 	u8 checksum = CHECKSUM_NONE;
 	u32 opts2, opts3;
 
-	if (tp->version == RTL_VER_01)
+	if (tp->version == RTL_VER_01 || tp->version == RTL_VER_02)
 		goto return_result;
 
 	opts2 = le32_to_cpu(rx_desc->opts2);
@@ -1660,7 +1660,7 @@
 			checksum = CHECKSUM_NONE;
 		else
 			checksum = CHECKSUM_UNNECESSARY;
-	} else if (RD_IPV6_CS) {
+	} else if (opts2 & RD_IPV6_CS) {
 		if ((opts2 & RD_UDP_CS) && !(opts3 & UDPF))
 			checksum = CHECKSUM_UNNECESSARY;
 		else if ((opts2 & RD_TCP_CS) && !(opts3 & TCPF))
Re: [PATCH net-next 2/2] net/mlx4_en: Refactor the XDP forwarding rings scheme
On Mon, Oct 31, 2016 at 1:11 AM, Saeed Mahameed wrote:
> On Mon, Oct 31, 2016 at 12:44 AM, Alexei Starovoitov wrote:
>> On Sun, Oct 30, 2016 at 06:03:06PM +0200, Tariq Toukan wrote:
>>>
>>> Note that the XDP TX rings are no longer shown in ethtool -S.
>>
>> ouch. Can you make it to show them as some large TX numbers instead?
>> It would really sux to lose stats on them.
>>
>
> Right, Tariq, how did we miss this ?
>
> FYI, I don't think we need the whole TX queue stats for XDP tx rings,
> it is just an overkill, there are only two active counters for XDP TX
> ring (XDP_TX_FWD/XDP_TX_DROP).
>
> XDP_TX_FWD or currently "tx_packets" will count successfully forwarded packets
> XDP_TX_DROP or currently "tx_dropped" will count TX dropped packets
> due to full ring.
>
> do we need tx_bytes as well ? I think yes.
>
> The whole idea of this refactoring i.e. differentiating between TXQ
> netdev rings and XDP TX rings, that XDP is a fast path with minimal
> system overhead, we don't need to have the full set of regular TXQ
> counters for XDP TX rings.

BTW in mlx5 we have the required xdp stats as a part of the rx ring:

static const struct counter_desc rq_stats_desc[] = {
	[...]
	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, xdp_drop) },
	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, xdp_tx) },
	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, xdp_tx_full) },
	[...]

We should do the same here.
Re: [PATCH net-next 2/2] net/mlx4_en: Refactor the XDP forwarding rings scheme
On Mon, Oct 31, 2016 at 12:44 AM, Alexei Starovoitov wrote:
> On Sun, Oct 30, 2016 at 06:03:06PM +0200, Tariq Toukan wrote:
>>
>> Note that the XDP TX rings are no longer shown in ethtool -S.
>
> ouch. Can you make it to show them as some large TX numbers instead?
> It would really sux to lose stats on them.
>

Right, Tariq, how did we miss this ?

FYI, I don't think we need the whole TX queue stats for XDP tx rings,
it is just an overkill, there are only two active counters for XDP TX
ring (XDP_TX_FWD/XDP_TX_DROP).

XDP_TX_FWD or currently "tx_packets" will count successfully forwarded
packets.
XDP_TX_DROP or currently "tx_dropped" will count TX dropped packets
due to a full ring.

do we need tx_bytes as well ? I think yes.

The whole idea of this refactoring, i.e. differentiating between TXQ
netdev rings and XDP TX rings, is that XDP is a fast path with minimal
system overhead; we don't need to have the full set of regular TXQ
counters for XDP TX rings.
Re: [PATCH 2/2] rtl8xxxu: Fix for bogus data used to determine macpower
John Heenan writes:
> Thanks for your reply.
>
> The code was tested on a Cube i9 which has an internal rtl8723bu.
>
> No other devices were tested.
>
> I am happy to accept in an ideal context hard coding macpower is
> undesirable, the comment is undesirable and it is wrong to assume the
> issue is not unique to the rtl8723bu.
>
> Your reply is idealistic. What can I do now? I should of course have
> factored out other untested devices in my patches. The apparent
> concern you have with process over outcome is a useful lesson.
>
> We are not in an ideal situation. The comment is of course relevant
> and useful to starting a process to fixing a real bug I do not have
> sufficient information to refine any further for and others do. In the
> circumstances nothing really more can be expected.

Well you should start by reporting the issue and either providing a
patch that only affects 8723bu, or work on a generic solution.  I
appreciate patches, but I do not appreciate patches that will make
something work for one person and break for everyone else - I spent a
lot of time making sure the driver works across the different devices.

The comment violates all Linux standards - first rule when modifying
code is to respect the style of the code you are dealing with.  Code
is 80 characters wide, and comments are /* */ never the ugly C++ crap.

> My patch cover letter, [PATCH 0/2] provides evidence of a mess with
> regard to determining macpower for the rtl8723bu and what is
> subsequently required. This is important.
>
> The kernel driver code is very poorly documented and there is not a
> single source reference to device documentation. For example macpower
> is nothing more than a setting that is true or false according to
> whether a read of a particular register returns 0xef or not. Such value
> was never obtained so a full init sequence was never performed.

The kernel driver is documented with the information I have - there is
NO device documentation because Realtek refuses to provide any.  I
have written the driver based on what I have retrieved by reading the
vendor drivers.  If you can provide better documentation, I certainly
would love to get it.

> It would be helpful if you could provide a link to device references.
> As it is, how am I supposed to revise the patch without relevant
> information?

Look at the USB device table, it shows you which devices are supported.

> My patch code works with the Cube i9, as is, despite a lack of
> adequate information. Before it did not. That is a powerful statement

The driver works with a lot of different devices - in itself that is a
powerful statement!  Yes I want to see it work with as many devices as
possible, but just moving things around without balancing it and not
explaining why is not a fix.  If we move more of the init sequence to
_start() you also have to move matching pieces to _stop().

Jes
Re: [PATCH net-next RFC WIP] Patch for XDP support for virtio_net
On Fri, Oct 28, 2016 at 01:11:01PM -0400, David Miller wrote:
> From: John Fastabend
> Date: Fri, 28 Oct 2016 08:56:35 -0700
>
> > On 16-10-27 07:10 PM, David Miller wrote:
> >> From: Alexander Duyck
> >> Date: Thu, 27 Oct 2016 18:43:59 -0700
> >>
> >>> On Thu, Oct 27, 2016 at 6:35 PM, David Miller wrote:
> >>>> From: "Michael S. Tsirkin"
> >>>> Date: Fri, 28 Oct 2016 01:25:48 +0300
> >>>>
> >>>>> On Thu, Oct 27, 2016 at 05:42:18PM -0400, David Miller wrote:
> >>>>>> From: "Michael S. Tsirkin"
> >>>>>> Date: Fri, 28 Oct 2016 00:30:35 +0300
> >>>>>>
> >>>>>>> Something I'd like to understand is how does XDP address the
> >>>>>>> problem that 100Byte packets are consuming 4K of memory now.
> >>>>>>
> >>>>>> Via page pools.  We're going to make a generic one, but right now
> >>>>>> each and every driver implements a quick list of pages to allocate
> >>>>>> from (and thus avoid the DMA map/unmap overhead, etc.)
> >>>>>
> >>>>> So to clarify, ATM virtio doesn't attempt to avoid dma map/unmap
> >>>>> so there should be no issue with that even when using sub/page
> >>>>> regions, assuming DMA APIs support sub-page map/unmap correctly.
> >>>>
> >>>> That's not what I said.
> >>>>
> >>>> The page pools are meant to address the performance degradation from
> >>>> going to having one packet per page for the sake of XDP's
> >>>> requirements.
> >>>>
> >>>> You still need to have one packet per page for correct XDP operation
> >>>> whether you do page pools or not, and whether you have DMA mapping
> >>>> (or its equivalent virtualization operation) or not.
> >>>
> >>> Maybe I am missing something here, but why do you need to limit things
> >>> to one packet per page for correct XDP operation?  Most of the drivers
> >>> out there now are usually storing something closer to at least 2
> >>> packets per page, and with the DMA API fixes I am working on there
> >>> should be no issue with changing the contents inside those pages since
> >>> we won't invalidate or overwrite the data after the DMA buffer has
> >>> been synchronized for use by the CPU.
> >>
> >> Because with SKB's you can share the page with other packets.
> >>
> >> With XDP you simply cannot.
> >>
> >> It's software semantics that are the issue.  SKB frag list pages
> >> are read only, XDP packets are writable.
> >>
> >> This has nothing to do with "writability" of the pages wrt. DMA
> >> mapping or cpu mappings.
> >>
> >
> > Sorry I'm not seeing it either. The current xdp_buff is defined
> > by,
> >
> >   struct xdp_buff {
> >           void *data;
> >           void *data_end;
> >   };
> >
> > The verifier has an xdp_is_valid_access() check to ensure we don't go
> > past data_end. The page for now at least never leaves the driver. For
> > the work to get xmit to other devices working I'm still not sure I see
> > any issue.
>
> I guess I can say that the packets must be "writable" until I'm blue
> in the face but I'll say it again, semantically writable pages are a
> requirement.  And if multiple packets share a page this requirement
> is not satisfied.
>
> Also, we want to do several things in the future:
>
> 1) Allow push/pop of headers via eBPF code, which means we need
>    headroom.

I think that with e.g. LRO or a large MTU page per packet does not
guarantee headroom.

> 2) Transparently zero-copy pass packets into userspace, basically
>    the user will have a semi-permanently mapped ring of all the
>    packet pages sitting in the RX queue of the device and the
>    page pool associated with it.  This way we avoid all of the
>    TLB flush/map overhead for the user's mapping of the packets
>    just as we avoid the DMA map/unmap overhead.

Looks like you can share pages between packets as long as they all
come from the same pool so accessible to the same userspace.

> And that's just the beginning.
>
> I'm sure others can come up with more reasons why we have this
> requirement.

I'm still a bit confused :(  Is this a requirement of the current code
or to enable future extensions?

-- 
MST
Re: [PATCH net-next 2/2] net/mlx4_en: Refactor the XDP forwarding rings scheme
On Sun, Oct 30, 2016 at 06:03:06PM +0200, Tariq Toukan wrote:
>
> Note that the XDP TX rings are no longer shown in ethtool -S.

ouch. Can you make it to show them as some large TX numbers instead?
It would really sux to lose stats on them.
Re: Let's do P4
On Sun, Oct 30, 2016 at 05:38:36PM +0100, Jiri Pirko wrote:
> Sun, Oct 30, 2016 at 11:26:49AM CET, tg...@suug.ch wrote:
> >On 10/30/16 at 08:44am, Jiri Pirko wrote:
> >> Sat, Oct 29, 2016 at 06:46:21PM CEST, john.fastab...@gmail.com wrote:
> >> >On 16-10-29 07:49 AM, Jakub Kicinski wrote:
> >> >> On Sat, 29 Oct 2016 09:53:28 +0200, Jiri Pirko wrote:
> >> >>> Hi all.
> >> >>>

sorry for delay. travelling to KS, so probably missed something in
this thread and comments can be totally off...

the subject "let's do P4" is imo misleading, since it reads like we
don't do P4 at the moment, whereas the opposite is true. Several
p4->bpf compilers is a proof.

> The network world is divided into 2 general types of hw:
> 1) network ASICs - network specific silicon, containing things like TCAM
>    These ASICs are suitable to be programmed by P4.

i think the opposite is the case in case of P4. when hw asic has tcam
it's still far far away from being usable with P4 which requires fully
programmable protocol parser, arbitrary tables and so on.
P4 doesn't even define TCAM as a table type. The p4 program can
declare a desired algorithm of search in the table and compiler has to
figure out what HW resources to use to satisfy such p4 program.

> 2) network processors - basically a general purpose CPUs
>    These processors are suitable to be programmed by eBPF.

I think this statement is also misleading, since it positions p4 and
bpf as competitors whereas that's not the case.
p4 is the language. bpf is an instruction set.

> Exactly. Following drawing shows p4 pipeline setup for SW and HW:
>
>                                 |
>                                 |           +--> ebpf engine
>                                 |           |
>                                 |           |
>                                 |       compilerB
>                                 |           ^
>                                 |           |
> p4src --> compilerA --> p4ast --TCNL--> cls_p4 --+-> driver -> compilerC -> HW
>                                 |
>          userspace              |  kernel
>                                 |

frankly this diagram smells very much like kernel bypass to me, since
I cannot see how one can put the whole p4 language compiler into the
driver, so this last step of p4ast->hw, I presume, will be done by
firmware, which will be running full compiler in an embedded cpu on
the switch. To me that's precisely the kernel bypass, since we won't
have a clue what HW capabilities actually are and won't be able to
fine grain control them. Please correct me if I'm wrong.

> Plus the thing I cannot imagine in the model you propose is table fillup.
> For ebpf, you use maps. For p4 you would have to have a separate HW-only
> API. This is very similar to the original John's Flow-API. And therefore
> a kernel bypass.

I think John's flow api is a better way to expose mellanox switch
capabilities. I also think it's not fair to call it 'bypass'.
I see nothing in it that justify such 'swear word' ;)
The goal of flow api was to expose HW features to user space, so that
user space can program it. For something simple as mellanox switch
asic it fits perfectly well. Unless I misunderstand the bigger goal of
this discussion and it's about programming ezchip devices.

If the goal is to model hw tcam in the linux kernel then just
introduce tcam bpf map type. It will be dog slow in user space, but it
will match exactly what is happening in the HW and user space can make
sensible trade-offs.
Re: [PATCH net-next 3/4] bpf: BPF for lightweight tunnel encapsulation
On 10/30/16 at 01:34pm, Tom Herbert wrote:
> On Sun, Oct 30, 2016 at 4:58 AM, Thomas Graf wrote:
> > +	if (unlikely(!dst->lwtstate->orig_output)) {
> > +		WARN_ONCE(1, "orig_output not set on dst for prog %s\n",
> > +			  bpf->out.name);
> > +		kfree_skb(skb);
> > +		return -EINVAL;
> > +	}
> > +
> > +	return dst->lwtstate->orig_output(net, sk, skb);
>
> The BPF program may have changed the destination address so continuing
> with the original route in the skb may not be appropriate here. This was
> fixed in ila_lwt by calling ip6_route_output, and we were able to use the
> dst cache facility to cache the route to avoid the cost of looking it up
> on every packet. Since the kernel has no insight into what the BPF
> program does to the packet I'd suggest 1) checking if the destination
> address was changed by BPF and if it was then call route_output to get a
> new route 2) If the LWT destination is a host route then try to keep a
> dst cache. This would entail checking on return that the destination
> address is the same one as kept in the dst cache.

Instead of building complex logic, we can allow the program to return
a code to indicate when to perform another route lookup just as we do
for the redirect case. Just because the destination address has
changed may not require another lookup in all cases. A typical example
would be a program rewriting addresses for the default route to other
addresses which are always handled by the default route as well. An
unconditional lookup would hurt performance in many cases.
Re: [PATCH for-next V2 00/15][PULL request] Mellanox mlx5 core driver updates 2016-10-25
From: Saeed Mahameed
Date: Sun, 30 Oct 2016 23:21:53 +0200

> This series contains some updates and fixes of mlx5 core and
> IB drivers with the addition of two features that demand
> new low level commands and infrastructure updates.
> - SRIOV VF max rate limit support
> - mlx5e tc support for FWD rules with counter.
>
> Needed for both net and rdma subsystems.

Pulled, thanks.
[PATCH for-next V2 13/15] net/mlx5: Add multi dest support
From: Mark Bloch

Currently when calling mlx5_add_flow_rule we accept only one flow
destination, this commit allows to pass multiple destinations.

This change forces us to change the return structure to a more
flexible one.  We introduce a flow handle (struct mlx5_flow_handle),
it holds internally the number of rules created and holds an array
where each cell points to a flow rule.

From the consumers (of mlx5_add_flow_rule) point of view this change
is only cosmetic and requires only to change the type of the returned
value they store.

From the core point of view, we now need to use a loop when
allocating and deleting rules (e.g given to us a flow handler).

Signed-off-by: Mark Bloch
Signed-off-by: Saeed Mahameed
Signed-off-by: Leon Romanovsky
---
 drivers/infiniband/hw/mlx5/main.c                  |  14 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h               |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |  14 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c  |  38 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c    |  49 ++--
 .../ethernet/mellanox/mlx5/core/en_fs_ethtool.c    |  19 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |   6 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    |  32 +--
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |  68 ++---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |  22 +-
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c |  42 +--
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  | 289 ++---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h  |   5 +
 include/linux/mlx5/fs.h                            |  28 +-
 14 files changed, 374 insertions(+), 254 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index d02341e..8e0dbd5 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1771,13 +1771,13 @@ static int mlx5_ib_destroy_flow(struct ib_flow *flow_id)
 	mutex_lock(&dev->flow_db.lock);
 
 	list_for_each_entry_safe(iter, tmp, &handler->list, list) {
-		mlx5_del_flow_rule(iter->rule);
+		mlx5_del_flow_rules(iter->rule);
 		put_flow_table(dev, iter->prio, true);
 		list_del(&iter->list);
 		kfree(iter);
 	}
 
-	mlx5_del_flow_rule(handler->rule);
+	mlx5_del_flow_rules(handler->rule);
 	put_flow_table(dev, handler->prio, true);
 	mutex_unlock(&dev->flow_db.lock);
 
@@ -1907,10 +1907,10 @@ static struct mlx5_ib_flow_handler *create_flow_rule(struct mlx5_ib_dev *dev,
 	spec->match_criteria_enable = get_match_criteria_enable(spec->match_criteria);
 	action = dst ? MLX5_FLOW_CONTEXT_ACTION_FWD_DEST :
 		MLX5_FLOW_CONTEXT_ACTION_FWD_NEXT_PRIO;
-	handler->rule = mlx5_add_flow_rule(ft, spec,
+	handler->rule = mlx5_add_flow_rules(ft, spec,
 					    action,
 					    MLX5_FS_DEFAULT_FLOW_TAG,
-					    dst);
+					    dst, 1);
 
 	if (IS_ERR(handler->rule)) {
 		err = PTR_ERR(handler->rule);
@@ -1941,7 +1941,7 @@ static struct mlx5_ib_flow_handler *create_dont_trap_rule(struct mlx5_ib_dev *de
 		handler_dst = create_flow_rule(dev, ft_prio,
 					       flow_attr, dst);
 		if (IS_ERR(handler_dst)) {
-			mlx5_del_flow_rule(handler->rule);
+			mlx5_del_flow_rules(handler->rule);
 			ft_prio->refcount--;
 			kfree(handler);
 			handler = handler_dst;
@@ -2004,7 +2004,7 @@ static struct mlx5_ib_flow_handler *create_leftovers_rule(struct mlx5_ib_dev *de
 						 _specs[LEFTOVERS_UC].flow_attr,
 						 dst);
 		if (IS_ERR(handler_ucast)) {
-			mlx5_del_flow_rule(handler->rule);
+			mlx5_del_flow_rules(handler->rule);
 			ft_prio->refcount--;
 			kfree(handler);
 			handler = handler_ucast;
@@ -2046,7 +2046,7 @@ static struct mlx5_ib_flow_handler *create_sniffer_rule(struct mlx5_ib_dev *dev,
 	return handler_rx;
 
 err_tx:
-	mlx5_del_flow_rule(handler_rx->rule);
+	mlx5_del_flow_rules(handler_rx->rule);
 	ft_rx->refcount--;
 	kfree(handler_rx);
 err:
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index dcdcd19..d5d0077 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -153,7 +153,7 @@ struct mlx5_ib_flow_handler {
 	struct list_head	list;
 	struct ib_flow		ibflow;
 	struct
[PATCH for-next V2 06/15] net/mlx5: Introduce TSAR manipulation firmware commands
From: Mohamad Haj Yahia

TSAR (stands for Transmit Scheduling ARbiter) is a hardware component
that is responsible for selecting the next entity to serve on the
transmit path.  The arbitration defines the QoS policy between the
agents connected to the TSAR.

The TSAR consists of two main features:

1) BW allocation between agents:
   The TSAR implements a deficit weighted round robin between the
   agents.  Each agent attached to the TSAR is assigned with a weight
   and it is awarded transmission tokens according to this weight.

2) Rate limiter per agent:
   Each agent attached to the TSAR is (optionally) assigned with a
   rate limit.  TSAR will not allow scheduling for an agent exceeding
   its defined rate limit.

In this patch we implement the API of manipulating the TSAR.

Signed-off-by: Mohamad Haj Yahia
Signed-off-by: Saeed Mahameed
Signed-off-by: Leon Romanovsky
---
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c      |  13 +-
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h    |   7 +
 drivers/net/ethernet/mellanox/mlx5/core/rl.c       |  65 +++
 include/linux/mlx5/mlx5_ifc.h                      | 199 -
 4 files changed, 279 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 1e639f8..8561102 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -318,6 +318,8 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev *dev, u16 op,
 	case MLX5_CMD_OP_SET_FLOW_TABLE_ENTRY:
 	case MLX5_CMD_OP_SET_FLOW_TABLE_ROOT:
 	case MLX5_CMD_OP_DEALLOC_ENCAP_HEADER:
+	case MLX5_CMD_OP_DESTROY_SCHEDULING_ELEMENT:
+	case MLX5_CMD_OP_DESTROY_QOS_PARA_VPORT:
 		return MLX5_CMD_STAT_OK;
 
 	case MLX5_CMD_OP_QUERY_HCA_CAP:
@@ -419,11 +421,14 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev *dev, u16 op,
 	case MLX5_CMD_OP_QUERY_FLOW_TABLE:
 	case MLX5_CMD_OP_CREATE_FLOW_GROUP:
 	case MLX5_CMD_OP_QUERY_FLOW_GROUP:
-	case MLX5_CMD_OP_QUERY_FLOW_TABLE_ENTRY:
 	case MLX5_CMD_OP_ALLOC_FLOW_COUNTER:
 	case MLX5_CMD_OP_QUERY_FLOW_COUNTER:
 	case MLX5_CMD_OP_ALLOC_ENCAP_HEADER:
+	case MLX5_CMD_OP_CREATE_SCHEDULING_ELEMENT:
+	case MLX5_CMD_OP_QUERY_SCHEDULING_ELEMENT:
+	case MLX5_CMD_OP_MODIFY_SCHEDULING_ELEMENT:
+	case MLX5_CMD_OP_CREATE_QOS_PARA_VPORT:
 		*status = MLX5_DRIVER_STATUS_ABORTED;
 		*synd = MLX5_DRIVER_SYND;
 		return -EIO;
@@ -580,6 +585,12 @@ const char *mlx5_command_str(int command)
 	MLX5_COMMAND_STR_CASE(MODIFY_FLOW_TABLE);
 	MLX5_COMMAND_STR_CASE(ALLOC_ENCAP_HEADER);
 	MLX5_COMMAND_STR_CASE(DEALLOC_ENCAP_HEADER);
+	MLX5_COMMAND_STR_CASE(CREATE_SCHEDULING_ELEMENT);
+	MLX5_COMMAND_STR_CASE(DESTROY_SCHEDULING_ELEMENT);
+	MLX5_COMMAND_STR_CASE(QUERY_SCHEDULING_ELEMENT);
+	MLX5_COMMAND_STR_CASE(MODIFY_SCHEDULING_ELEMENT);
+	MLX5_COMMAND_STR_CASE(CREATE_QOS_PARA_VPORT);
+	MLX5_COMMAND_STR_CASE(DESTROY_QOS_PARA_VPORT);
 	default: return "unknown command opcode";
 	}
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 3d0cfb9..bf43171 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -91,6 +91,13 @@ int mlx5_core_sriov_configure(struct pci_dev *dev, int num_vfs);
 bool mlx5_sriov_is_enabled(struct mlx5_core_dev *dev);
 int mlx5_core_enable_hca(struct mlx5_core_dev *dev, u16 func_id);
 int mlx5_core_disable_hca(struct mlx5_core_dev *dev, u16 func_id);
+int mlx5_create_scheduling_element_cmd(struct mlx5_core_dev *dev, u8 hierarchy,
+				       void *context, u32 *element_id);
+int mlx5_modify_scheduling_element_cmd(struct mlx5_core_dev *dev, u8 hierarchy,
+				       void *context, u32 element_id,
+				       u32 modify_bitmask);
+int mlx5_destroy_scheduling_element_cmd(struct mlx5_core_dev *dev, u8 hierarchy,
+					u32 element_id);
 int mlx5_wait_for_vf_pages(struct mlx5_core_dev *dev);
 cycle_t mlx5_read_internal_timer(struct mlx5_core_dev *dev);
 u32 mlx5_get_msix_vec(struct mlx5_core_dev *dev, int vecidx);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/rl.c b/drivers/net/ethernet/mellanox/mlx5/core/rl.c
index 104902a..e651e4c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/rl.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/rl.c
@@ -36,6 +36,71 @@
 #include
 #include "mlx5_core.h"
 
+/* Scheduling element fw management */
+int mlx5_create_scheduling_element_cmd(struct mlx5_core_dev *dev, u8 hierarchy,
+				       void *ctx, u32
[PATCH for-next V2 10/15] net/mlx5: Use fte status to decide on firmware command
From: Mark Bloch

An fte status becomes FS_FTE_STATUS_EXISTING only after it was created
in HW.  We can use this in order to simplify the logic on what
firmware command to use.  If the status isn't FS_FTE_STATUS_EXISTING
we need to create the fte, otherwise we need only to update it.

Signed-off-by: Mark Bloch
Signed-off-by: Saeed Mahameed
Signed-off-by: Leon Romanovsky
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index a07ff30..e2bab9d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -946,7 +946,7 @@ static struct mlx5_flow_rule *add_rule_fte(struct fs_fte *fte,
 			BIT(MLX5_SET_FTE_MODIFY_ENABLE_MASK_DESTINATION_LIST);
 	}
 
-	if (fte->dests_size == 1 || !dest)
+	if (!(fte->status & FS_FTE_STATUS_EXISTING))
 		err = mlx5_cmd_create_fte(get_dev(&ft->node), ft,
 					  fg->id, fte);
 	else
-- 
2.7.4
[PATCH for-next V2 12/15] net/mlx5: Group similar rules under the same fte
From: Mark Bloch

When adding a new rule, if we can match it with compare_match_value
and flow tag we might be able to insert the rule to the same fte.
In order to do that, there must be an overlap between the actions of
the fte and the new rule.

When updating the action of an existing fte, we must tell the
firmware we are doing so.

Signed-off-by: Mark Bloch
Signed-off-by: Saeed Mahameed
Signed-off-by: Leon Romanovsky
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 22 --
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index fca6937..43d7052 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -920,7 +920,8 @@ static struct mlx5_flow_rule *alloc_rule(struct mlx5_flow_destination *dest)
 /* fte should not be deleted while calling this function */
 static struct mlx5_flow_rule *add_rule_fte(struct fs_fte *fte,
 					   struct mlx5_flow_group *fg,
-					   struct mlx5_flow_destination *dest)
+					   struct mlx5_flow_destination *dest,
+					   bool update_action)
 {
 	struct mlx5_flow_table *ft;
 	struct mlx5_flow_rule *rule;
@@ -931,6 +932,9 @@ static struct mlx5_flow_rule *add_rule_fte(struct fs_fte *fte,
 	if (!rule)
 		return ERR_PTR(-ENOMEM);
 
+	if (update_action)
+		modify_mask |= BIT(MLX5_SET_FTE_MODIFY_ENABLE_MASK_ACTION);
+
 	fs_get_obj(ft, fg->node.parent);
 	/* Add dest to dests list- we need flow tables to be in the
 	 * end of the list for forward to next prio rules.
@@ -1109,7 +1113,9 @@ static struct mlx5_flow_rule *add_rule_fg(struct mlx5_flow_group *fg,
 	fs_for_each_fte(fte, fg) {
 		nested_lock_ref_node(&fte->node, FS_MUTEX_CHILD);
 		if (compare_match_value(&fg->mask, match_value, &fte->val) &&
-		    action == fte->action && flow_tag == fte->flow_tag) {
+		    (action & fte->action) && flow_tag == fte->flow_tag) {
+			int old_action = fte->action;
+
 			rule = find_flow_rule(fte, dest);
 			if (rule) {
 				atomic_inc(&rule->node.refcount);
@@ -1117,11 +1123,15 @@ static struct mlx5_flow_rule *add_rule_fg(struct mlx5_flow_group *fg,
 				unlock_ref_node(&fte->node);
 				return rule;
 			}
-			rule = add_rule_fte(fte, fg, dest);
-			if (IS_ERR(rule))
+			fte->action |= action;
+			rule = add_rule_fte(fte, fg, dest,
+					    old_action != action);
+			if (IS_ERR(rule)) {
+				fte->action = old_action;
 				goto unlock_fte;
-			else
+			} else {
 				goto add_rule;
+			}
 		}
 		unlock_ref_node(&fte->node);
 	}
@@ -1138,7 +1148,7 @@ static struct mlx5_flow_rule *add_rule_fg(struct mlx5_flow_group *fg,
 	}
 	tree_init_node(&fte->node, 0, del_fte);
 	nested_lock_ref_node(&fg->node, FS_MUTEX_CHILD);
-	rule = add_rule_fte(fte, fg, dest);
+	rule = add_rule_fte(fte, fg, dest, false);
 	if (IS_ERR(rule)) {
 		kfree(fte);
 		goto unlock_fg;
-- 
2.7.4
[PATCH for-next V2 14/15] net/mlx5: Add option to add fwd rule with counter
From: Mark BlochCurrently the code supports only drop rules to possess counters, add that ability also for fwd rules. Signed-off-by: Mark Bloch Signed-off-by: Saeed Mahameed Signed-off-by: Leon Romanovsky --- drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 24 +-- 1 file changed, 18 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c index 6732287..0dfd998 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c @@ -374,6 +374,7 @@ static void del_rule(struct fs_node *node) struct mlx5_core_dev *dev = get_dev(node); int match_len = MLX5_ST_SZ_BYTES(fte_match_param); int err; + bool update_fte = false; match_value = mlx5_vzalloc(match_len); if (!match_value) { @@ -392,13 +393,23 @@ static void del_rule(struct fs_node *node) list_del(>next_ft); mutex_unlock(>dest_attr.ft->lock); } + + if (rule->dest_attr.type == MLX5_FLOW_DESTINATION_TYPE_COUNTER && + --fte->dests_size) { + modify_mask = BIT(MLX5_SET_FTE_MODIFY_ENABLE_MASK_ACTION); + fte->action &= ~MLX5_FLOW_CONTEXT_ACTION_COUNT; + update_fte = true; + goto out; + } + if ((fte->action & MLX5_FLOW_CONTEXT_ACTION_FWD_DEST) && --fte->dests_size) { modify_mask = BIT(MLX5_SET_FTE_MODIFY_ENABLE_MASK_DESTINATION_LIST), - err = mlx5_cmd_update_fte(dev, ft, - fg->id, - modify_mask, - fte); + update_fte = true; + } +out: + if (update_fte && fte->dests_size) { + err = mlx5_cmd_update_fte(dev, ft, fg->id, modify_mask, fte); if (err) mlx5_core_warn(dev, "%s can't del rule fg id=%d fte_index=%d\n", @@ -1287,8 +1298,9 @@ static bool counter_is_valid(struct mlx5_fc *counter, u32 action) if (!counter) return false; - /* Hardware support counter for a drop action only */ - return action == (MLX5_FLOW_CONTEXT_ACTION_DROP | MLX5_FLOW_CONTEXT_ACTION_COUNT); + return (action & (MLX5_FLOW_CONTEXT_ACTION_DROP | + MLX5_FLOW_CONTEXT_ACTION_FWD_DEST)) && + (action & 
MLX5_FLOW_CONTEXT_ACTION_COUNT); } static bool dest_is_valid(struct mlx5_flow_destination *dest, -- 2.7.4
[PATCH for-next V2 01/15] IB/mlx5: Skip handling unknown events
Do not dispatch unknown mlx5 core events on mlx5_ib_event.

Signed-off-by: Saeed Mahameed
Signed-off-by: Eugenia Emantayev
Signed-off-by: Leon Romanovsky
---
 drivers/infiniband/hw/mlx5/main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 2217477..d02341e 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -2358,6 +2358,8 @@ static void mlx5_ib_event(struct mlx5_core_dev *dev, void *context,
 		ibev.event = IB_EVENT_CLIENT_REREGISTER;
 		port = (u8)param;
 		break;
+	default:
+		return;
 	}

 	ibev.device = &ibdev->ib_dev;
--
2.7.4
[PATCH for-next V2 00/15][PULL request] Mellanox mlx5 core driver updates 2016-10-25
Hi Dave and Doug,

This series contains some updates and fixes of the mlx5 core and IB drivers, with the addition of two features that demand new low-level commands and infrastructure updates:
- SRIOV VF max rate limit support
- mlx5e tc support for FWD rules with counter

Needed for both net and rdma subsystems.

Updates and fixes:

From Saeed Mahameed (2):
- mlx5 IB: Skip handling unknown mlx5 events
- Add ConnectX-5 PCIe 4.0 VF device ID

From Artemy Kovalyov (2):
- Update struct mlx5_ifc_xrqc_bits
- Ensure SRQ physical address structure endianness

From Eugenia Emantayev (1):
- Fix length of async_event_mask

New features:

From Mohamad Haj Yahia (3): mlx5 SRIOV VF max rate limit support
- Introduce TSAR manipulation firmware commands
- Introduce E-switch QoS management
- Add SRIOV VF max rate configuration support

From Mark Bloch (7): mlx5e tc support for FWD rule with counter
- Don't unlock fte while still using it
- Use fte status to decide on firmware command
- Refactor find_flow_rule
- Group similar rules under the same fte
- Add multi dest support
- Add option to add fwd rule with counter
- mlx5e tc support for FWD rule with counter

Mark fixed two trivial issues with the flow steering core and did some refactoring in the flow steering API to support adding multi-destination rules to the same hardware flow table entry at once. The last two patches add the ability to populate a flow rule with a flow counter on the same flow entry.

V2:
- Dropped some patches that added new structures without adding any usage of them.
- Added the SRIOV VF max rate configuration support patch, which introduces the usage of the TSAR infrastructure.
- Added flow steering fixes and refactoring, in addition to mlx5 tc support for forward rules with counter.
The following changes since commit a909d3e636995ba7c349e2ca5dbb528154d4ac30 Linux 4.9-rc3 are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git tags/shared-for-4.10-1 for you to fetch changes up to e37a79e5d4cac3831fac3d4afbf2461f56b4b7bd net/mlx5e: Add tc support for FWD rule with counter Thanks, Saeed & Leon. Artemy Kovalyov (2): net/mlx5: Update struct mlx5_ifc_xrqc_bits net/mlx5: Ensure SRQ physical address structure endianness Eugenia Emantayev (1): net/mlx5: Fix length of async_event_mask Mark Bloch (7): net/mlx5: Don't unlock fte while still using it net/mlx5: Use fte status to decide on firmware command net/mlx5: Refactor find_flow_rule net/mlx5: Group similer rules under the same fte net/mlx5: Add multi dest support net/mlx5: Add option to add fwd rule with counter net/mlx5e: Add tc support for FWD rule with counter Mohamad Haj Yahia (3): net/mlx5: Introduce TSAR manipulation firmware commands net/mlx5: Introduce E-switch QoS management net/mlx5: Add SRIOV VF max rate configuration support Saeed Mahameed (2): IB/mlx5: Skip handling unknown events net/mlx5: Add ConnectX-5 PCIe 4.0 VF device ID drivers/infiniband/hw/mlx5/main.c | 16 +- drivers/infiniband/hw/mlx5/mlx5_ib.h | 2 +- drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 13 +- drivers/net/ethernet/mellanox/mlx5/core/en.h | 14 +- drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c | 38 +-- drivers/net/ethernet/mellanox/mlx5/core/en_fs.c| 49 +-- .../ethernet/mellanox/mlx5/core/en_fs_ethtool.c| 19 +- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 15 + drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 6 +- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c| 35 +- drivers/net/ethernet/mellanox/mlx5/core/eq.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 244 -- drivers/net/ethernet/mellanox/mlx5/core/eswitch.h | 36 ++- .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 60 ++-- drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 358 +++-- 
drivers/net/ethernet/mellanox/mlx5/core/fs_core.h | 5 + drivers/net/ethernet/mellanox/mlx5/core/main.c | 1 + .../net/ethernet/mellanox/mlx5/core/mlx5_core.h| 7 + drivers/net/ethernet/mellanox/mlx5/core/rl.c | 65 include/linux/mlx5/fs.h| 28 +- include/linux/mlx5/mlx5_ifc.h | 201 +++- include/linux/mlx5/srq.h | 2 +- 22 files changed, 927 insertions(+), 289 deletions(-) -- 2.7.4
Re: [PATCH net-next 0/7] qed*: Patch series
From: Yuval Mintz
Date: Sun, 30 Oct 2016 18:38:30 +0200

> Please consider applying this series to `net-next'.

Even the first patch doesn't apply cleanly, please respin.
Re: [PATCH] drivers/net/usb/r8152 fix broken rx checksums
From: Mark Lord
Date: Wed, 26 Oct 2016 18:36:57 -0400

> Patch attached (to deal with buggy mailer) and also below for review.

Please make your mailer work properly so that you can submit patches inline, just like every other developer does for the kernel.

Also please format your Subject line properly; it must be of the form:

	[PATCH net] r8152: Fix broken RX checksums.

The important parts are:

1) "[PATCH net]"

   This says that it is a patch, and that it is targeting the 'net' GIT tree specifically.

2) "r8152: "

   This indicates the "subsystem" that the patch specifically targets, in this case the r8152 driver. It must end with a colon character then a space.

3) "Fix broken RX checksums."

   Commit header lines and commit messages are proper English, therefore sentences should begin with a capitalized letter and end with a period.

Thanks.
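The three rules above can be approximated with a quick shell check. This is only an illustrative sketch of the convention (the real gatekeepers are `scripts/checkpatch.pl` and the maintainer); the regex and the tree list here are assumptions, not an official rule:

```shell
#!/bin/sh
# Sketch: validate a netdev patch subject against the three rules above.
subject='[PATCH net] r8152: Fix broken RX checksums.'

# 1) "[PATCH net]" tree tag, 2) lowercase "subsystem: " prefix,
# 3) capitalized sentence ending with a period.
if echo "$subject" | grep -Eq '^\[PATCH (net|net-next)\] [a-z0-9_]+: [A-Z].*\.$'; then
	echo OK
else
	echo BAD
fi
# prints OK
```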
[PATCH for-next V2 11/15] net/mlx5: Refactor find_flow_rule
From: Mark BlochThe way we compare between two dests will need to be used in other places in the future, so we factor out the comparison logic between two dests into a separate function. Signed-off-by: Mark Bloch Signed-off-by: Saeed Mahameed Signed-off-by: Leon Romanovsky --- drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 29 --- 1 file changed, 20 insertions(+), 9 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c index e2bab9d..fca6937 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c @@ -153,6 +153,8 @@ static void del_rule(struct fs_node *node); static void del_flow_table(struct fs_node *node); static void del_flow_group(struct fs_node *node); static void del_fte(struct fs_node *node); +static bool mlx5_flow_dests_cmp(struct mlx5_flow_destination *d1, + struct mlx5_flow_destination *d2); static void tree_init_node(struct fs_node *node, unsigned int refcount, @@ -1064,21 +1066,30 @@ static struct mlx5_flow_group *create_autogroup(struct mlx5_flow_table *ft, return fg; } +static bool mlx5_flow_dests_cmp(struct mlx5_flow_destination *d1, + struct mlx5_flow_destination *d2) +{ + if (d1->type == d2->type) { + if ((d1->type == MLX5_FLOW_DESTINATION_TYPE_VPORT && +d1->vport_num == d2->vport_num) || + (d1->type == MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE && +d1->ft == d2->ft) || + (d1->type == MLX5_FLOW_DESTINATION_TYPE_TIR && +d1->tir_num == d2->tir_num)) + return true; + } + + return false; +} + static struct mlx5_flow_rule *find_flow_rule(struct fs_fte *fte, struct mlx5_flow_destination *dest) { struct mlx5_flow_rule *rule; list_for_each_entry(rule, >node.children, node.list) { - if (rule->dest_attr.type == dest->type) { - if ((dest->type == MLX5_FLOW_DESTINATION_TYPE_VPORT && -dest->vport_num == rule->dest_attr.vport_num) || - (dest->type == MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE && -dest->ft == rule->dest_attr.ft) || - 
(dest->type == MLX5_FLOW_DESTINATION_TYPE_TIR && -dest->tir_num == rule->dest_attr.tir_num)) - return rule; - } + if (mlx5_flow_dests_cmp(>dest_attr, dest)) + return rule; } return NULL; } -- 2.7.4
[PATCH for-next V2 08/15] net/mlx5: Add SRIOV VF max rate configuration support
From: Mohamad Haj YahiaImplement the vf set rate ndo by modifying the TSAR vport rate limit. Signed-off-by: Mohamad Haj Yahia Signed-off-by: Saeed Mahameed Signed-off-by: Leon Romanovsky --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 15 ++ drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 63 +++ drivers/net/ethernet/mellanox/mlx5/core/eswitch.h | 2 + 3 files changed, 80 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 7eaf380..7f763d2 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -2945,6 +2945,20 @@ static int mlx5e_set_vf_trust(struct net_device *dev, int vf, bool setting) return mlx5_eswitch_set_vport_trust(mdev->priv.eswitch, vf + 1, setting); } + +static int mlx5e_set_vf_rate(struct net_device *dev, int vf, int min_tx_rate, +int max_tx_rate) +{ + struct mlx5e_priv *priv = netdev_priv(dev); + struct mlx5_core_dev *mdev = priv->mdev; + + if (min_tx_rate) + return -EOPNOTSUPP; + + return mlx5_eswitch_set_vport_rate(mdev->priv.eswitch, vf + 1, + max_tx_rate); +} + static int mlx5_vport_link2ifla(u8 esw_link) { switch (esw_link) { @@ -3252,6 +3266,7 @@ static const struct net_device_ops mlx5e_netdev_ops_sriov = { .ndo_set_vf_vlan = mlx5e_set_vf_vlan, .ndo_set_vf_spoofchk = mlx5e_set_vf_spoofchk, .ndo_set_vf_trust= mlx5e_set_vf_trust, + .ndo_set_vf_rate = mlx5e_set_vf_rate, .ndo_get_vf_config = mlx5e_get_vf_config, .ndo_set_vf_link_state = mlx5e_set_vf_link_state, .ndo_get_vf_stats= mlx5e_get_vf_stats, diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c index 2e11a94..9ef01d1 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c @@ -1451,6 +1451,47 @@ static void esw_vport_disable_qos(struct mlx5_eswitch *esw, int vport_num) vport->qos.enabled = false; } +static int 
esw_vport_qos_config(struct mlx5_eswitch *esw, int vport_num, + u32 max_rate) +{ + u32 sched_ctx[MLX5_ST_SZ_DW(scheduling_context)] = {0}; + struct mlx5_vport *vport = >vports[vport_num]; + struct mlx5_core_dev *dev = esw->dev; + void *vport_elem; + u32 bitmask = 0; + int err = 0; + + if (!MLX5_CAP_GEN(dev, qos) || !MLX5_CAP_QOS(dev, esw_scheduling)) + return -EOPNOTSUPP; + + if (!vport->qos.enabled) + return -EIO; + + MLX5_SET(scheduling_context, _ctx, element_type, +SCHEDULING_CONTEXT_ELEMENT_TYPE_VPORT); + vport_elem = MLX5_ADDR_OF(scheduling_context, _ctx, + element_attributes); + MLX5_SET(vport_element, vport_elem, vport_number, vport_num); + MLX5_SET(scheduling_context, _ctx, parent_element_id, +esw->qos.root_tsar_id); + MLX5_SET(scheduling_context, _ctx, max_average_bw, +max_rate); + bitmask |= MODIFY_SCHEDULING_ELEMENT_IN_MODIFY_BITMASK_MAX_AVERAGE_BW; + + err = mlx5_modify_scheduling_element_cmd(dev, +SCHEDULING_HIERARCHY_E_SWITCH, +_ctx, +vport->qos.esw_tsar_ix, +bitmask); + if (err) { + esw_warn(esw->dev, "E-Switch modify TSAR vport element failed (vport=%d,err=%d)\n", +vport_num, err); + return err; + } + + return 0; +} + static void node_guid_gen_from_mac(u64 *node_guid, u8 mac[ETH_ALEN]) { ((u8 *)node_guid)[7] = mac[0]; @@ -1888,6 +1929,7 @@ int mlx5_eswitch_get_vport_config(struct mlx5_eswitch *esw, ivi->qos = evport->info.qos; ivi->spoofchk = evport->info.spoofchk; ivi->trusted = evport->info.trusted; + ivi->max_tx_rate = evport->info.max_rate; mutex_unlock(>state_lock); return 0; @@ -1981,6 +2023,27 @@ int mlx5_eswitch_set_vport_trust(struct mlx5_eswitch *esw, return 0; } +int mlx5_eswitch_set_vport_rate(struct mlx5_eswitch *esw, + int vport, u32 max_rate) +{ + struct mlx5_vport *evport; + int err = 0; + + if (!ESW_ALLOWED(esw)) + return -EPERM; + if (!LEGAL_VPORT(esw, vport)) + return -EINVAL; + + mutex_lock(>state_lock); + evport = >vports[vport]; + err = esw_vport_qos_config(esw, vport, max_rate); + if (!err) +
[PATCH for-next V2 04/15] net/mlx5: Fix length of async_event_mask
From: Eugenia Emantayev

According to the PRM, async_event_mask has to be 64 bits long.

Signed-off-by: Eugenia Emantayev
Signed-off-by: Saeed Mahameed
Signed-off-by: Leon Romanovsky
---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index aaca090..e74a73b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -469,7 +469,7 @@ void mlx5_eq_cleanup(struct mlx5_core_dev *dev)
 int mlx5_start_eqs(struct mlx5_core_dev *dev)
 {
 	struct mlx5_eq_table *table = &dev->priv.eq_table;
-	u32 async_event_mask = MLX5_ASYNC_EVENT_MASK;
+	u64 async_event_mask = MLX5_ASYNC_EVENT_MASK;
 	int err;

 	if (MLX5_CAP_GEN(dev, pg))
--
2.7.4
[PATCH for-next V2 02/15] net/mlx5: Update struct mlx5_ifc_xrqc_bits
From: Artemy Kovalyov

Update struct mlx5_ifc_xrqc_bits according to the latest specification.

Signed-off-by: Artemy Kovalyov
Signed-off-by: Leon Romanovsky
Signed-off-by: Saeed Mahameed
---
 include/linux/mlx5/mlx5_ifc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 6045d4d..12f72e4 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -2844,7 +2844,7 @@ struct mlx5_ifc_xrqc_bits {
 	struct mlx5_ifc_tag_matching_topology_context_bits tag_matching_topology_context;

-	u8         reserved_at_180[0x200];
+	u8         reserved_at_180[0x880];

 	struct mlx5_ifc_wq_bits wq;
 };
--
2.7.4
Re: [patch net-next 00/16] mlxsw: Add Infiniband support for Mellanox switches
Sun, Oct 30, 2016 at 09:51:07PM CET, da...@davemloft.net wrote:
>From: Jiri Pirko
>Date: Fri, 28 Oct 2016 21:17:34 +0200
>
>> This patchset adds basic Infiniband support for SwitchX-2, Switch-IB
>> and Switch-IB-2 ASIC drivers.
>
>This depended upon the bug fixes which were only in 'net' until a few
>hours ago.
>
>Please state this explicitly in the future, it'll save me time.

Apologies. Will be more careful with this next time. Thanks.

>
>Series applied, thanks.
Re: [PATCH] net: bonding: use new api ethtool_{get|set}_link_ksettings
From: Philippe Reynes
Date: Tue, 25 Oct 2016 18:41:31 +0200

> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
>
> Signed-off-by: Philippe Reynes

Applied, thanks.
[PATCH for-next V2 05/15] net/mlx5: Add ConnectX-5 PCIe 4.0 VF device ID
For the mlx5 driver to support ConnectX-5 PCIe 4.0 VFs, we add the device ID 0x101a to mlx5_core_pci_table.

Signed-off-by: Saeed Mahameed
Signed-off-by: Leon Romanovsky
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index d9c3c70..197e04c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1422,6 +1422,7 @@ static const struct pci_device_id mlx5_core_pci_table[] = {
 	{ PCI_VDEVICE(MELLANOX, 0x1017) },			/* ConnectX-5, PCIe 3.0 */
 	{ PCI_VDEVICE(MELLANOX, 0x1018), MLX5_PCI_DEV_IS_VF},	/* ConnectX-5 VF */
 	{ PCI_VDEVICE(MELLANOX, 0x1019) },			/* ConnectX-5, PCIe 4.0 */
+	{ PCI_VDEVICE(MELLANOX, 0x101a), MLX5_PCI_DEV_IS_VF},	/* ConnectX-5, PCIe 4.0 VF */
 	{ 0, }
 };
--
2.7.4
[PATCH for-next V2 09/15] net/mlx5: Don't unlock fte while still using it
From: Mark BlochWhen adding a new rule to an fte, we need to hold the fte lock until we add that rule to the fte and increase the fte ref count. Fixes: 0c56b97503fd ("net/mlx5_core: Introduce flow steering API") Signed-off-by: Mark Bloch Signed-off-by: Saeed Mahameed Signed-off-by: Leon Romanovsky --- drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c index 5da2cc8..a07ff30 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c @@ -1107,9 +1107,8 @@ static struct mlx5_flow_rule *add_rule_fg(struct mlx5_flow_group *fg, return rule; } rule = add_rule_fte(fte, fg, dest); - unlock_ref_node(>node); if (IS_ERR(rule)) - goto unlock_fg; + goto unlock_fte; else goto add_rule; } @@ -1127,6 +1126,7 @@ static struct mlx5_flow_rule *add_rule_fg(struct mlx5_flow_group *fg, goto unlock_fg; } tree_init_node(>node, 0, del_fte); + nested_lock_ref_node(>node, FS_MUTEX_CHILD); rule = add_rule_fte(fte, fg, dest); if (IS_ERR(rule)) { kfree(fte); @@ -1139,6 +1139,8 @@ static struct mlx5_flow_rule *add_rule_fg(struct mlx5_flow_group *fg, list_add(>node.list, prev); add_rule: tree_add_node(>node, >node); +unlock_fte: + unlock_ref_node(>node); unlock_fg: unlock_ref_node(>node); return rule; -- 2.7.4
[PATCH for-next V2 03/15] net/mlx5: Ensure SRQ physical address structure endianness
From: Artemy Kovalyov

The SRQ physical address structure field should be in big-endian format.

Signed-off-by: Artemy Kovalyov
Signed-off-by: Leon Romanovsky
Signed-off-by: Saeed Mahameed
---
 include/linux/mlx5/srq.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/mlx5/srq.h b/include/linux/mlx5/srq.h
index 33c97dc..1cde0fd 100644
--- a/include/linux/mlx5/srq.h
+++ b/include/linux/mlx5/srq.h
@@ -55,7 +55,7 @@ struct mlx5_srq_attr {
 	u32 lwm;
 	u32 user_index;
 	u64 db_record;
-	u64 *pas;
+	__be64 *pas;
 };

 struct mlx5_core_dev;
--
2.7.4
[PATCH for-next V2 15/15] net/mlx5e: Add tc support for FWD rule with counter
From: Mark BlochWhen creating a FWD rule using tc create also a HW counter for this rule. Signed-off-by: Mark Bloch Signed-off-by: Saeed Mahameed Signed-off-by: Leon Romanovsky --- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 3 ++- .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 20 +++- 2 files changed, 13 insertions(+), 10 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index 5d9ac0d..c2e4728 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -419,7 +419,8 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv, struct tcf_exts *exts, return -EINVAL; } - attr->action |= MLX5_FLOW_CONTEXT_ACTION_FWD_DEST; + attr->action |= MLX5_FLOW_CONTEXT_ACTION_FWD_DEST | + MLX5_FLOW_CONTEXT_ACTION_COUNT; out_priv = netdev_priv(out_dev); attr->out_rep = out_priv->ppriv; continue; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c index 8b2a383..53d9d6c 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c @@ -48,11 +48,12 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw, struct mlx5_flow_spec *spec, struct mlx5_esw_flow_attr *attr) { - struct mlx5_flow_destination dest = { 0 }; + struct mlx5_flow_destination dest[2] = {}; struct mlx5_fc *counter = NULL; struct mlx5_flow_handle *rule; void *misc; int action; + int i = 0; if (esw->mode != SRIOV_OFFLOADS) return ERR_PTR(-EOPNOTSUPP); @@ -60,15 +61,17 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw, action = attr->action; if (action & MLX5_FLOW_CONTEXT_ACTION_FWD_DEST) { - dest.type = MLX5_FLOW_DESTINATION_TYPE_VPORT; - dest.vport_num = attr->out_rep->vport; - action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST; - } else if (action & MLX5_FLOW_CONTEXT_ACTION_COUNT) { + dest[i].type = 
MLX5_FLOW_DESTINATION_TYPE_VPORT; + dest[i].vport_num = attr->out_rep->vport; + i++; + } + if (action & MLX5_FLOW_CONTEXT_ACTION_COUNT) { counter = mlx5_fc_create(esw->dev, true); if (IS_ERR(counter)) return ERR_CAST(counter); - dest.type = MLX5_FLOW_DESTINATION_TYPE_COUNTER; - dest.counter = counter; + dest[i].type = MLX5_FLOW_DESTINATION_TYPE_COUNTER; + dest[i].counter = counter; + i++; } misc = MLX5_ADDR_OF(fte_match_param, spec->match_value, misc_parameters); @@ -81,8 +84,7 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw, MLX5_MATCH_MISC_PARAMETERS; rule = mlx5_add_flow_rules((struct mlx5_flow_table *)esw->fdb_table.fdb, - spec, action, 0, , 1); - + spec, action, 0, dest, i); if (IS_ERR(rule)) mlx5_fc_destroy(esw->dev, counter); -- 2.7.4
[PATCH for-next V2 07/15] net/mlx5: Introduce E-switch QoS management
From: Mohamad Haj YahiaAdd TSAR to the eswitch which will act as the vports rate limiter. Create/Destroy TSAR on Enable/Dsiable SRIOV. Attach/Detach vport to eswitch TSAR on Enable/Disable vport. Signed-off-by: Mohamad Haj Yahia Signed-off-by: Saeed Mahameed Signed-off-by: Leon Romanovsky --- drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 113 +- drivers/net/ethernet/mellanox/mlx5/core/eswitch.h | 12 +++ 2 files changed, 124 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c index abbf2c3..2e11a94 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c @@ -1351,6 +1351,106 @@ static int esw_vport_egress_config(struct mlx5_eswitch *esw, return err; } +/* Vport QoS management */ +static int esw_create_tsar(struct mlx5_eswitch *esw) +{ + u32 tsar_ctx[MLX5_ST_SZ_DW(scheduling_context)] = {0}; + struct mlx5_core_dev *dev = esw->dev; + int err; + + if (!MLX5_CAP_GEN(dev, qos) || !MLX5_CAP_QOS(dev, esw_scheduling)) + return 0; + + if (esw->qos.enabled) + return -EEXIST; + + err = mlx5_create_scheduling_element_cmd(dev, +SCHEDULING_HIERARCHY_E_SWITCH, +_ctx, +>qos.root_tsar_id); + if (err) { + esw_warn(esw->dev, "E-Switch create TSAR failed (%d)\n", err); + return err; + } + + esw->qos.enabled = true; + return 0; +} + +static void esw_destroy_tsar(struct mlx5_eswitch *esw) +{ + int err; + + if (!esw->qos.enabled) + return; + + err = mlx5_destroy_scheduling_element_cmd(esw->dev, + SCHEDULING_HIERARCHY_E_SWITCH, + esw->qos.root_tsar_id); + if (err) + esw_warn(esw->dev, "E-Switch destroy TSAR failed (%d)\n", err); + + esw->qos.enabled = false; +} + +static int esw_vport_enable_qos(struct mlx5_eswitch *esw, int vport_num, + u32 initial_max_rate) +{ + u32 sched_ctx[MLX5_ST_SZ_DW(scheduling_context)] = {0}; + struct mlx5_vport *vport = >vports[vport_num]; + struct mlx5_core_dev *dev = esw->dev; + void *vport_elem; + int 
err = 0; + + if (!esw->qos.enabled || !MLX5_CAP_GEN(dev, qos) || + !MLX5_CAP_QOS(dev, esw_scheduling)) + return 0; + + if (vport->qos.enabled) + return -EEXIST; + + MLX5_SET(scheduling_context, _ctx, element_type, +SCHEDULING_CONTEXT_ELEMENT_TYPE_VPORT); + vport_elem = MLX5_ADDR_OF(scheduling_context, _ctx, + element_attributes); + MLX5_SET(vport_element, vport_elem, vport_number, vport_num); + MLX5_SET(scheduling_context, _ctx, parent_element_id, +esw->qos.root_tsar_id); + MLX5_SET(scheduling_context, _ctx, max_average_bw, +initial_max_rate); + + err = mlx5_create_scheduling_element_cmd(dev, +SCHEDULING_HIERARCHY_E_SWITCH, +_ctx, +>qos.esw_tsar_ix); + if (err) { + esw_warn(esw->dev, "E-Switch create TSAR vport element failed (vport=%d,err=%d)\n", +vport_num, err); + return err; + } + + vport->qos.enabled = true; + return 0; +} + +static void esw_vport_disable_qos(struct mlx5_eswitch *esw, int vport_num) +{ + struct mlx5_vport *vport = >vports[vport_num]; + int err = 0; + + if (!vport->qos.enabled) + return; + + err = mlx5_destroy_scheduling_element_cmd(esw->dev, + SCHEDULING_HIERARCHY_E_SWITCH, + vport->qos.esw_tsar_ix); + if (err) + esw_warn(esw->dev, "E-Switch destroy TSAR vport element failed (vport=%d,err=%d)\n", +vport_num, err); + + vport->qos.enabled = false; +} + static void node_guid_gen_from_mac(u64 *node_guid, u8 mac[ETH_ALEN]) { ((u8 *)node_guid)[7] = mac[0]; @@ -1386,6 +1486,7 @@ static void esw_apply_vport_conf(struct mlx5_eswitch *esw, esw_vport_egress_config(esw, vport); } } + static void esw_enable_vport(struct mlx5_eswitch *esw, int vport_num, int enable_events) { @@ -1399,6 +1500,10 @@ static void esw_enable_vport(struct mlx5_eswitch *esw, int vport_num, /* Restore old vport configuration */
Re: Let's do P4
On 16-10-30 12:56 PM, Jiri Pirko wrote:
> Sun, Oct 30, 2016 at 07:44:43PM CET, kubak...@wp.pl wrote:
>> On Sun, 30 Oct 2016 19:01:03 +0100, Jiri Pirko wrote:
>>> Sun, Oct 30, 2016 at 06:45:26PM CET, kubak...@wp.pl wrote:
>>>> On Sun, 30 Oct 2016 17:38:36 +0100, Jiri Pirko wrote:
>>>>> Sun, Oct 30, 2016 at 11:26:49AM CET, tg...@suug.ch wrote:
>>>>> [...]
>>>>
>>>> Agreed. Just to clarify, my intention here was not to suggest the use
>>>> of eBPF as the IR. I was merely cautioning against bundling the new
>>>> API with P4, for multiple reasons. As John mentioned, the P4 spec was
>>>> evolving in the past. The spec is designed for HW more capable than
>>>> the switch ASICs we have today. As vendors move to provide more
>>>> configurability, we may need to extend the API beyond P4. We may want
>>>> to extend this API for SW hand-offs (as suggested by Thomas) which are
>>>> not part of the P4 spec. Also, John showed examples of the matchd
>>>> software, which already uses P4 at the frontend today and translates
>>>> it to different targets (eBPF, u32, HW). It may just be about the
>>>> naming, but I feel like calling the new API something more generic,
>>>> switch AST or some such, may help to avoid unnecessary ties and
>>>> confusion.
>>>
>>> Well, that basically means to create "something" that could be used
>>> to translate p4 source to. Not sure how exactly this "something" should
>>> look like and how different it would be from p4. I thought it might
>>> be good to benefit from the p4 definition and use it directly. Not sure.
>>
>> We have to translate the P4 into "something" already, that something
>> is the AST we will load into the kernel. Or were you planning to use
>> some official P4 AST? I'm not suggesting we add our own high level
>
> I'm not aware of the existence of some official P4 AST. We have to
> figure it out.

The compilers at p4.org have an AST, so you could claim those are in some sense "official". Also, the BNF published in the p4 spec lends itself to what the AST should look like.
Also FWIW, the AST is not necessarily the same as the IR.

>> language. I agree that P4 is a good starting point, and perhaps a good
>> high level language. I'm just cautious of creating an equivalency
>> between high level language (P4) and the kernel ABI.
>
> Understood. Definitely good to be very cautious when defining a kernel
> API.

And another point that came up (trying to unify threads a bit):

"I wonder why p4 does not handle the HW capabilities. At least I did not find it. It would be certainly nice to have it."

One of the points of P4 is that the hardware should be configurable. So given a P4 definition of a parse graph, table layout, etc., the hardware should configure itself to support that "program". The reason you don't see any HW capabilities is that the "program" is exactly what the hardware is expected to run. Also, the P4 spec does not provide a definition of a "runtime" API; this will at some point be defined in another spec.

So a clarifying point: are you expecting hardware to reconfigure itself to match the P4 program, or are you simply using this to configure TCAM slices and building a runtime API? For example, if a P4 program gives a new parse graph that is not supported by the hardware, should it be rejected?

From the flow-api you will see a handful of get_* operations but no set_* operations, because the set_* path has to come down to the hardware in ucode/low-level firmware updates. It's unlikely that vendors will want to expose ucode/etc. The set_flow/get_flow bits could be mapped onto a cls_p4 or a cls_switch, as I think was hinted above.

Thanks,
John
Re: Let's do P4
[...]
>
> Yeah, I was also thinking about something similar to your Flow-API,
> but we need something more generic I believe.

I've heard this in a couple of other forums as well, but please elaborate: exactly what needs to be more generic? That API is sufficient to express both the init-time piece of the original P4 draft and the runtime component. I guess we are trying to strike a balance here between the ability to actually write an IR that a sufficiently large subset of hardware can support "easily" and something that can support all possible hardware features. IMO this leads to something like the Flow-API in the first case, or to something like eBPF for all possible features.

>>
>> We also have an emulated path auto-generated from compiler tools
>> that creates eBPF code from the IR, so this would give you the software
>> fall-back.
>
> Btw, Flow-API was rejected because it was a clean kernel-bypass. In case
> of p4, if we do what Thomas is suggesting, having x.bpf for SW and
> x.p4ast for HW, that would be the very same kernel-bypass. Therefore I
> strongly believe there should be a single kernel API for p4 SW+HW - for
> both p4 program insertion and runtime configuration.

Another area of push-back came from creating yet another infrastructure.
Re: [patch net-next 00/16] mlxsw: Add Infiniband support for Mellanox switches
From: Jiri Pirko
Date: Fri, 28 Oct 2016 21:17:34 +0200

> This patchset adds basic Infiniband support for SwitchX-2, Switch-IB
> and Switch-IB-2 ASIC drivers.

This depended upon the bug fixes which were only in 'net' until a few hours ago.

Please state this explicitly in the future, it'll save me time.

Series applied, thanks.
Re: [RFC PATCH 01/13] pinctrl: meson: Add GXL pinctrl definitions
On Fri, Oct 21, 2016 at 04:40:26PM +0200, Neil Armstrong wrote: > Add support for the Amlogic Meson GXL SoC, this is a partially complete > definition only based on the Amlogic Vendor tree. > > This definition differs a lot from the GXBB and needs a separate entry. > > Signed-off-by: Neil Armstrong> --- > .../devicetree/bindings/pinctrl/meson,pinctrl.txt | 2 + Acked-by: Rob Herring > drivers/pinctrl/meson/Makefile | 3 +- > drivers/pinctrl/meson/pinctrl-meson-gxl.c | 589 > + > drivers/pinctrl/meson/pinctrl-meson.c | 8 + > drivers/pinctrl/meson/pinctrl-meson.h | 2 + > include/dt-bindings/gpio/meson-gxl-gpio.h | 131 + > 6 files changed, 734 insertions(+), 1 deletion(-) > create mode 100644 drivers/pinctrl/meson/pinctrl-meson-gxl.c > create mode 100644 include/dt-bindings/gpio/meson-gxl-gpio.h
Re: [PATCH] net: stmmac: Add OXNAS Glue Driver
On Fri, Oct 21, 2016 at 10:44:45AM +0200, Neil Armstrong wrote: > Add Synopsys Designware MAC Glue layer for the Oxford Semiconductor OX820. > > Signed-off-by: Neil Armstrong> --- > .../devicetree/bindings/net/oxnas-dwmac.txt| 44 + It's preferred that bindings are a separate patch. > drivers/net/ethernet/stmicro/stmmac/Kconfig| 11 ++ > drivers/net/ethernet/stmicro/stmmac/Makefile | 1 + > drivers/net/ethernet/stmicro/stmmac/dwmac-oxnas.c | 219 > + > 4 files changed, 275 insertions(+) > create mode 100644 Documentation/devicetree/bindings/net/oxnas-dwmac.txt > create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwmac-oxnas.c > > Changes since RFC at https://patchwork.kernel.org/patch/9387257 : > - Drop init/exit callbacks > - Implement proper remove and PM callback > - Call init from probe > - Disable/Unprepare clock if stmmac probe fails > > diff --git a/Documentation/devicetree/bindings/net/oxnas-dwmac.txt > b/Documentation/devicetree/bindings/net/oxnas-dwmac.txt > new file mode 100644 > index 000..5d2696c > --- /dev/null > +++ b/Documentation/devicetree/bindings/net/oxnas-dwmac.txt > @@ -0,0 +1,44 @@ > +* Oxford Semiconductor OXNAS DWMAC Ethernet controller > + > +The device inherits all the properties of the dwmac/stmmac devices > +described in the file stmmac.txt in the current directory with the > +following changes. > + > +Required properties on all platforms: > + > +- compatible:Depending on the platform this should be one of: > + - "oxsemi,ox820-dwmac" > + Additionally "snps,dwmac" and any applicable more > + detailed version number described in net/stmmac.txt > + should be used. You should be explicit what version applies to ox820. "snps,dwmac" should probably be deprecated IMO. There are so many variations of DW h/w. > + > +- reg: The first register range should be the one of the DWMAC > + controller. This is worded like there's a 2nd range? 
> + > +- clocks: Should contain phandles to the following clocks > +- clock-names: Should contain the following: > + - "stmmaceth" - see stmmac.txt > + - "gmac" - peripheral gate clock > + > +- oxsemi,sys-ctrl: a phandle to the system controller syscon node > + > +Example : > + > +etha: ethernet@4040 { > + compatible = "oxsemi,ox820-dwmac", "snps,dwmac"; > + reg = <0x4040 0x2000>; > + interrupts = , > + ; > + interrupt-names = "macirq", "eth_wake_irq"; > + mac-address = []; /* Filled in by U-Boot */ > + phy-mode = "rgmii"; > + > + clocks = < CLK_820_ETHA>, <>; > + clock-names = "gmac", "stmmaceth"; > + resets = < RESET_MAC>; > + > + /* Regmap for sys registers */ > + oxsemi,sys-ctrl = <>; > + > + status = "disabled"; > +};
Re: [PATCH net-next 3/4] bpf: BPF for lightweight tunnel encapsulation
On Sun, Oct 30, 2016 at 4:58 AM, Thomas Graf wrote:
> Register two new BPF prog types BPF_PROG_TYPE_LWT_IN and
> BPF_PROG_TYPE_LWT_OUT which are invoked if a route contains a
> LWT redirection of type LWTUNNEL_ENCAP_BPF.
>
> The separate program types are required because manipulation of
> packet data is only allowed on the output and transmit path as
> the subsequent dst_input() call path assumes an IP header
> validated by ip_rcv(). The BPF programs will be handed an skb
> with the L3 header attached and may return one of the following
> return codes:
>
> BPF_OK - Continue routing as per nexthop
> BPF_DROP - Drop skb and return EPERM
> BPF_REDIRECT - Redirect skb to device as per redirect() helper.
>                (Only valid on lwtunnel_xmit() hook)
>
> The return codes are binary compatible with their TC_ACT_
> relatives to ease compatibility.
>
> A new helper bpf_skb_push() is added which allows to prepend an
> L2 header in front of the skb, extend the existing L3 header, or
> both. This allows to address a wide range of issues:
> - Optimize L2 header construction when L2 information is always
>   static to avoid ARP/NDisc lookup.
> - Extend IP header to add additional IP options.
> - Perform simple encapsulation where offload is of no concern.
>   (The existing functionality to attach a tunnel key to the skb
>   and redirect to a tunnel net_device to allow for offload
>   continues to work obviously).
> > Signed-off-by: Thomas Graf > --- > include/linux/filter.h| 2 +- > include/uapi/linux/bpf.h | 31 +++- > include/uapi/linux/lwtunnel.h | 21 +++ > kernel/bpf/verifier.c | 16 +- > net/core/Makefile | 2 +- > net/core/filter.c | 148 - > net/core/lwt_bpf.c| 365 > ++ > net/core/lwtunnel.c | 1 + > 8 files changed, 579 insertions(+), 7 deletions(-) > create mode 100644 net/core/lwt_bpf.c > > diff --git a/include/linux/filter.h b/include/linux/filter.h > index 1f09c52..aad7f81 100644 > --- a/include/linux/filter.h > +++ b/include/linux/filter.h > @@ -438,7 +438,7 @@ struct xdp_buff { > }; > > /* compute the linear packet data range [data, data_end) which > - * will be accessed by cls_bpf and act_bpf programs > + * will be accessed by cls_bpf, act_bpf and lwt programs > */ > static inline void bpf_compute_data_end(struct sk_buff *skb) > { > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index e2f38e0..2ebaa3c 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -96,6 +96,9 @@ enum bpf_prog_type { > BPF_PROG_TYPE_TRACEPOINT, > BPF_PROG_TYPE_XDP, > BPF_PROG_TYPE_PERF_EVENT, > + BPF_PROG_TYPE_LWT_IN, > + BPF_PROG_TYPE_LWT_OUT, > + BPF_PROG_TYPE_LWT_XMIT, > }; > > #define BPF_PSEUDO_MAP_FD 1 > @@ -383,6 +386,16 @@ union bpf_attr { > * > * int bpf_get_numa_node_id() > * Return: Id of current NUMA node. > + * > + * int bpf_skb_push() > + * Add room to beginning of skb and adjusts MAC header offset > accordingly. > + * Extends/reallocaes for needed skb headeroom automatically. > + * May change skb data pointer and will thus invalidate any check done > + * for direct packet access. 
> + * @skb: pointer to skb > + * @len: length of header to be pushed in front > + * @flags: Flags (unused for now) > + * Return: 0 on success or negative error > */ > #define __BPF_FUNC_MAPPER(FN) \ > FN(unspec), \ > @@ -427,7 +440,8 @@ union bpf_attr { > FN(skb_pull_data), \ > FN(csum_update),\ > FN(set_hash_invalid), \ > - FN(get_numa_node_id), > + FN(get_numa_node_id), \ > + FN(skb_push), > > /* integer value in 'imm' field of BPF_CALL instruction selects which helper > * function eBPF program intends to call > @@ -511,6 +525,21 @@ struct bpf_tunnel_key { > __u32 tunnel_label; > }; > > +/* Generic BPF return codes which all BPF program types may support. > + * The values are binary compatible with their TC_ACT_* counter-part to > + * provide backwards compatibility with existing SCHED_CLS and SCHED_ACT > + * programs. > + * > + * XDP is handled seprately, see XDP_*. > + */ > +enum bpf_ret_code { > + BPF_OK = 0, > + /* 1 reserved */ > + BPF_DROP = 2, > + /* 3-6 reserved */ > + BPF_REDIRECT = 7, > +}; > + > /* User return codes for XDP prog type. > * A valid XDP program must return one of these defined values. All other > * return codes are reserved for future use. Unknown return codes will result > diff --git a/include/uapi/linux/lwtunnel.h b/include/uapi/linux/lwtunnel.h > index a478fe8..9354d997 100644 > --- a/include/uapi/linux/lwtunnel.h > +++ b/include/uapi/linux/lwtunnel.h > @@ -9,6 +9,7 @@ enum lwtunnel_encap_types { >
[net-next PATCH 5/7] stmmac: dwmac-sti: move clk_prepare_enable out of init and add error handling
Add clock error handling to probe and in the process move clock
enabling out of sti_dwmac_init() to make this easier.

Signed-off-by: Joachim Eastwood
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
index e814b68..aa75a27 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
@@ -237,8 +237,6 @@ static int sti_dwmac_init(struct platform_device *pdev, void *priv)
 	u32 reg = dwmac->ctrl_reg;
 	u32 val;
 
-	clk_prepare_enable(dwmac->clk);
-
 	if (dwmac->gmac_en)
 		regmap_update_bits(regmap, reg, EN_MASK, EN);
 
@@ -348,11 +346,23 @@ static int sti_dwmac_probe(struct platform_device *pdev)
 	plat_dat->bsp_priv = dwmac;
 	plat_dat->fix_mac_speed = data->fix_retime_src;
 
-	ret = sti_dwmac_init(pdev, plat_dat->bsp_priv);
+	ret = clk_prepare_enable(dwmac->clk);
 	if (ret)
 		return ret;
 
-	return stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
+	ret = sti_dwmac_init(pdev, plat_dat->bsp_priv);
+	if (ret)
+		goto disable_clk;
+
+	ret = stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
+	if (ret)
+		goto disable_clk;
+
+	return 0;
+
+disable_clk:
+	clk_disable_unprepare(dwmac->clk);
+	return ret;
 }
 
 static int sti_dwmac_remove(struct platform_device *pdev)
@@ -381,6 +391,7 @@ static int sti_dwmac_resume(struct device *dev)
 	struct sti_dwmac *dwmac = get_stmmac_bsp_priv(dev);
 	struct platform_device *pdev = to_platform_device(dev);
 
+	clk_prepare_enable(dwmac->clk);
 	sti_dwmac_init(pdev, dwmac);
 
 	return stmmac_resume(dev);
-- 
2.10.1
[net-next PATCH 7/7] stmmac: dwmac-sti: remove unused priv dev member
The dev member of struct sti_dwmac is not used anywhere in the driver,
so let's just remove it.

Signed-off-by: Joachim Eastwood
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
index 0af3faa..f51fb16 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
@@ -126,7 +126,6 @@ struct sti_dwmac {
 	struct clk *clk;	/* PHY clock */
 	u32 ctrl_reg;		/* GMAC glue-logic control register */
 	int clk_sel_reg;	/* GMAC ext clk selection register */
-	struct device *dev;
 	struct regmap *regmap;
 	bool gmac_en;
 	u32 speed;
@@ -274,7 +273,6 @@ static int sti_dwmac_parse_data(struct sti_dwmac *dwmac,
 		return err;
 	}
 
-	dwmac->dev = dev;
 	dwmac->interface = of_get_phy_mode(np);
 	dwmac->regmap = regmap;
 	dwmac->gmac_en = of_property_read_bool(np, "st,gmac_en");
-- 
2.10.1
[net-next PATCH 4/7] stmmac: dwmac-sti: move st,gmac_en parsing to sti_dwmac_parse_data
The sti_dwmac_init() function is called both from probe and resume.
Since DT properties don't change between suspend/resume cycles, move
parsing of this parameter into sti_dwmac_parse_data() where it belongs.

Signed-off-by: Joachim Eastwood
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
index 09dd2be..e814b68 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
@@ -128,6 +128,7 @@ struct sti_dwmac {
 	int clk_sel_reg;	/* GMAC ext clk selection register */
 	struct device *dev;
 	struct regmap *regmap;
+	bool gmac_en;
 	u32 speed;
 	void (*fix_retime_src)(void *priv, unsigned int speed);
 };
@@ -233,14 +234,12 @@ static int sti_dwmac_init(struct platform_device *pdev, void *priv)
 	struct sti_dwmac *dwmac = priv;
 	struct regmap *regmap = dwmac->regmap;
 	int iface = dwmac->interface;
-	struct device *dev = dwmac->dev;
-	struct device_node *np = dev->of_node;
 	u32 reg = dwmac->ctrl_reg;
 	u32 val;
 
 	clk_prepare_enable(dwmac->clk);
 
-	if (of_property_read_bool(np, "st,gmac_en"))
+	if (dwmac->gmac_en)
 		regmap_update_bits(regmap, reg, EN_MASK, EN);
 
 	regmap_update_bits(regmap, reg, MII_PHY_SEL_MASK, phy_intf_sels[iface]);
@@ -281,6 +280,7 @@ static int sti_dwmac_parse_data(struct sti_dwmac *dwmac,
 	dwmac->dev = dev;
 	dwmac->interface = of_get_phy_mode(np);
 	dwmac->regmap = regmap;
+	dwmac->gmac_en = of_property_read_bool(np, "st,gmac_en");
 	dwmac->ext_phyclk = of_property_read_bool(np, "st,ext-phyclk");
 	dwmac->tx_retime_src = TX_RETIME_SRC_NA;
 	dwmac->speed = SPEED_100;
-- 
2.10.1
[net-next PATCH 6/7] stmmac: dwmac-sti: clean up and rename sti_dwmac_init
Rename sti_dwmac_init() to sti_dwmac_set_phy_mode(), which is a better
description of what it really does.

Signed-off-by: Joachim Eastwood
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
index aa75a27..0af3faa 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
@@ -229,9 +229,8 @@ static void stid127_fix_retime_src(void *priv, u32 spd)
 	regmap_update_bits(dwmac->regmap, reg, STID127_RETIME_SRC_MASK, val);
 }
 
-static int sti_dwmac_init(struct platform_device *pdev, void *priv)
+static int sti_dwmac_set_phy_mode(struct sti_dwmac *dwmac)
 {
-	struct sti_dwmac *dwmac = priv;
 	struct regmap *regmap = dwmac->regmap;
 	int iface = dwmac->interface;
 	u32 reg = dwmac->ctrl_reg;
@@ -245,7 +244,7 @@ static int sti_dwmac_init(struct platform_device *pdev, void *priv)
 	val = (iface == PHY_INTERFACE_MODE_REVMII) ? 0 : ENMII;
 	regmap_update_bits(regmap, reg, ENMII_MASK, val);
 
-	dwmac->fix_retime_src(priv, dwmac->speed);
+	dwmac->fix_retime_src(dwmac, dwmac->speed);
 
 	return 0;
 }
@@ -350,7 +349,7 @@ static int sti_dwmac_probe(struct platform_device *pdev)
 	if (ret)
 		return ret;
 
-	ret = sti_dwmac_init(pdev, plat_dat->bsp_priv);
+	ret = sti_dwmac_set_phy_mode(dwmac);
 	if (ret)
 		goto disable_clk;
 
@@ -389,10 +388,9 @@ static int sti_dwmac_suspend(struct device *dev)
 static int sti_dwmac_resume(struct device *dev)
 {
 	struct sti_dwmac *dwmac = get_stmmac_bsp_priv(dev);
-	struct platform_device *pdev = to_platform_device(dev);
 
 	clk_prepare_enable(dwmac->clk);
-	sti_dwmac_init(pdev, dwmac);
+	sti_dwmac_set_phy_mode(dwmac);
 
 	return stmmac_resume(dev);
 }
-- 
2.10.1
[net-next PATCH 3/7] stmmac: dwmac-sti: add PM ops and resume function
Implement PM callbacks and driver remove in the driver instead of
relying on the init/exit hooks in stmmac_platform. This gives the
driver more flexibility in how the code is organized.

Eventually the init/exit callbacks will be deprecated in favor of the
standard PM callbacks and driver remove function.

Signed-off-by: Joachim Eastwood
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c | 46 ++++++++++++++++++++++++++++++++++++------------
 1 file changed, 36 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
index f009bf4..09dd2be 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
@@ -253,12 +253,6 @@ static int sti_dwmac_init(struct platform_device *pdev, void *priv)
 	return 0;
 }
 
-static void sti_dwmac_exit(struct platform_device *pdev, void *priv)
-{
-	struct sti_dwmac *dwmac = priv;
-
-	clk_disable_unprepare(dwmac->clk);
-}
 static int sti_dwmac_parse_data(struct sti_dwmac *dwmac,
 				struct platform_device *pdev)
 {
@@ -352,8 +346,6 @@ static int sti_dwmac_probe(struct platform_device *pdev)
 	dwmac->fix_retime_src = data->fix_retime_src;
 
 	plat_dat->bsp_priv = dwmac;
-	plat_dat->init = sti_dwmac_init;
-	plat_dat->exit = sti_dwmac_exit;
 	plat_dat->fix_mac_speed = data->fix_retime_src;
 
 	ret = sti_dwmac_init(pdev, plat_dat->bsp_priv);
@@ -363,6 +355,40 @@ static int sti_dwmac_probe(struct platform_device *pdev)
 	return stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
 }
 
+static int sti_dwmac_remove(struct platform_device *pdev)
+{
+	struct sti_dwmac *dwmac = get_stmmac_bsp_priv(&pdev->dev);
+	int ret = stmmac_dvr_remove(&pdev->dev);
+
+	clk_disable_unprepare(dwmac->clk);
+
+	return ret;
+}
+
+#ifdef CONFIG_PM_SLEEP
+static int sti_dwmac_suspend(struct device *dev)
+{
+	struct sti_dwmac *dwmac = get_stmmac_bsp_priv(dev);
+	int ret = stmmac_suspend(dev);
+
+	clk_disable_unprepare(dwmac->clk);
+
+	return ret;
+}
+
+static int sti_dwmac_resume(struct device *dev)
+{
+	struct sti_dwmac *dwmac = get_stmmac_bsp_priv(dev);
+	struct platform_device *pdev = to_platform_device(dev);
+
+	sti_dwmac_init(pdev, dwmac);
+
+	return stmmac_resume(dev);
+}
+#endif /* CONFIG_PM_SLEEP */
+
+SIMPLE_DEV_PM_OPS(sti_dwmac_pm_ops, sti_dwmac_suspend, sti_dwmac_resume);
+
 static const struct sti_dwmac_of_data stih4xx_dwmac_data = {
 	.fix_retime_src = stih4xx_fix_retime_src,
 };
@@ -382,10 +408,10 @@ MODULE_DEVICE_TABLE(of, sti_dwmac_match);
 
 static struct platform_driver sti_dwmac_driver = {
 	.probe  = sti_dwmac_probe,
-	.remove = stmmac_pltfr_remove,
+	.remove = sti_dwmac_remove,
 	.driver = {
 		.name           = "sti-dwmac",
		.pm		= &sti_dwmac_pm_ops,
 		.of_match_table = sti_dwmac_match,
 	},
 };
-- 
2.10.1
[net-next PATCH 2/7] stmmac: dwmac-sti: remove clk NULL checks
Since sti_dwmac_parse_data() sets dwmac->clk to NULL if no clock was
provided in DT, and NULL is a valid clock, there is no need to check
for NULL before using this clock.

Signed-off-by: Joachim Eastwood
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
index 075ed42..f009bf4 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
@@ -191,7 +191,7 @@ static void stih4xx_fix_retime_src(void *priv, u32 spd)
 		}
 	}
 
-	if (src == TX_RETIME_SRC_CLKGEN && dwmac->clk && freq)
+	if (src == TX_RETIME_SRC_CLKGEN && freq)
 		clk_set_rate(dwmac->clk, freq);
 
 	regmap_update_bits(dwmac->regmap, reg, STIH4XX_RETIME_SRC_MASK,
@@ -222,7 +222,7 @@ static void stid127_fix_retime_src(void *priv, u32 spd)
 			freq = DWMAC_2_5MHZ;
 	}
 
-	if (dwmac->clk && freq)
+	if (freq)
 		clk_set_rate(dwmac->clk, freq);
 
 	regmap_update_bits(dwmac->regmap, reg, STID127_RETIME_SRC_MASK, val);
@@ -238,8 +238,7 @@ static int sti_dwmac_init(struct platform_device *pdev, void *priv)
 	u32 reg = dwmac->ctrl_reg;
 	u32 val;
 
-	if (dwmac->clk)
-		clk_prepare_enable(dwmac->clk);
+	clk_prepare_enable(dwmac->clk);
 
 	if (of_property_read_bool(np, "st,gmac_en"))
 		regmap_update_bits(regmap, reg, EN_MASK, EN);
@@ -258,8 +257,7 @@ static void sti_dwmac_exit(struct platform_device *pdev, void *priv)
 {
 	struct sti_dwmac *dwmac = priv;
 
-	if (dwmac->clk)
-		clk_disable_unprepare(dwmac->clk);
+	clk_disable_unprepare(dwmac->clk);
 }
 
 static int sti_dwmac_parse_data(struct sti_dwmac *dwmac,
 				struct platform_device *pdev)
-- 
2.10.1
[net-next PATCH 1/7] stmmac: dwmac-sti: remove useless of_node check
Since dwmac-sti is a DT-only driver, checking for the OF node is not
necessary.

Signed-off-by: Joachim Eastwood
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
index 58c05ac..075ed42 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
@@ -270,9 +270,6 @@ static int sti_dwmac_parse_data(struct sti_dwmac *dwmac,
 	struct regmap *regmap;
 	int err;
 
-	if (!np)
-		return -EINVAL;
-
 	/* clk selection from extra syscfg register */
 	dwmac->clk_sel_reg = -ENXIO;
 	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "sti-clkconf");
-- 
2.10.1
[net-next PATCH 0/7] stmmac: dwmac-sti refactor+cleanup
This patch set aims to remove the init/exit callbacks from the
dwmac-sti driver and instead use standard PM callbacks. Doing this
will also allow us to clean up the driver.

Eventually the init/exit callbacks will be deprecated and removed from
all dwmac-* drivers except for dwmac-generic. Drivers will be
refactored to use standard PM and remove callbacks.

Note that this patch set has only been compile-tested and no
functional change is intended.

Joachim Eastwood (7):
  stmmac: dwmac-sti: remove useless of_node check
  stmmac: dwmac-sti: remove clk NULL checks
  stmmac: dwmac-sti: add PM ops and resume function
  stmmac: dwmac-sti: move st,gmac_en parsing to sti_dwmac_parse_data
  stmmac: dwmac-sti: move clk_prepare_enable out of init and add error
    handling
  stmmac: dwmac-sti: clean up and rename sti_dwmac_init
  stmmac: dwmac-sti: remove unused priv dev member

 drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c | 86 ++++++++++++++++---------
 1 file changed, 57 insertions(+), 29 deletions(-)

-- 
2.10.1
Re: Let's do P4
Sun, Oct 30, 2016 at 07:44:43PM CET, kubak...@wp.pl wrote:
>On Sun, 30 Oct 2016 19:01:03 +0100, Jiri Pirko wrote:
>> Sun, Oct 30, 2016 at 06:45:26PM CET, kubak...@wp.pl wrote:
>> >On Sun, 30 Oct 2016 17:38:36 +0100, Jiri Pirko wrote:
>> >> Sun, Oct 30, 2016 at 11:26:49AM CET, tg...@suug.ch wrote:
>> [...]
>> [...]
>> >> [...]
>> >> [...]
>> >> [...]
>> >> [...]
>> [...]
>> >>
>> >> Agreed.
>> >
>> >Just to clarify, my intention here was not to suggest the use of eBPF as
>> >the IR. I was merely cautioning against bundling the new API with P4,
>> >for multiple reasons. As John mentioned, the P4 spec was evolving in the
>> >past. The spec is designed for HW more capable than the switch ASICs we
>> >have today. As vendors move to provide more configurability we may need
>> >to extend the API beyond P4. We may want to extend this API for SW
>> >hand-offs (as suggested by Thomas) which are not part of the P4 spec. Also
>> >John showed examples of matchd software which already uses P4 at the
>> >frontend today and translates it to different targets (eBPF, u32, HW).
>> >It may just be about the naming, but I feel like calling the new API
>> >something more generic, switch AST or some such, may help to avoid
>> >unnecessary ties and confusion.
>>
>> Well, that basically means to create "something" that could be used
>> to translate p4 source to. Not sure how exactly this "something" should
>> look like and how different it would be from p4. I thought it might
>> be good to benefit from the p4 definition and use it directly. Not sure.
>
>We have to translate the P4 into "something" already, that something
>is the AST we will load into the kernel. Or were you planning to use
>some official P4 AST? I'm not suggesting we add our own high level

I'm not aware of the existence of an official P4 AST. We have to figure
it out.

>language. I agree that P4 is a good starting point, and perhaps a good
>high level language. I'm just cautious of creating an equivalency
>between high level language (P4) and the kernel ABI.

Understood. Definitely good to be very cautious when defining a kernel
API.

>
>Perhaps I'm just wasting everyone's time with this.
>
>> >>
>> >> Exactly. Following drawing shows p4 pipeline setup for SW and HW:
>> >>
>> >>                            |
>> >>                            |      +--> ebpf engine
>> >>                            |      |
>> >>                            |      |
>> >>                            |  compilerB
>> >>                            |      ^
>> >>                            |      |
>> >> p4src --> compilerA --> p4ast --TCNL--> cls_p4 --+-> driver -> compilerC -> HW
>> >>                            |
>> >>       userspace            |              kernel
>> >>                            |
>> >>
>> >> Now please consider runtime API for rule insertion/removal/stats/etc.
>> >> Also, the single API is cls_p4 here:
>> >>
>> >>                  |
>> >>                  |
>> >>                  |
>> >>                  |
>> >>                  |      ebpf map fillup
>> >>                  |             ^
>> >>                  |             |
>> >> p4 rule --TCNL--> cls_p4 --+-> driver -> HW table fillup
>> >>                  |
>> >>     userspace    |           kernel
>> >
>> >My understanding was that the main purpose of SW eBPF translation would
>> >be to piggy back on eBPF userspace map API. This seems not to be the
>> >case here? Is "P4 rule" being added via some new API? From performance
>>
>> cls_p4 TC classifier.
>
>Oh, so the cls_p4 is just a proxy forwarding the requests to drivers
>or eBPF backend. Got it. Sorry for being slow. And the requests
>come down via change() op or something new? I wonder how such a scheme
>compares to eBPF maps performance-wise (updates/sec).

I have no numbers at this time. I guess Jamal and Alexei did some
measurements in this area in the past.

>
>> >perspective the SW AST implementation would probably not be any slower
>> >than u32, so I don't think we need eBPF for performance. I must be
>> >misreading this, if we want eBPF fallback we must extend eBPF with all
>> >the map types anyway... so we could just use eBPF map API? I believe
>> >John has already done some work in this space (see his GitHub :))
>>
>> I don't think you can use existing BPF maps kernel API. You would still
>> have to have another API just for the offloaded datapath. And that is
>> a bypass. I strongly believe we need a single kernel API for both
>> SW and HW datapath setup and runtime configuration.
>
>Agreed, single API is a must. What is the HW characteristic which
>doesn't fit with eBPF map API, though? For eBPF offload I was planning
>on adding offload hooks
Re: Let's do P4
On Sun, 30 Oct 2016 19:01:03 +0100, Jiri Pirko wrote:
> Sun, Oct 30, 2016 at 06:45:26PM CET, kubak...@wp.pl wrote:
> >On Sun, 30 Oct 2016 17:38:36 +0100, Jiri Pirko wrote:
> >> Sun, Oct 30, 2016 at 11:26:49AM CET, tg...@suug.ch wrote:
> [...]
> [...]
> >> [...]
> >> [...]
> >> [...]
> >> [...]
> [...]
> >>
> >> Agreed.
> >
> >Just to clarify, my intention here was not to suggest the use of eBPF as
> >the IR. I was merely cautioning against bundling the new API with P4,
> >for multiple reasons. As John mentioned, the P4 spec was evolving in the
> >past. The spec is designed for HW more capable than the switch ASICs we
> >have today. As vendors move to provide more configurability we may need
> >to extend the API beyond P4. We may want to extend this API for SW
> >hand-offs (as suggested by Thomas) which are not part of the P4 spec. Also
> >John showed examples of matchd software which already uses P4 at the
> >frontend today and translates it to different targets (eBPF, u32, HW).
> >It may just be about the naming, but I feel like calling the new API
> >something more generic, switch AST or some such, may help to avoid
> >unnecessary ties and confusion.
>
> Well, that basically means to create "something" that could be used
> to translate p4 source to. Not sure how exactly this "something" should
> look like and how different it would be from p4. I thought it might
> be good to benefit from the p4 definition and use it directly. Not sure.

We have to translate the P4 into "something" already, that something
is the AST we will load into the kernel. Or were you planning to use
some official P4 AST? I'm not suggesting we add our own high level
language. I agree that P4 is a good starting point, and perhaps a good
high level language. I'm just cautious of creating an equivalency
between high level language (P4) and the kernel ABI.

Perhaps I'm just wasting everyone's time with this.

> >>
> >> Exactly. Following drawing shows p4 pipeline setup for SW and HW:
> >>
> >>                            |
> >>                            |      +--> ebpf engine
> >>                            |      |
> >>                            |      |
> >>                            |  compilerB
> >>                            |      ^
> >>                            |      |
> >> p4src --> compilerA --> p4ast --TCNL--> cls_p4 --+-> driver -> compilerC -> HW
> >>                            |
> >>       userspace            |              kernel
> >>                            |
> >>
> >> Now please consider runtime API for rule insertion/removal/stats/etc.
> >> Also, the single API is cls_p4 here:
> >>
> >>                  |
> >>                  |
> >>                  |
> >>                  |
> >>                  |      ebpf map fillup
> >>                  |             ^
> >>                  |             |
> >> p4 rule --TCNL--> cls_p4 --+-> driver -> HW table fillup
> >>                  |
> >>     userspace    |           kernel
> >
> >My understanding was that the main purpose of SW eBPF translation would
> >be to piggy back on eBPF userspace map API. This seems not to be the
> >case here? Is "P4 rule" being added via some new API? From performance
>
> cls_p4 TC classifier.

Oh, so the cls_p4 is just a proxy forwarding the requests to drivers
or eBPF backend. Got it. Sorry for being slow. And the requests
come down via change() op or something new? I wonder how such a scheme
compares to eBPF maps performance-wise (updates/sec).

> >perspective the SW AST implementation would probably not be any slower
> >than u32, so I don't think we need eBPF for performance. I must be
> >misreading this, if we want eBPF fallback we must extend eBPF with all
> >the map types anyway... so we could just use eBPF map API? I believe
> >John has already done some work in this space (see his GitHub :))
>
> I don't think you can use existing BPF maps kernel API. You would still
> have to have another API just for the offloaded datapath. And that is
> a bypass. I strongly believe we need a single kernel API for both
> SW and HW datapath setup and runtime configuration.

Agreed, single API is a must. What is the HW characteristic which
doesn't fit with eBPF map API, though? For eBPF offload I was planning
on adding offload hooks on eBPF map lookup/update paths and a way of
associating the map with a netdev. This should be enough to forward
updates to the driver and intercept reads to return the right
statistics.
Re: Let's do P4
Sun, Oct 30, 2016 at 06:45:26PM CET, kubak...@wp.pl wrote:
>On Sun, 30 Oct 2016 17:38:36 +0100, Jiri Pirko wrote:
>> Sun, Oct 30, 2016 at 11:26:49AM CET, tg...@suug.ch wrote:
>> >On 10/30/16 at 08:44am, Jiri Pirko wrote:
>> >> Sat, Oct 29, 2016 at 06:46:21PM CEST, john.fastab...@gmail.com wrote:
>> [...]
>> [...]
>> [...]
>> [...]
>> >
>> >My assumption was that a new IR is defined which is easier to parse than
>> >eBPF which is targeted at execution on a CPU and not intended for pattern
>> >matching. Just looking at how llvm creates different patterns and reorders
>> >instructions, I'm not seeing how eBPF can serve as a general purpose IR
>> >if the objective is to allow fairly flexible generation of the bytecode.
>> >Hence the alternative IR serving as additional metadata complementing the
>> >eBPF program.
>>
>> Agreed.
>
>Just to clarify, my intention here was not to suggest the use of eBPF as
>the IR. I was merely cautioning against bundling the new API with P4,
>for multiple reasons. As John mentioned, the P4 spec was evolving in the
>past. The spec is designed for HW more capable than the switch ASICs we
>have today. As vendors move to provide more configurability we may need
>to extend the API beyond P4. We may want to extend this API for SW
>hand-offs (as suggested by Thomas) which are not part of the P4 spec.
>Also, John showed examples of matchd software which already uses P4 at
>the frontend today and translates it to different targets (eBPF, u32,
>HW). It may just be about the naming, but I feel like calling the new
>API something more generic, switch AST or some such, may help to avoid
>unnecessary ties and confusion.

Well, that basically means to create "something" that could be used to
translate p4 source to. Not sure how exactly this "something" should
look like and how different it would be from p4. I thought it might be
good to benefit from the p4 definition and use it directly. Not sure.

>
>> >I understand what you mean with two APIs now. You want a single IR
>> >block and divide the SW/HW part in the kernel rather than let llvm or
>> >something else do it.
>>
>> Exactly. The following drawing shows the p4 pipeline setup for SW and HW:
>>
>>                                         |
>>                                         |            +--> ebpf engine
>>                                         |            |
>>                                         |            |
>>                                         |        compilerB
>>                                         |            ^
>>                                         |            |
>> p4src --> compilerA --> p4ast --TCNL--> cls_p4 ------+-> driver -> compilerC -> HW
>>                                         |
>>              userspace                  |                 kernel
>>                                         |
>>
>> Now please consider runtime API for rule insertion/removal/stats/etc.
>> Also, the single API is cls_p4 here:
>>
>>                           |
>>                           |
>>                           |
>>                           |
>>                           |    ebpf map fillup
>>                           |           ^
>>                           |           |
>> p4 rule --TCNL--> cls_p4 --+-> driver -> HW table fillup
>>                           |
>>        userspace          |        kernel
>>
>
>My understanding was that the main purpose of SW eBPF translation would
>be to piggy back on eBPF userspace map API. This seems not to be the
>case here? Is "P4 rule" being added via some new API?

cls_p4 TC classifier.

>From a performance
>perspective the SW AST implementation would probably not be any slower
>than u32, so I don't think we need eBPF for performance. I must be
>misreading this, if we want eBPF fallback we must extend eBPF with all
>the map types anyway... so we could just use eBPF map API? I believe
>John has already done some work in this space (see his GitHub :))

I don't think you can use the existing BPF maps kernel API. You would
still have to have another API just for the offloaded datapath. And
that is a bypass. I strongly believe we need a single kernel API for
both SW and HW datapath setup and runtime configuration.

>
>As for AST -> eBPF translator in the kernel, IMHO it could be very
>useful. Since all the drivers will have to implement translators
>anyway, the eBPF translator may help to build a good shared
>infrastructure. I mean - it could be a starting place for sharing code
>between drivers if done properly.

Agreed.

>
>> >> Well for hw offload, every driver has to parse the IR (whatever will it
>> >> be in) and program HW accordingly. Similar parsing and translation would
>> >> be needed for SW path, to translate into eBPF. I don't think it would be
>> >> more complex than in the drivers. Should be fine.
>> >
>> >I'm not sure I see why anyone would ever want to use an IR for SW
>> >purposes which is restricted to the lowest common denominator of
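To make the "single IR, two consumers" idea above concrete, here is a minimal sketch of what a table-based match/action entry could look like, together with the SW fallback path that walks it. This is illustrative only - it is not the proposed p4ast format nor any real cls_p4 code; all names and the entry layout are invented for the example. The point is that the same data structure could either be interpreted in SW or handed to a driver for HW translation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IR: match a value at a fixed header offset, run an action.
 * A driver could translate an array of these to TCAM entries; the kernel
 * could fall back to interpreting them as below.
 */
enum ma_action { ACT_PASS, ACT_DROP };

struct ma_entry {
	uint16_t off;		/* byte offset into the packet */
	uint16_t len;		/* bytes to compare: 1 or 2 here */
	uint32_t value;		/* value to match */
	enum ma_action act;
};

/* SW fallback: walk entries, first match wins, default is pass. */
static enum ma_action ma_classify(const struct ma_entry *tbl, size_t n,
				  const uint8_t *pkt, size_t pkt_len)
{
	for (size_t i = 0; i < n; i++) {
		uint32_t v;

		if ((size_t)tbl[i].off + tbl[i].len > pkt_len)
			continue;
		v = pkt[tbl[i].off];
		if (tbl[i].len == 2)
			v = (v << 8) | pkt[tbl[i].off + 1];
		if (v == tbl[i].value)
			return tbl[i].act;
	}
	return ACT_PASS;
}
```

A rule like "drop ARP" would then be one entry matching the ethertype bytes at offset 12, and runtime rule insertion/removal would manipulate the same table regardless of which backend executes it.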
Re: Let's do P4
On Sun, 30 Oct 2016 17:38:36 +0100, Jiri Pirko wrote:
> Sun, Oct 30, 2016 at 11:26:49AM CET, tg...@suug.ch wrote:
> >On 10/30/16 at 08:44am, Jiri Pirko wrote:
> >> Sat, Oct 29, 2016 at 06:46:21PM CEST, john.fastab...@gmail.com wrote:
> [...]
> [...]
> [...]
> [...]
> >
> >My assumption was that a new IR is defined which is easier to parse than
> >eBPF which is targeted at execution on a CPU and not intended for pattern
> >matching. Just looking at how llvm creates different patterns and reorders
> >instructions, I'm not seeing how eBPF can serve as a general purpose IR
> >if the objective is to allow fairly flexible generation of the bytecode.
> >Hence the alternative IR serving as additional metadata complementing the
> >eBPF program.
>
> Agreed.

Just to clarify, my intention here was not to suggest the use of eBPF as
the IR. I was merely cautioning against bundling the new API with P4,
for multiple reasons. As John mentioned, the P4 spec was evolving in the
past. The spec is designed for HW more capable than the switch ASICs we
have today. As vendors move to provide more configurability we may need
to extend the API beyond P4. We may want to extend this API for SW
hand-offs (as suggested by Thomas) which are not part of the P4 spec.
Also, John showed examples of matchd software which already uses P4 at
the frontend today and translates it to different targets (eBPF, u32,
HW). It may just be about the naming, but I feel like calling the new
API something more generic, switch AST or some such, may help to avoid
unnecessary ties and confusion.

> >I understand what you mean with two APIs now. You want a single IR
> >block and divide the SW/HW part in the kernel rather than let llvm or
> >something else do it.
>
> Exactly. The following drawing shows the p4 pipeline setup for SW and HW:
>
>                                         |
>                                         |            +--> ebpf engine
>                                         |            |
>                                         |            |
>                                         |        compilerB
>                                         |            ^
>                                         |            |
> p4src --> compilerA --> p4ast --TCNL--> cls_p4 ------+-> driver -> compilerC -> HW
>                                         |
>              userspace                  |                 kernel
>                                         |
>
> Now please consider runtime API for rule insertion/removal/stats/etc.
> Also, the single API is cls_p4 here:
>
>                           |
>                           |
>                           |
>                           |
>                           |    ebpf map fillup
>                           |           ^
>                           |           |
> p4 rule --TCNL--> cls_p4 --+-> driver -> HW table fillup
>                           |
>        userspace          |        kernel
>

My understanding was that the main purpose of SW eBPF translation would
be to piggy back on eBPF userspace map API. This seems not to be the
case here? Is "P4 rule" being added via some new API? From a performance
perspective the SW AST implementation would probably not be any slower
than u32, so I don't think we need eBPF for performance. I must be
misreading this, if we want eBPF fallback we must extend eBPF with all
the map types anyway... so we could just use eBPF map API? I believe
John has already done some work in this space (see his GitHub :))

As for AST -> eBPF translator in the kernel, IMHO it could be very
useful. Since all the drivers will have to implement translators
anyway, the eBPF translator may help to build a good shared
infrastructure. I mean - it could be a starting place for sharing code
between drivers if done properly.

> >> Well for hw offload, every driver has to parse the IR (whatever will it
> >> be in) and program HW accordingly. Similar parsing and translation would
> >> be needed for SW path, to translate into eBPF. I don't think it would be
> >> more complex than in the drivers. Should be fine.
> >
> >I'm not sure I see why anyone would ever want to use an IR for SW
> >purposes which is restricted to the lowest common denominator of HW.
> >A good example here is OpenFlow and how some of its SW consumers
> >have evolved with extensions which cannot be mapped to HW easily.
> >The same seems to happen with P4 as it introduces the concept of
> >state and other concepts which are hard to map for dumb HW. P4 doesn't
> >magically solve this problem, the fundamental difference in
> >capabilities between HW and SW remain.
> >
> [...]
> [...]
> [...]
> >>
> >> Yeah, I was also thinking about something similar to your Flow-API,
> >> but we need something more generic I believe.
> >>
> [...]
> >>
> >> Btw, Flow-API was rejected because it was a clean kernel-bypass. In case
> >> of p4, if we do what Thomas is suggesting, having x.bpf for SW and
Re: [PATCH net-next 0/2] mlx4 XDP TX refactor
From: Tariq Toukan
Date: Sun, 30 Oct 2016 18:21:26 +0200

> Hi Dave,
>
> This series makes Brenden's fix unneeded:
> 958b3d396d7f ("net/mlx4_en: fixup xdp tx irq to match rx")
>
> The fix got into net, but yet to be in net-next.
>
> Should I wait with this series and send a re-spin, with a revert of
> the fix, once it gets into net-next?

I'm working on merging net into net-next right now, once that is pushed
out you can respin.
[PATCHv2 net] sctp: return back transport in __sctp_rcv_init_lookup
Prior to this patch, it used a local variable to save the transport that
is looked up by __sctp_lookup_association(), and didn't return it back.
But in sctp_rcv, it is used to initialize chunk->transport. So when
hitting this, even if it found the transport, it was still initializing
chunk->transport with null instead.

This patch is to return the transport back through the transport pointer
that is from __sctp_rcv_lookup_harder().

Signed-off-by: Xin Long
---
 net/sctp/input.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/sctp/input.c b/net/sctp/input.c
index a2ea1d1..8e0bc58 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -1021,7 +1021,6 @@ static struct sctp_association *__sctp_rcv_init_lookup(struct net *net,
 	struct sctphdr *sh = sctp_hdr(skb);
 	union sctp_params params;
 	sctp_init_chunk_t *init;
-	struct sctp_transport *transport;
 	struct sctp_af *af;
 
 	/*
@@ -1052,7 +1051,7 @@ static struct sctp_association *__sctp_rcv_init_lookup(struct net *net,
 
 		af->from_addr_param(&paddr, params.addr, sh->source, 0);
 
-		asoc = __sctp_lookup_association(net, &laddr, &paddr, &transport);
+		asoc = __sctp_lookup_association(net, &laddr, &paddr, transportp);
 		if (asoc)
 			return asoc;
 	}
-- 
2.1.0
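The fix above is an instance of a common bug class: a helper fills a caller-provided out-pointer, but the wrapper passes its own local variable and never copies the result back to the caller. A minimal standalone sketch of the pattern (hypothetical names, not the actual sctp code):

```c
#include <assert.h>
#include <stddef.h>

struct transport { int id; };

static struct transport table[2] = { { 1 }, { 2 } };

/* Helper that reports the found transport through *tp. */
static int lookup(int key, struct transport **tp)
{
	if (key < 0 || key >= 2)
		return -1;
	*tp = &table[key];
	return 0;
}

/* Buggy wrapper: the result lands in a local and never reaches *tpp. */
static int lookup_buggy(int key, struct transport **tpp)
{
	struct transport *local;

	(void)tpp;			/* never written - this is the bug */
	return lookup(key, &local);
}

/* Fixed wrapper: forward the caller's out-pointer directly. */
static int lookup_fixed(int key, struct transport **tpp)
{
	return lookup(key, tpp);
}
```

In the sctp case the caller (sctp_rcv) then dereferenced the still-NULL pointer's target when initializing chunk->transport, which is why forwarding `transportp` matters even though the association itself was found.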
[PATCH net-next 2/7] qed: Add nvram selftest
Signed-off-by: Yuval Mintz--- drivers/net/ethernet/qlogic/qed/qed_hsi.h | 4 + drivers/net/ethernet/qlogic/qed/qed_main.c | 1 + drivers/net/ethernet/qlogic/qed/qed_mcp.c | 94 ++ drivers/net/ethernet/qlogic/qed/qed_mcp.h | 41 ++ drivers/net/ethernet/qlogic/qed/qed_selftest.c | 101 drivers/net/ethernet/qlogic/qed/qed_selftest.h | 10 +++ drivers/net/ethernet/qlogic/qede/qede_ethtool.c | 7 ++ include/linux/qed/qed_if.h | 9 +++ 8 files changed, 267 insertions(+) diff --git a/drivers/net/ethernet/qlogic/qed/qed_hsi.h b/drivers/net/ethernet/qlogic/qed/qed_hsi.h index 36de87a..f7dfa2e 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_hsi.h +++ b/drivers/net/ethernet/qlogic/qed/qed_hsi.h @@ -8666,6 +8666,8 @@ struct public_drv_mb { #define DRV_MB_PARAM_BIST_REGISTER_TEST1 #define DRV_MB_PARAM_BIST_CLOCK_TEST 2 +#define DRV_MB_PARAM_BIST_NVM_TEST_NUM_IMAGES 3 +#define DRV_MB_PARAM_BIST_NVM_TEST_IMAGE_BY_INDEX 4 #define DRV_MB_PARAM_BIST_RC_UNKNOWN 0 #define DRV_MB_PARAM_BIST_RC_PASSED1 @@ -8674,6 +8676,8 @@ struct public_drv_mb { #define DRV_MB_PARAM_BIST_TEST_INDEX_SHIFT 0 #define DRV_MB_PARAM_BIST_TEST_INDEX_MASK 0x00FF +#define DRV_MB_PARAM_BIST_TEST_IMAGE_INDEX_SHIFT 8 +#define DRV_MB_PARAM_BIST_TEST_IMAGE_INDEX_MASK0xFF00 u32 fw_mb_header; #define FW_MSG_CODE_MASK 0x diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c index 937968b..612c094 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_main.c +++ b/drivers/net/ethernet/qlogic/qed/qed_main.c @@ -1509,6 +1509,7 @@ struct qed_selftest_ops qed_selftest_ops_pass = { .selftest_interrupt = _selftest_interrupt, .selftest_register = _selftest_register, .selftest_clock = _selftest_clock, + .selftest_nvram = _selftest_nvram, }; const struct qed_common_ops qed_common_ops_pass = { diff --git a/drivers/net/ethernet/qlogic/qed/qed_mcp.c b/drivers/net/ethernet/qlogic/qed/qed_mcp.c index 98dc913..8be6157 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_mcp.c +++ 
b/drivers/net/ethernet/qlogic/qed/qed_mcp.c @@ -1434,6 +1434,52 @@ int qed_mcp_mask_parities(struct qed_hwfn *p_hwfn, return rc; } +int qed_mcp_nvm_read(struct qed_dev *cdev, u32 addr, u8 *p_buf, u32 len) +{ + u32 bytes_left = len, offset = 0, bytes_to_copy, read_len = 0; + struct qed_hwfn *p_hwfn = QED_LEADING_HWFN(cdev); + u32 resp = 0, resp_param = 0; + struct qed_ptt *p_ptt; + int rc = 0; + + p_ptt = qed_ptt_acquire(p_hwfn); + if (!p_ptt) + return -EBUSY; + + while (bytes_left > 0) { + bytes_to_copy = min_t(u32, bytes_left, MCP_DRV_NVM_BUF_LEN); + + rc = qed_mcp_nvm_rd_cmd(p_hwfn, p_ptt, + DRV_MSG_CODE_NVM_READ_NVRAM, + addr + offset + + (bytes_to_copy << +DRV_MB_PARAM_NVM_LEN_SHIFT), + , _param, + _len, + (u32 *)(p_buf + offset)); + + if (rc || (resp != FW_MSG_CODE_NVM_OK)) { + DP_NOTICE(cdev, "MCP command rc = %d\n", rc); + break; + } + + /* This can be a lengthy process, and it's possible scheduler +* isn't preemptable. Sleep a bit to prevent CPU hogging. +*/ + if (bytes_left % 0x1000 < + (bytes_left - read_len) % 0x1000) + usleep_range(1000, 2000); + + offset += read_len; + bytes_left -= read_len; + } + + cdev->mcp_nvm_resp = resp; + qed_ptt_release(p_hwfn, p_ptt); + + return rc; +} + int qed_mcp_bist_register_test(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt) { u32 drv_mb_param = 0, rsp, param; @@ -1475,3 +1521,51 @@ int qed_mcp_bist_clock_test(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt) return rc; } + +int qed_mcp_bist_nvm_test_get_num_images(struct qed_hwfn *p_hwfn, +struct qed_ptt *p_ptt, +u32 *num_images) +{ + u32 drv_mb_param = 0, rsp; + int rc = 0; + + drv_mb_param = (DRV_MB_PARAM_BIST_NVM_TEST_NUM_IMAGES << + DRV_MB_PARAM_BIST_TEST_INDEX_SHIFT); + + rc = qed_mcp_cmd(p_hwfn, p_ptt, DRV_MSG_CODE_BIST_TEST, +drv_mb_param, , num_images); + if (rc) + return rc; + + if (((rsp & FW_MSG_CODE_MASK) != FW_MSG_CODE_OK)) + rc = -EINVAL; + +
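The qed_mcp_nvm_read() loop above reads NVRAM in MCP_DRV_NVM_BUF_LEN-sized chunks, advancing by however many bytes the firmware reports back. Stripped of the mailbox details, the chunking pattern looks like this sketch (all names and the 32-byte chunk size are stand-ins, not the real qed constants):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_LEN 32u	/* stand-in for MCP_DRV_NVM_BUF_LEN */

/* Stand-in for the firmware mailbox call: copies up to req_len bytes
 * from a backing store and reports how much was actually read.
 */
static int fw_read(const uint8_t *store, uint32_t addr,
		   uint32_t req_len, uint8_t *dst, uint32_t *read_len)
{
	memcpy(dst, store + addr, req_len);
	*read_len = req_len;
	return 0;
}

/* Chunked read: clamp each request to the transport's buffer size and
 * advance offset/bytes_left by the reported length, as the driver does.
 */
static int nvm_read(const uint8_t *store, uint32_t addr,
		    uint8_t *buf, uint32_t len)
{
	uint32_t bytes_left = len, offset = 0;

	while (bytes_left > 0) {
		uint32_t to_copy = bytes_left < CHUNK_LEN ? bytes_left
							  : CHUNK_LEN;
		uint32_t read_len = 0;
		int rc = fw_read(store, addr + offset, to_copy,
				 buf + offset, &read_len);

		if (rc)
			return rc;
		offset += read_len;
		bytes_left -= read_len;
	}
	return 0;
}
```

The real driver additionally sleeps periodically inside the loop, since reading a large NVRAM image through the mailbox can otherwise hog the CPU.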
[PATCH net-next 4/7] qede: Decouple ethtool caps from qed
While the qed_lm_map array is closely tied to the QED_LM_* defines, when
iterating over the array use its actual size instead of the qed define,
to prevent possible future issues.

Signed-off-by: Yuval Mintz
---
 drivers/net/ethernet/qlogic/qede/qede_ethtool.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
index 42d9739..d230742 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
@@ -320,7 +320,7 @@ struct qede_link_mode_mapping {
 {								\
 	int i;							\
 								\
-	for (i = 0; i < QED_LM_COUNT; i++) {			\
+	for (i = 0; i < ARRAY_SIZE(qed_lm_map); i++) {		\
 		if ((caps) & (qed_lm_map[i].qed_link_mode))	\
 			__set_bit(qed_lm_map[i].ethtool_link_mode,\
 				  lk_ksettings->link_modes.name); \
@@ -331,7 +331,7 @@ struct qede_link_mode_mapping {
 {								\
 	int i;							\
 								\
-	for (i = 0; i < QED_LM_COUNT; i++) {			\
+	for (i = 0; i < ARRAY_SIZE(qed_lm_map); i++) {		\
 		if (test_bit(qed_lm_map[i].ethtool_link_mode,	\
 			     lk_ksettings->link_modes.name))	\
 			caps |= qed_lm_map[i].qed_link_mode;	\
-- 
1.9.3
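The point of the patch - sizing the loop from the table itself rather than from an external count define - is the standard ARRAY_SIZE idiom. A self-contained sketch with invented link-mode values (not the real QED_LM_* bits):

```c
#include <assert.h>
#include <stddef.h>

/* Same definition the kernel uses (minus the type-checking magic). */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

struct link_mode_mapping {
	unsigned int drv_mode;		/* driver-side capability bit */
	unsigned int ethtool_bit;	/* ethtool link-mode bit number */
};

static const struct link_mode_mapping lm_map[] = {
	{ 1u << 0, 0 },
	{ 1u << 1, 1 },
	{ 1u << 2, 2 },
};

/* Translate a driver capability mask to an ethtool bitmask by walking
 * the table itself; adding or removing entries can never drift out of
 * sync with a separately maintained count define.
 */
static unsigned long caps_to_ethtool(unsigned int caps)
{
	unsigned long out = 0;
	size_t i;

	for (i = 0; i < ARRAY_SIZE(lm_map); i++)
		if (caps & lm_map[i].drv_mode)
			out |= 1ul << lm_map[i].ethtool_bit;
	return out;
}
```

With the original `QED_LM_COUNT` bound, growing the qed define without growing the qede table would have walked past the end of `lm_map`; `ARRAY_SIZE` makes that impossible.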
[PATCH net-next 6/7] qed: Use VF-queue feature
Driver sets several restrictions on the number of supported VFs
according to available HW/FW resources. This creates a problem, as there
are constellations which can't be supported [as the limitations don't
accurately describe the resources], as well as holes where enabling IOV
would fail due to supposed lack of resources.

This introduces a new internal feature - vf-queues, which would be used
to lift some of the restrictions and accurately enumerate the queues
that can be used by a given PF's VFs.

Signed-off-by: Yuval Mintz
---
 drivers/net/ethernet/qlogic/qed/qed.h       |  1 +
 drivers/net/ethernet/qlogic/qed/qed_dev.c   | 20 ++++++++----
 drivers/net/ethernet/qlogic/qed/qed_int.c   | 32 ++++++++++++++++++-
 drivers/net/ethernet/qlogic/qed/qed_sriov.c | 17 +++++++---
 4 files changed, 54 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h
index 8828ffa..6d3013f 100644
--- a/drivers/net/ethernet/qlogic/qed/qed.h
+++ b/drivers/net/ethernet/qlogic/qed/qed.h
@@ -174,6 +174,7 @@ enum QED_FEATURE {
 	QED_PF_L2_QUE,
 	QED_VF,
 	QED_RDMA_CNQ,
+	QED_VF_L2_QUE,
 	QED_MAX_FEATURES,
 };
 
diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c
index 13833a5..b59da1a 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dev.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c
@@ -1475,6 +1475,7 @@ static void get_function_id(struct qed_hwfn *p_hwfn)
 static void qed_hw_set_feat(struct qed_hwfn *p_hwfn)
 {
 	u32 *feat_num = p_hwfn->hw_info.feat_num;
+	struct qed_sb_cnt_info sb_cnt_info;
 	int num_features = 1;
 
 #if IS_ENABLED(CONFIG_INFINIBAND_QEDR)
@@ -1493,10 +1494,21 @@ static void qed_hw_set_feat(struct qed_hwfn *p_hwfn)
 	feat_num[QED_PF_L2_QUE] = min_t(u32,
 					RESC_NUM(p_hwfn, QED_SB) / num_features,
 					RESC_NUM(p_hwfn, QED_L2_QUEUE));
-	DP_VERBOSE(p_hwfn, NETIF_MSG_PROBE,
-		   "#PF_L2_QUEUES=%d #SBS=%d num_features=%d\n",
-		   feat_num[QED_PF_L2_QUE], RESC_NUM(p_hwfn, QED_SB),
-		   num_features);
+
+	memset(&sb_cnt_info, 0, sizeof(sb_cnt_info));
+	qed_int_get_num_sbs(p_hwfn, &sb_cnt_info);
+	feat_num[QED_VF_L2_QUE] =
+	    min_t(u32,
+		  RESC_NUM(p_hwfn, QED_L2_QUEUE) -
+		  FEAT_NUM(p_hwfn, QED_PF_L2_QUE), sb_cnt_info.sb_iov_cnt);
+
+	DP_VERBOSE(p_hwfn,
+		   NETIF_MSG_PROBE,
+		   "#PF_L2_QUEUES=%d VF_L2_QUEUES=%d #ROCE_CNQ=%d #SBS=%d num_features=%d\n",
+		   (int)FEAT_NUM(p_hwfn, QED_PF_L2_QUE),
+		   (int)FEAT_NUM(p_hwfn, QED_VF_L2_QUE),
+		   (int)FEAT_NUM(p_hwfn, QED_RDMA_CNQ),
+		   RESC_NUM(p_hwfn, QED_SB), num_features);
 }
 
 static int qed_hw_get_resc(struct qed_hwfn *p_hwfn)
diff --git a/drivers/net/ethernet/qlogic/qed/qed_int.c b/drivers/net/ethernet/qlogic/qed/qed_int.c
index 2adedc6..bb74e1c 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_int.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_int.c
@@ -3030,6 +3030,31 @@ int qed_int_igu_read_cam(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt)
 			}
 		}
 	}
+
+	/* There's a possibility the igu_sb_cnt_iov doesn't properly reflect
+	 * the number of VF SBs [especially for first VF on engine, as we can't
+	 * differentiate between empty entries and its entries].
+	 * Since we don't really support more SBs than VFs today, prevent any
+	 * such configuration by sanitizing the number of SBs to equal the
+	 * number of VFs.
+	 */
+	if (IS_PF_SRIOV(p_hwfn)) {
+		u16 total_vfs = p_hwfn->cdev->p_iov_info->total_vfs;
+
+		if (total_vfs < p_igu_info->free_blks) {
+			DP_VERBOSE(p_hwfn,
+				   (NETIF_MSG_INTR | QED_MSG_IOV),
+				   "Limiting number of SBs for IOV - %04x --> %04x\n",
+				   p_igu_info->free_blks,
+				   p_hwfn->cdev->p_iov_info->total_vfs);
+			p_igu_info->free_blks = total_vfs;
+		} else if (total_vfs > p_igu_info->free_blks) {
+			DP_NOTICE(p_hwfn,
+				  "IGU has only %04x SBs for VFs while the device has %04x VFs\n",
+				  p_igu_info->free_blks, total_vfs);
+			return -EINVAL;
+		}
+	}
 	p_igu_info->igu_sb_cnt_iov = p_igu_info->free_blks;
 
 	DP_VERBOSE(
@@ -3163,7 +3188,12 @@ u16 qed_int_queue_id_from_sb_id(struct qed_hwfn *p_hwfn, u16 sb_id)
 		return sb_id - p_info->igu_base_sb;
 	} else if ((sb_id >= p_info->igu_base_sb_iov) &&
 		   (sb_id <
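The IGU fixup in the patch clamps the advertised VF status-block count down to the VF count (excess entries can't be told apart from empty ones) and fails hard only when there are more VFs than status blocks. The shape of that sanitization, with simplified types (the names are stand-ins for the driver fields):

```c
#include <assert.h>

/* Sanitize the number of free IGU status blocks against the VF count:
 * excess SBs are trimmed, while a shortfall is a hard error - mirroring
 * the two branches of the fixup above.
 */
static int sanitize_vf_sbs(unsigned int total_vfs, unsigned int *free_blks)
{
	if (total_vfs < *free_blks)
		*free_blks = total_vfs;	/* trim to the VF count */
	else if (total_vfs > *free_blks)
		return -1;		/* not enough SBs for the VFs */
	return 0;
}
```

Trimming rather than failing on the "too many SBs" side is the safer default: it only gives up capacity the driver couldn't attribute reliably anyway.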
[PATCH net-next 3/7] qed*: Add support for WoL
Signed-off-by: Yuval Mintz--- drivers/net/ethernet/qlogic/qed/qed.h | 11 - drivers/net/ethernet/qlogic/qed/qed_dev.c | 19 - drivers/net/ethernet/qlogic/qed/qed_hsi.h | 4 ++ drivers/net/ethernet/qlogic/qed/qed_main.c | 29 + drivers/net/ethernet/qlogic/qed/qed_mcp.c | 56 - drivers/net/ethernet/qlogic/qede/qede.h | 2 + drivers/net/ethernet/qlogic/qede/qede_ethtool.c | 41 ++ drivers/net/ethernet/qlogic/qede/qede_main.c| 9 include/linux/qed/qed_if.h | 10 + 9 files changed, 176 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h index f20243c..8828ffa 100644 --- a/drivers/net/ethernet/qlogic/qed/qed.h +++ b/drivers/net/ethernet/qlogic/qed/qed.h @@ -195,6 +195,11 @@ enum qed_dev_cap { QED_DEV_CAP_ROCE, }; +enum qed_wol_support { + QED_WOL_SUPPORT_NONE, + QED_WOL_SUPPORT_PME, +}; + struct qed_hw_info { /* PCI personality */ enum qed_pci_personalitypersonality; @@ -227,6 +232,8 @@ struct qed_hw_info { u32 hw_mode; unsigned long device_capabilities; u16 mtu; + + enum qed_wol_support b_wol_support; }; struct qed_hw_cid_data { @@ -539,7 +546,9 @@ struct qed_dev { u8 mcp_rev; u8 boot_mode; - u8 wol; + /* WoL related configurations */ + u8 wol_config; + u8 wol_mac[ETH_ALEN]; u32 int_mode; enum qed_coalescing_modeint_coalescing_mode; diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c index 9ef6dfd..13833a5 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_dev.c +++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c @@ -1363,8 +1363,24 @@ int qed_hw_reset(struct qed_dev *cdev) { int rc = 0; u32 unload_resp, unload_param; + u32 wol_param; int i; + switch (cdev->wol_config) { + case QED_OV_WOL_DISABLED: + wol_param = DRV_MB_PARAM_UNLOAD_WOL_DISABLED; + break; + case QED_OV_WOL_ENABLED: + wol_param = DRV_MB_PARAM_UNLOAD_WOL_ENABLED; + break; + default: + DP_NOTICE(cdev, + "Unknown WoL configuration %02x\n", cdev->wol_config); + /* Fallthrough */ + case QED_OV_WOL_DEFAULT: + 
wol_param = DRV_MB_PARAM_UNLOAD_WOL_MCP; + } + for_each_hwfn(cdev, i) { struct qed_hwfn *p_hwfn = >hwfns[i]; @@ -1393,8 +1409,7 @@ int qed_hw_reset(struct qed_dev *cdev) /* Send unload command to MCP */ rc = qed_mcp_cmd(p_hwfn, p_hwfn->p_main_ptt, -DRV_MSG_CODE_UNLOAD_REQ, -DRV_MB_PARAM_UNLOAD_WOL_MCP, +DRV_MSG_CODE_UNLOAD_REQ, wol_param, _resp, _param); if (rc) { DP_NOTICE(p_hwfn, "qed_hw_reset: UNLOAD_REQ failed\n"); diff --git a/drivers/net/ethernet/qlogic/qed/qed_hsi.h b/drivers/net/ethernet/qlogic/qed/qed_hsi.h index f7dfa2e..fdb7a09 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_hsi.h +++ b/drivers/net/ethernet/qlogic/qed/qed_hsi.h @@ -8601,6 +8601,7 @@ struct public_drv_mb { #define DRV_MSG_CODE_BIST_TEST 0x001e #define DRV_MSG_CODE_SET_LED_MODE 0x0020 +#define DRV_MSG_CODE_OS_WOL0x002e #define DRV_MSG_SEQ_NUMBER_MASK0x @@ -8697,6 +8698,9 @@ struct public_drv_mb { #define FW_MSG_CODE_NVM_OK 0x0001 #define FW_MSG_CODE_OK 0x0016 +#define FW_MSG_CODE_OS_WOL_SUPPORTED0x0080 +#define FW_MSG_CODE_OS_WOL_NOT_SUPPORTED0x0081 + #define FW_MSG_SEQ_NUMBER_MASK 0x u32 fw_mb_param; diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c index 612c094..a95a1af 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_main.c +++ b/drivers/net/ethernet/qlogic/qed/qed_main.c @@ -223,6 +223,10 @@ int qed_fill_dev_info(struct qed_dev *cdev, dev_info->fw_eng = FW_ENGINEERING_VERSION; dev_info->mf_mode = cdev->mf_mode; dev_info->tx_switching = true; + + if (QED_LEADING_HWFN(cdev)->hw_info.b_wol_support == + QED_WOL_SUPPORT_PME) + dev_info->wol_support = true; } else { qed_vf_get_fw_version(>hwfns[0], _info->fw_major,
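The qed_hw_reset() hunk above maps the driver's WoL configuration onto an unload mailbox parameter, with unknown values logging a notice and deliberately falling through to the "let management firmware decide" default. A compilable sketch of that switch shape (enum values are illustrative stand-ins, not the real DRV_MB_PARAM_UNLOAD_WOL_* encodings):

```c
#include <assert.h>

/* Stand-ins for the QED_OV_WOL_* configs and the unload mailbox params. */
enum wol_config { WOL_DEFAULT, WOL_DISABLED, WOL_ENABLED };
enum wol_param {
	PARAM_WOL_MCP = 0x10,		/* management FW decides */
	PARAM_WOL_DISABLED = 0x20,
	PARAM_WOL_ENABLED = 0x30,
};

static enum wol_param wol_unload_param(int cfg)
{
	switch (cfg) {
	case WOL_DISABLED:
		return PARAM_WOL_DISABLED;
	case WOL_ENABLED:
		return PARAM_WOL_ENABLED;
	default:
		/* Unknown value: the real driver logs a notice here,
		 * then falls through to the MCP default.
		 */
		/* fallthrough */
	case WOL_DEFAULT:
		return PARAM_WOL_MCP;
	}
}
```

Placing `default:` immediately before `case WOL_DEFAULT:` is valid C and makes the fallthrough explicit: any unrecognized configuration degrades to the firmware-chosen behavior instead of an arbitrary one.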
[PATCH net-next 5/7] qed: Learn of RDMA capabilities per-device
Today, RDMA capabilities are learned from management firmware which
provides a single indication for all interfaces. Newer management
firmware is capable of providing a per-device indication [would later be
extended to either RoCE/iWARP]. Try using this newer learning mechanism,
but fall back in case management firmware is too old, to retain current
functionality.

Signed-off-by: Yuval Mintz
---
 drivers/net/ethernet/qlogic/qed/qed_hsi.h |  7 +++
 drivers/net/ethernet/qlogic/qed/qed_mcp.c | 78 +++++++++++++++++++++---
 2 files changed, 77 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_hsi.h b/drivers/net/ethernet/qlogic/qed/qed_hsi.h
index fdb7a09..1d113ce 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_hsi.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_hsi.h
@@ -8601,6 +8601,7 @@ struct public_drv_mb {
 #define DRV_MSG_CODE_BIST_TEST			0x001e
+#define DRV_MSG_CODE_GET_PF_RDMA_PROTOCOL	0x002b
 #define DRV_MSG_CODE_SET_LED_MODE		0x0020
 #define DRV_MSG_CODE_OS_WOL			0x002e
 
 #define DRV_MSG_SEQ_NUMBER_MASK			0x
@@ -8706,6 +8707,12 @@ struct public_drv_mb {
 
 	u32 fw_mb_param;
 
+	/* get pf rdma protocol command responce */
+#define FW_MB_PARAM_GET_PF_RDMA_NONE		0x0
+#define FW_MB_PARAM_GET_PF_RDMA_ROCE		0x1
+#define FW_MB_PARAM_GET_PF_RDMA_IWARP		0x2
+#define FW_MB_PARAM_GET_PF_RDMA_BOTH		0x3
+
 	u32 drv_pulse_mb;
 #define DRV_PULSE_SEQ_MASK			0x7fff
 #define DRV_PULSE_SYSTEM_TIME_MASK		0x
diff --git a/drivers/net/ethernet/qlogic/qed/qed_mcp.c b/drivers/net/ethernet/qlogic/qed/qed_mcp.c
index 768b35b..0927488 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_mcp.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_mcp.c
@@ -1024,28 +1024,89 @@ int qed_mcp_get_media_type(struct qed_dev *cdev, u32 *p_media_type)
 	return 0;
 }
 
+/* Old MFW has a global configuration for all PFs regarding RDMA support */
+static void
+qed_mcp_get_shmem_proto_legacy(struct qed_hwfn *p_hwfn,
+			       enum qed_pci_personality *p_proto)
+{
+	/* There wasn't ever a legacy MFW that published iwarp.
+* So at this point, this is either plain l2 or RoCE. +*/ + if (test_bit(QED_DEV_CAP_ROCE, _hwfn->hw_info.device_capabilities)) + *p_proto = QED_PCI_ETH_ROCE; + else + *p_proto = QED_PCI_ETH; + + DP_VERBOSE(p_hwfn, NETIF_MSG_IFUP, + "According to Legacy capabilities, L2 personality is %08x\n", + (u32) *p_proto); +} + +static int +qed_mcp_get_shmem_proto_mfw(struct qed_hwfn *p_hwfn, + struct qed_ptt *p_ptt, + enum qed_pci_personality *p_proto) +{ + u32 resp = 0, param = 0; + int rc; + + rc = qed_mcp_cmd(p_hwfn, p_ptt, +DRV_MSG_CODE_GET_PF_RDMA_PROTOCOL, 0, , ); + if (rc) + return rc; + if (resp != FW_MSG_CODE_OK) { + DP_VERBOSE(p_hwfn, NETIF_MSG_IFUP, + "MFW lacks support for command; Returns %08x\n", + resp); + return -EINVAL; + } + + switch (param) { + case FW_MB_PARAM_GET_PF_RDMA_NONE: + *p_proto = QED_PCI_ETH; + break; + case FW_MB_PARAM_GET_PF_RDMA_ROCE: + *p_proto = QED_PCI_ETH_ROCE; + break; + case FW_MB_PARAM_GET_PF_RDMA_BOTH: + DP_NOTICE(p_hwfn, + "Current day drivers don't support RoCE & iWARP. Default to RoCE-only\n"); + *p_proto = QED_PCI_ETH_ROCE; + break; + case FW_MB_PARAM_GET_PF_RDMA_IWARP: + default: + DP_NOTICE(p_hwfn, + "MFW answers GET_PF_RDMA_PROTOCOL but param is %08x\n", + param); + return -EINVAL; + } + + DP_VERBOSE(p_hwfn, + NETIF_MSG_IFUP, + "According to capabilities, L2 personality is %08x [resp %08x param %08x]\n", + (u32) *p_proto, resp, param); + return 0; +} + static int qed_mcp_get_shmem_proto(struct qed_hwfn *p_hwfn, struct public_func *p_info, + struct qed_ptt *p_ptt, enum qed_pci_personality *p_proto) { int rc = 0; switch (p_info->config & FUNC_MF_CFG_PROTOCOL_MASK) { case FUNC_MF_CFG_PROTOCOL_ETHERNET: - if (test_bit(QED_DEV_CAP_ROCE, -_hwfn->hw_info.device_capabilities)) - *p_proto = QED_PCI_ETH_ROCE; - else - *p_proto = QED_PCI_ETH; + if (qed_mcp_get_shmem_proto_mfw(p_hwfn, p_ptt, p_proto)) +
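The control flow of the patch - try the new per-device MFW query first, and only fall back to the legacy global capability bit when the firmware doesn't understand the command - is a reusable pattern. A small sketch with stand-in inputs instead of real mailbox plumbing (none of these names are the actual qed API):

```c
#include <assert.h>

enum personality { PERS_ETH, PERS_ETH_ROCE };

/* Stand-in for the new MFW query: fills *pers and returns 0 only when
 * the firmware supports the command.
 */
static int query_new_mfw(int fw_supports, int fw_says_roce,
			 enum personality *pers)
{
	if (!fw_supports)
		return -1;	/* old firmware: command unknown */
	*pers = fw_says_roce ? PERS_ETH_ROCE : PERS_ETH;
	return 0;
}

/* Legacy path: derive the answer from the global capability bit. */
static enum personality query_legacy(int cap_roce)
{
	return cap_roce ? PERS_ETH_ROCE : PERS_ETH;
}

/* Preferred-then-fallback, matching the shape of
 * qed_mcp_get_shmem_proto() in the patch above.
 */
static enum personality get_personality(int fw_supports, int fw_says_roce,
					int legacy_cap_roce)
{
	enum personality pers;

	if (query_new_mfw(fw_supports, fw_says_roce, &pers) == 0)
		return pers;
	return query_legacy(legacy_cap_roce);
}
```

The key property is that old firmware keeps exactly the behavior it had before the patch, while new firmware gets the finer-grained answer.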
[PATCH net-next 7/7] qed: Learn resources from management firmware
From: Tomer Tayar

Currently, each interface assumes it receives an equal portion of the
HW/FW resources, but this is wasteful - different partitions [and
specifically, partitions exposing different protocol support] might
require different resources.

Implement a new resource learning scheme where the information is
received directly from the management firmware [which has knowledge of
all of the functions and can serve as arbiter].

Signed-off-by: Tomer Tayar
Signed-off-by: Yuval Mintz
---
 drivers/net/ethernet/qlogic/qed/qed.h     |   6 +-
 drivers/net/ethernet/qlogic/qed/qed_dev.c | 291 ++++++++++++++++++-----
 drivers/net/ethernet/qlogic/qed/qed_hsi.h |  46 ++++
 drivers/net/ethernet/qlogic/qed/qed_l2.c  |   2 +-
 drivers/net/ethernet/qlogic/qed/qed_mcp.c |  42 ++++
 drivers/net/ethernet/qlogic/qed/qed_mcp.h |  15 ++
 include/linux/qed/qed_eth_if.h            |   2 +-
 7 files changed, 341 insertions(+), 63 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h
index 6d3013f..50b8a01 100644
--- a/drivers/net/ethernet/qlogic/qed/qed.h
+++ b/drivers/net/ethernet/qlogic/qed/qed.h
@@ -154,7 +154,10 @@ struct qed_qm_iids {
 	u32 tids;
 };
 
-enum QED_RESOURCES {
+/* HW / FW resources, output of features supported below, most information
+ * is received from MFW.
+ */ +enum qed_resources { QED_SB, QED_L2_QUEUE, QED_VPORT, @@ -166,6 +169,7 @@ enum QED_RESOURCES { QED_RDMA_CNQ_RAM, QED_ILT, QED_LL2_QUEUE, + QED_CMDQS_CQS, QED_RDMA_STATS_QUEUE, QED_MAX_RESC, }; diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c index b59da1a..edd9ad0 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_dev.c +++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c @@ -1511,47 +1511,240 @@ static void qed_hw_set_feat(struct qed_hwfn *p_hwfn) RESC_NUM(p_hwfn, QED_SB), num_features); } -static int qed_hw_get_resc(struct qed_hwfn *p_hwfn) +static enum resource_id_enum qed_hw_get_mfw_res_id(enum qed_resources res_id) +{ + enum resource_id_enum mfw_res_id = RESOURCE_NUM_INVALID; + + switch (res_id) { + case QED_SB: + mfw_res_id = RESOURCE_NUM_SB_E; + break; + case QED_L2_QUEUE: + mfw_res_id = RESOURCE_NUM_L2_QUEUE_E; + break; + case QED_VPORT: + mfw_res_id = RESOURCE_NUM_VPORT_E; + break; + case QED_RSS_ENG: + mfw_res_id = RESOURCE_NUM_RSS_ENGINES_E; + break; + case QED_PQ: + mfw_res_id = RESOURCE_NUM_PQ_E; + break; + case QED_RL: + mfw_res_id = RESOURCE_NUM_RL_E; + break; + case QED_MAC: + case QED_VLAN: + /* Each VFC resource can accommodate both a MAC and a VLAN */ + mfw_res_id = RESOURCE_VFC_FILTER_E; + break; + case QED_ILT: + mfw_res_id = RESOURCE_ILT_E; + break; + case QED_LL2_QUEUE: + mfw_res_id = RESOURCE_LL2_QUEUE_E; + break; + case QED_RDMA_CNQ_RAM: + case QED_CMDQS_CQS: + /* CNQ/CMDQS are the same resource */ + mfw_res_id = RESOURCE_CQS_E; + break; + case QED_RDMA_STATS_QUEUE: + mfw_res_id = RESOURCE_RDMA_STATS_QUEUE_E; + break; + default: + break; + } + + return mfw_res_id; +} + +static u32 qed_hw_get_dflt_resc_num(struct qed_hwfn *p_hwfn, + enum qed_resources res_id) { - u8 enabled_func_idx = p_hwfn->enabled_func_idx; - u32 *resc_start = p_hwfn->hw_info.resc_start; u8 num_funcs = p_hwfn->num_funcs_on_engine; - u32 *resc_num = p_hwfn->hw_info.resc_num; struct qed_sb_cnt_info sb_cnt_info; - int i, 
max_vf_vlan_filters; + u32 dflt_resc_num = 0; - memset(_cnt_info, 0, sizeof(sb_cnt_info)); + switch (res_id) { + case QED_SB: + memset(_cnt_info, 0, sizeof(sb_cnt_info)); + qed_int_get_num_sbs(p_hwfn, _cnt_info); + dflt_resc_num = sb_cnt_info.sb_cnt; + break; + case QED_L2_QUEUE: + dflt_resc_num = MAX_NUM_L2_QUEUES_BB / num_funcs; + break; + case QED_VPORT: + dflt_resc_num = MAX_NUM_VPORTS_BB / num_funcs; + break; + case QED_RSS_ENG: + dflt_resc_num = ETH_RSS_ENGINE_NUM_BB / num_funcs; + break; + case QED_PQ: + /* The granularity of the PQs is 8 */ + dflt_resc_num = MAX_QM_TX_QUEUES_BB / num_funcs; + dflt_resc_num &= ~0x7; + break; + case QED_RL: + dflt_resc_num =
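qed_hw_get_dflt_resc_num() above computes each default allocation as an equal share of the global pool per function, with per-resource quirks such as rounding PQ counts down to their granularity of 8 (`dflt_resc_num &= ~0x7`). A minimal sketch of that split (pool sizes are illustrative stand-ins for the MAX_NUM_* constants, not the real values):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative pool sizes standing in for the MAX_NUM_* constants. */
#define POOL_L2_QUEUES	256u
#define POOL_TX_PQS	448u

enum resc { RESC_L2_QUEUE, RESC_PQ };

/* Default split: an equal share per function; PQs are additionally
 * rounded down to a multiple of 8, as in the patch above.
 */
static uint32_t dflt_resc_num(enum resc id, uint32_t num_funcs)
{
	uint32_t n = 0;

	switch (id) {
	case RESC_L2_QUEUE:
		n = POOL_L2_QUEUES / num_funcs;
		break;
	case RESC_PQ:
		n = POOL_TX_PQS / num_funcs;
		n &= ~0x7u;	/* PQ granularity is 8 */
		break;
	}
	return n;
}
```

This equal-share computation is exactly what the series demotes to a fallback: when the management firmware answers the resource query, its per-function numbers are used instead of these defaults.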
[PATCH net-next 0/7] qed*: Patch series
This series does several things. The bigger changes:

 - Add new notification APIs [& Defaults] for various fields. The series
   then utilizes some of those qed <-> qede APIs to base WoL support
   upon.

 - Change the resource allocation scheme to receive the values from
   management firmware, instead of equally sharing resources between
   functions [that might not need those]. That would, e.g., allow us to
   configure additional filters to network interfaces in the presence of
   storage [PCI] functions from the same adapter.

Dave,

Please consider applying this series to `net-next'.

Thanks,
Yuval

Sudarsana Kalluru (1):
  qed*: Management firmware - notifications and defaults

Tomer Tayar (1):
  qed: Learn resources from management firmware

Yuval Mintz (5):
  qed: Add nvram selftest
  qed*: Add support for WoL
  qede: Decouple ethtool caps from qed
  qed: Learn of RDMA capabilities per-device
  qed: Use VF-queue feature

 drivers/net/ethernet/qlogic/qed/qed.h           |  19 +-
 drivers/net/ethernet/qlogic/qed/qed_dev.c       | 382 ++++++++++++++++-
 drivers/net/ethernet/qlogic/qed/qed_hsi.h       | 120 ++++-
 drivers/net/ethernet/qlogic/qed/qed_int.c       |  32 +-
 drivers/net/ethernet/qlogic/qed/qed_l2.c        |   2 +-
 drivers/net/ethernet/qlogic/qed/qed_main.c      | 105 +++++
 drivers/net/ethernet/qlogic/qed/qed_mcp.c       | 433 +++++++++++++++++-
 drivers/net/ethernet/qlogic/qed/qed_mcp.h       | 158 +++++++
 drivers/net/ethernet/qlogic/qed/qed_selftest.c  | 101 +++++
 drivers/net/ethernet/qlogic/qed/qed_selftest.h  |  10 +
 drivers/net/ethernet/qlogic/qed/qed_sriov.c     |  17 +-
 drivers/net/ethernet/qlogic/qede/qede.h         |   2 +
 drivers/net/ethernet/qlogic/qede/qede_ethtool.c |  54 ++-
 drivers/net/ethernet/qlogic/qede/qede_main.c    |  17 +
 include/linux/qed/qed_eth_if.h                  |   2 +-
 include/linux/qed/qed_if.h                      |  47 +++
 16 files changed, 1404 insertions(+), 97 deletions(-)

-- 
1.9.3
[PATCH net-next 1/7] qed*: Management firmware - notifications and defaults
From: Sudarsana Kalluru

Management firmware is interested in various tidbits about the driver -
including the driver state & several configuration related fields [MTU,
primary MAC, etc.].

This adds the necessary logic to update MFW with such configurations,
some of which are passed directly via qed, while for others APIs are
provided so that qede would be able to later configure them if needed.

This also introduces a new default configuration for MTU, which would
replace the default inherited by being an ethernet device.

Signed-off-by: Sudarsana Kalluru
Signed-off-by: Yuval Mintz
---
 drivers/net/ethernet/qlogic/qed/qed.h           |   1 +
 drivers/net/ethernet/qlogic/qed/qed_dev.c       |  52 +++++-
 drivers/net/ethernet/qlogic/qed/qed_hsi.h       |  59 +++++-
 drivers/net/ethernet/qlogic/qed/qed_main.c      |  75 ++++++++
 drivers/net/ethernet/qlogic/qed/qed_mcp.c       | 163 ++++++++++++++++
 drivers/net/ethernet/qlogic/qed/qed_mcp.h       | 102 ++++++++++
 drivers/net/ethernet/qlogic/qede/qede_ethtool.c |   2 +
 drivers/net/ethernet/qlogic/qede/qede_main.c    |   8 ++
 include/linux/qed/qed_if.h                      |  28 +++
 9 files changed, 487 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h
index 653bb57..f20243c 100644
--- a/drivers/net/ethernet/qlogic/qed/qed.h
+++ b/drivers/net/ethernet/qlogic/qed/qed.h
@@ -226,6 +226,7 @@ struct qed_hw_info {
 	u32 port_mode;
 	u32 hw_mode;
 	unsigned long device_capabilities;
+	u16 mtu;
 };
 
 struct qed_hw_cid_data {
diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c
index 754f6a9..9ef6dfd 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dev.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c
@@ -1056,8 +1056,10 @@ int qed_hw_init(struct qed_dev *cdev,
 		bool allow_npar_tx_switch,
 		const u8 *bin_fw_data)
 {
-	u32 load_code, param;
-	int rc, mfw_rc, i;
+	u32 load_code, param, drv_mb_param;
+	bool b_default_mtu = true;
+	struct qed_hwfn *p_hwfn;
+	int rc = 0, mfw_rc, i;
 
 	if ((int_mode == QED_INT_MODE_MSI) && (cdev->num_hwfns > 1)) {
DP_NOTICE(cdev, "MSI mode is not supported for CMT devices\n"); @@ -1073,6 +1075,12 @@ int qed_hw_init(struct qed_dev *cdev, for_each_hwfn(cdev, i) { struct qed_hwfn *p_hwfn = >hwfns[i]; + /* If management didn't provide a default, set one of our own */ + if (!p_hwfn->hw_info.mtu) { + p_hwfn->hw_info.mtu = 1500; + b_default_mtu = false; + } + if (IS_VF(cdev)) { p_hwfn->b_int_enabled = 1; continue; @@ -1156,6 +1164,38 @@ int qed_hw_init(struct qed_dev *cdev, p_hwfn->hw_init_done = true; } + if (IS_PF(cdev)) { + p_hwfn = QED_LEADING_HWFN(cdev); + drv_mb_param = (FW_MAJOR_VERSION << 24) | + (FW_MINOR_VERSION << 16) | + (FW_REVISION_VERSION << 8) | + (FW_ENGINEERING_VERSION); + rc = qed_mcp_cmd(p_hwfn, p_hwfn->p_main_ptt, +DRV_MSG_CODE_OV_UPDATE_STORM_FW_VER, +drv_mb_param, _code, ); + if (rc) + DP_INFO(p_hwfn, "Failed to update firmware version\n"); + + if (!b_default_mtu) { + rc = qed_mcp_ov_update_mtu(p_hwfn, p_hwfn->p_main_ptt, + p_hwfn->hw_info.mtu); + if (rc) + DP_INFO(p_hwfn, + "Failed to update default mtu\n"); + } + + rc = qed_mcp_ov_update_driver_state(p_hwfn, + p_hwfn->p_main_ptt, + QED_OV_DRIVER_STATE_DISABLED); + if (rc) + DP_INFO(p_hwfn, "Failed to update driver state\n"); + + rc = qed_mcp_ov_update_eswitch(p_hwfn, p_hwfn->p_main_ptt, + QED_OV_ESWITCH_VEB); + if (rc) + DP_INFO(p_hwfn, "Failed to update eswitch mode\n"); + } + return 0; } @@ -1800,6 +1840,9 @@ static void qed_get_num_funcs(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt) qed_get_num_funcs(p_hwfn, p_ptt); + if (qed_mcp_is_init(p_hwfn)) + p_hwfn->hw_info.mtu = p_hwfn->mcp_info->func_info.mtu; + return
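The init path above reports the storm firmware version to the MFW packed one byte per field into a single mailbox word (`drv_mb_param = (FW_MAJOR_VERSION << 24) | ...`). The packing itself is trivial but easy to get wrong, so here is a standalone sketch of it (function name is invented for the example):

```c
#include <assert.h>
#include <stdint.h>

/* Pack major/minor/revision/engineering into one mailbox word, one byte
 * each, as done for DRV_MSG_CODE_OV_UPDATE_STORM_FW_VER above.
 */
static uint32_t pack_fw_ver(uint8_t major, uint8_t minor,
			    uint8_t rev, uint8_t eng)
{
	return ((uint32_t)major << 24) | ((uint32_t)minor << 16) |
	       ((uint32_t)rev << 8) | eng;
}
```

The casts to `uint32_t` before shifting matter: shifting a promoted `int` by 24 is fine for byte values, but the explicit widening keeps the expression well-defined even if the field types ever grow.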
Re: Let's do P4
Sun, Oct 30, 2016 at 11:26:49AM CET, tg...@suug.ch wrote: >On 10/30/16 at 08:44am, Jiri Pirko wrote: >> Sat, Oct 29, 2016 at 06:46:21PM CEST, john.fastab...@gmail.com wrote: >> >On 16-10-29 07:49 AM, Jakub Kicinski wrote: >> >> On Sat, 29 Oct 2016 09:53:28 +0200, Jiri Pirko wrote: >> >>> Hi all. >> >>> >> >>> The network world is divided into 2 general types of hw: >> >>> 1) network ASICs - network specific silicon, containing things like TCAM >> >>>These ASICs are suitable to be programmed by P4. >> >>> 2) network processors - basically general purpose CPUs >> >>>These processors are suitable to be programmed by eBPF. >> >>> >> >>> I believe that by now, most people have come to the conclusion that it is >> >>> very difficult to handle both types by either P4 or eBPF. And since >> >>> eBPF is part of the kernel, I would like to introduce P4 into the kernel >> >>> as well. Here's a plan: >> >>> >> >>> 1) Define P4 intermediate representation >> >>>I cannot imagine loading a P4 program (c-like syntax text file) into the >> >>>kernel as is. That means that as the first step, we need to find some >> >>>intermediate representation. I can imagine something in a form of AST, >> >>>call it "p4ast". I don't really know how to do this exactly though, >> >>>it's just an idea. >> >>> >> >>>In the end there would be a userspace precompiler for this: >> >>>$ makep4ast example.p4 example.ast >> >> >> >> Maybe stating the obvious, but IMHO defining the IR is the hardest part. >> >> eBPF *is* the IR, we can compile C, P4 or even JIT Lua to eBPF. The >> >> AST/IR for switch pipelines should allow for similar flexibility. >> >> Looser coupling would also protect us from changes in spec of the high >> >> level language. > >My assumption was that a new IR is defined which is easier to parse than >eBPF which is targeted at execution on a CPU and not intended for pattern >matching.
Just looking at how llvm creates different patterns and reorders >instructions, I'm not seeing how eBPF can serve as a general purpose IR >if the objective is to allow fairly flexible generation of the bytecode. >Hence the alternative IR serving as additional metadata complementing the >eBPF program. Agreed. [...] >> >... And merging threads here with Jiri's email ... >> > >> >> If you do p4->ebpf in userspace, you have 2 apis: >> >> 1) to setup sw (in-kernel) p4 datapath, you push bpf.o to kernel >> >> 2) to setup hw p4 datapath, you push program.p4ast to kernel >> >> >> >> Those are 2 apis. Both wrapped up by TC, but still 2 apis. >> >> >> >> What I believe is correct is to have one api: >> >> 1) to setup sw (in-kernel) p4 datapath, you push program.p4ast to kernel >> >> 2) to setup hw p4 datapath, you push program.p4ast to kernel > >I understand what you mean with two APIs now. You want a single IR >block and divide the SW/HW part in the kernel rather than let llvm or >something else do it. Exactly. The following drawing shows the p4 pipeline setup for SW and HW:

                                                 +--> ebpf engine
                                                 |
                                             compilerB
                                                 ^
                                                 |
p4src --> compilerA --> p4ast --TCNL--> cls_p4 --+-> driver -> compilerC -> HW
                                  |
                        userspace | kernel

Now please consider runtime API for rule insertion/removal/stats/etc. Also, the single API is cls_p4 here:

                           ebpf map fillup
                                 ^
                                 |
p4 rule --TCNL--> cls_p4 --------+-> driver -> HW table fillup
               |
     userspace | kernel

> >> >Couple comments around this, first adding yet another IR in the kernel >> >and another JIT engine to map that IR on to eBPF or hardware vendor X >> >doesn't get me excited. Its really much easier to write these as backend >> >objects in LLVM. Not saying it can't be done just saying it is easier >> >in LLVM. Also we already have the LLVM code for P4 to LLVM-IR to eBPF. >> >In the end this would be a reasonably complex bit of code in >> >the kernel only for hardware offload. I have doubts that folks would >> >ever use it for software only cases.
I'm happy to admit I'm wrong here >> >though. >> >> Well for hw offload, every driver has to parse the IR (whatever will it >> be in) and program HW accordingly. Similar parsing and translation would >> be needed for SW path, to translate into eBPF. I don't think it would be >> more complex than in the drivers. Should be fine. > >I'm not sure
Re: [PATCH net-next 0/2] mlx4 XDP TX refactor
Hi Dave, This series makes Brenden's fix unneeded: 958b3d396d7f ("net/mlx4_en: fixup xdp tx irq to match rx") The fix got into net, but yet to be in net-next. Should I wait with this series and send a re-spin, with a revert of the fix, once it gets into net-next? Regards, Tariq On 27/10/2016 5:52 PM, Tariq Toukan wrote: Hi Dave, This patchset refactors the XDP forwarding case, so that its dedicated transmit queues are managed in a complete separation from the other regular ones. Series generated against net-next commit: 6edf10173a1f "devlink: Prevent port_type_set() callback when it's not needed" Thanks, Tariq. Tariq Toukan (2): net/mlx4_en: Add TX_XDP for CQ types net/mlx4_en: Refactor the XDP forwarding rings scheme drivers/net/ethernet/mellanox/mlx4/en_cq.c | 18 +- drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 76 +++-- drivers/net/ethernet/mellanox/mlx4/en_main.c| 2 +- drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 378 ++-- drivers/net/ethernet/mellanox/mlx4/en_port.c| 4 +- drivers/net/ethernet/mellanox/mlx4/en_rx.c | 8 +- drivers/net/ethernet/mellanox/mlx4/en_tx.c | 9 +- drivers/net/ethernet/mellanox/mlx4/mlx4_en.h| 18 +- 8 files changed, 305 insertions(+), 208 deletions(-)
Re: [PATCH net-next 2/2] net/mlx4_en: Refactor the XDP forwarding rings scheme
On 28/10/2016 4:07 AM, Alexei Starovoitov wrote: On Thu, Oct 27, 2016 at 05:52:04PM +0300, Tariq Toukan wrote: Separately manage the two types of TX rings: regular ones, and XDP. Upon an XDP set, do not borrow regular TX rings and convert them into XDP ones, but allocate new ones, unless we hit the max number of rings. Which means that in systems with fewer cores we will not consume the current TX rings for XDP, while we are still within the num TX limit. The commit log is too scarce for details... So questions: - Did you test with changing the number of channels after xdp prog is loaded? That was the recent bug that Brenden fixed. Bug no longer exists, as the indices of the XDP TX rings now start from 0, each is identical to its respective RX ring. Brenden's fix didn't get to net-next yet, and it shouldn't once the series is applied. I need to take this up with Dave. - does it still have 256 tx queue limit or xdp tx rings can go over? It still has the limit of 256 TX queues. - Any performance implications ? I didn't see any performance implications. Note that the XDP TX rings are no longer shown in ethtool -S. Brenden, could you please review this patch? Regards, Tariq
Re: [PATCH for-next 00/14][PULL request] Mellanox mlx5 core driver updates 2016-10-25
From: Saeed Mahameed
Date: Sun, Oct 30, 2016 11:59:57 +0200 > On Fri, Oct 28, 2016 at 7:53 PM, David Miller wrote: >> >> I really dislike pull requests of this form. >> >> You add lots of datastructures and helper functions but no actual >> users of these facilities to the driver. >> >> Do this instead: >> >> 1) Add TSAR infrastructure >> 2) Add use of TSAR facilities to the driver >> >> That's one pull request. >> >> I don't care if this is hard, or if there are entanglements with >> Infiniband or whatever, you must submit changes in this manner. >> > > It is not hard, it is just not right, we have lots of IB and ETH > features that we would like to submit in the same kernel cycle, > with your suggestion I will have to almost submit every feature (core > infrastructure and netdev/RDMA usage) > to you and Doug. Nobody can properly review an API addition without seeing how that API is _USED_. This is a simple fundamental fact. And I'm not pulling in code that can't be reviewed properly. Also, so many times people have added new junk to drivers and months later never added the users of that new code and interfaces. Forcing you to provide the use with the API addition makes sure that it is absolutely impossible for that to happen. Whatever issues you think prevent this are your issues, not mine. I want high quality submissions that can be properly reviewed, and you have to find a way to satisfy that requirement.
Re: [PATCH 2/2] rtl8xxxu: Fix for bogus data used to determine macpower
Thanks for your reply. The code was tested on a Cube i9 which has an internal rtl8723bu. No other devices were tested. I am happy to accept that, in an ideal context, hard coding macpower is undesirable, the comment is undesirable and it is wrong to assume the issue is not unique to the rtl8723bu. Your reply is idealistic. What can I do now? I should of course have factored out other untested devices in my patches. The apparent concern you have with process over outcome is a useful lesson. We are not in an ideal situation. The comment is of course relevant and useful for starting a process to fix a real bug that I do not have sufficient information to refine any further, but others do. In the circumstances nothing really more can be expected. My patch cover letter, [PATCH 0/2], provides evidence of a mess with regard to determining macpower for the rtl8723bu and what is subsequently required. This is important. The kernel driver code is very poorly documented and there is not a single source reference to device documentation. For example macpower is nothing more than a setting that is true or false according to whether a read of a particular register returns 0xef or not. Such a value was never obtained so a full init sequence was never performed. It would be helpful if you could provide a link to device references. As it is, how am I supposed to revise the patch without relevant information? My patch code works with the Cube i9, as is, despite a lack of adequate information. Before it did not. That is a powerful statement. Have a nice day. John Heenan On 30 October 2016 at 22:00, Jes Sorensen wrote: > John Heenan writes: >> Code tests show data returned by rtl8xxxu_read8(priv, REG_CR), used to set >> macpower, is never 0xea. It is only ever 0x01 (first time after modprobe) >> using wpa_supplicant and 0x00 thereafter using wpa_supplicant. These results >> occur with 'Fix for authentication failure' [PATCH 1/2] in place.
>> >> Whatever was returned, code tests always showed that at least >> rtl8xxxu_init_queue_reserved_page(priv); >> is always required. Not called if macpower set to true. >> >> Please see cover letter, [PATCH 0/2], for more information from tests. >> > > Sorry but this patch is neither serious nor acceptable. First of all, > hardcoding macpower like this right after an if statement is plain > wrong, second your comments violate all kernel rules. > > Second, you argue this was tested using code test - on which device? Did > you test it on all rtl8xxxu based devices or just rtl8723bu? > > NACK > > Jes
Re: net/dccp: warning in dccp_feat_clone_sp_val/__might_sleep
On Sun, 2016-10-30 at 05:41 +0100, Andrey Konovalov wrote: > Sorry, the warning is still there. > > I'm not sure adding sched_annotate_sleep() does anything, since it's > defined as (in case CONFIG_DEBUG_ATOMIC_SLEEP is not set): > # define sched_annotate_sleep() do { } while (0) Thanks again for testing. But you do have CONFIG_DEBUG_ATOMIC_SLEEP set, which triggers a check in __might_sleep() : WARN_ONCE(current->state != TASK_RUNNING && current->task_state_change, Relevant commit is 00845eb968ead28007338b2bb852b8beef816583 ("sched: don't cause task state changes in nested sleep debugging") Another relevant commit was 26cabd31259ba43f68026ce3f62b78094124333f ("sched, net: Clean up sk_wait_event() vs. might_sleep()") Before release_sock() could process the backlog in process context, only lock_sock() could trigger the issue, so my fix at that time was commit cb7cf8a33ff73cf638481d1edf883d8968f934f8 ("inet: Clean up inet_csk_wait_for_connect() vs. might_sleep()") I guess we need something else now, because the following : static int dccp_wait_for_ccid(struct sock *sk, unsigned long delay) { DEFINE_WAIT(wait); long remaining; prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE); sk->sk_write_pending++; release_sock(sk); ... can now process the socket backlog in process context from release_sock(), so all GFP_KERNEL allocations might barf because of TASK_INTERRUPTIBLE being used at that point. sk_wait_event() probably also needs a fix. Peter, any idea how this can be done ? Thanks !
Re: [PATCH net-next 5/5] ipv6: Compute multipath hash for forwarded ICMP errors from offending packet
On Fri, Oct 28, 2016 at 02:25 PM GMT, Tom Herbert wrote: > On Fri, Oct 28, 2016 at 1:32 AM, Jakub Sitnicki wrote: >> On Thu, Oct 27, 2016 at 10:35 PM GMT, Tom Herbert wrote: >>> On Mon, Oct 24, 2016 at 2:28 AM, Jakub Sitnicki wrote: Same as for the transmit path, let's do our best to ensure that received ICMP errors that may be subject to forwarding will be routed the same path as flow that triggered the error, if it was going in the opposite direction. >>> Unfortunately our ability to do this is generally quite limited. This >>> patch will select the route for multipath, but I don't believe sets >>> the same link in LAG and definitely can't help switches doing ECMP to >>> route the ICMP packet in the same way as the flow would be. Did you >>> see a problem that warrants solving this case? >> >> The motivation here is to bring IPv6 ECMP routing on par with IPv4 to >> enable its wider use, targeting anycast services. Forwarding ICMP errors >> back to the source host, at the L3 layer, is what we thought would be a >> step forward. >> >> Similar to change in IPv4 routing introduced in commit 79a131592dbb >> ("ipv4: ICMP packet inspection for multipath", [1]) we do our best at >> L3, leaving any potential problems with LAG at lower layer (L2) >> unaddressed. >> > ICMP will almost certainly take a different path in the network than > TCP or UDP due to ECMP. If we ever get proper flow label support for > ECMP then that could solve the problem if all the devices do a hash > just on . Sorry for my late reply, I have been traveling. I think that either I am missing something here, or the proposed changes address just the problem that you have described. Yes, if we compute the hash that drives the route choice over the IP header of the ICMP error, then there is no guarantee it will travel back to the sender of the offending packet that triggered the error. That is why we look at the offending packet carried by an ICMP error and hash over its fields, instead.
We need, however, to take care of two things: 1) swap the source with the destination address, because we are forwarding the ICMP error in the opposite direction than the offending packet was going (see icmpv6_multipath_hash() introduced in patch 4/5); and 2) ensure the flow labels used in both directions are the same (either reflected by one side, or fixed, e.g. not used and set to 0), so that the 4-tuple we hash over when forwarding, , is the same both ways, modulo the order of addresses. > If this patch is being done to be compatible with IPv4 I guess that's > okay, but it would be false advertisement to say this makes ICMP > follow the same path as the flow being targeted in an error. > Fortunately, I doubt anyone can have a dependency on this for ICMP. I wouldn't want to propose anything that would be useless. If you think that this is the case here, I would very much like to understand what and why cannot work in practice. Thanks for reviewing this series, Jakub
Re: [PATCH 2/2] rtl8xxxu: Fix for bogus data used to determine macpower
John Heenan writes: > Code tests show data returned by rtl8xxxu_read8(priv, REG_CR), used to set > macpower, is never 0xea. It is only ever 0x01 (first time after modprobe) > using wpa_supplicant and 0x00 thereafter using wpa_supplicant. These results > occur with 'Fix for authentication failure' [PATCH 1/2] in place. > > Whatever was returned, code tests always showed that at least > rtl8xxxu_init_queue_reserved_page(priv); > is always required. Not called if macpower set to true. > > Please see cover letter, [PATCH 0/2], for more information from tests. > > For rtl8xxxu-devel branch of > git.kernel.org/pub/scm/linux/kernel/git/jes/linux.git > > Signed-off-by: John Heenan > --- > drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c > b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c > index f25b4df..aae05f3 100644 > --- a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c > +++ b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c > @@ -3904,6 +3904,7 @@ static int rtl8xxxu_init_device(struct ieee80211_hw *hw) > macpower = false; > else > macpower = true; > + macpower = false; // Code testing shows macpower must always be set to > false to avoid failure > > ret = fops->power_on(priv); > if (ret < 0) { Sorry but this patch is neither serious nor acceptable. First of all, hardcoding macpower like this right after an if statement is plain wrong, second your comments violate all kernel rules. Second, you argue this was tested using code test - on which device? Did you test it on all rtl8xxxu based devices or just rtl8723bu? NACK Jes
[PATCH net-next 1/4] route: Set orig_output when redirecting to lwt on locally generated traffic
orig_output for IPv4 was only set for dsts which hit an input route. Set it consistently for locally generated traffic as well to allow lwt to continue the dst_output() path as configured by the nexthop. Fixes: 2536862311d ("lwt: Add support to redirect dst.input") Signed-off-by: Thomas Graf
--- net/ipv4/route.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 62d4d90..7da886e 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -2138,8 +2138,10 @@ static struct rtable *__mkroute_output(const struct fib_result *res, } rt_set_nexthop(rth, fl4->daddr, res, fnhe, fi, type, 0); - if (lwtunnel_output_redirect(rth->dst.lwtstate)) + if (lwtunnel_output_redirect(rth->dst.lwtstate)) { + rth->dst.lwtstate->orig_output = rth->dst.output; + rth->dst.output = lwtunnel_output; + } return rth; } -- 2.7.4
[PATCH net-next 2/4] route: Set lwtstate for local traffic and cached input dsts
A route on the output path hitting a RTN_LOCAL route will keep the dst associated on its way through the loopback device. On the receive path, the dst_input() call will thus invoke the input handler of the route created in the output path. Thus, lwt redirection for input must be done for dsts allocated in the output path as well. Also, if a route is cached in the input path, the allocated dst should respect lwtunnel configuration on the nexthop as well. Signed-off-by: Thomas Graf
--- net/ipv4/route.c | 39 ++- 1 file changed, 26 insertions(+), 13 deletions(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 7da886e..44f5403 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -1596,6 +1596,19 @@ static void ip_del_fnhe(struct fib_nh *nh, __be32 daddr) spin_unlock_bh(&fnhe_lock); } +static void set_lwt_redirect(struct rtable *rth) +{ + if (lwtunnel_output_redirect(rth->dst.lwtstate)) { + rth->dst.lwtstate->orig_output = rth->dst.output; + rth->dst.output = lwtunnel_output; + } + + if (lwtunnel_input_redirect(rth->dst.lwtstate)) { + rth->dst.lwtstate->orig_input = rth->dst.input; + rth->dst.input = lwtunnel_input; + } +} + /* called in rcu_read_lock() section */ static int __mkroute_input(struct sk_buff *skb, const struct fib_result *res, @@ -1685,14 +1698,7 @@ static int __mkroute_input(struct sk_buff *skb, rth->dst.input = ip_forward; rt_set_nexthop(rth, daddr, res, fnhe, res->fi, res->type, itag); - if (lwtunnel_output_redirect(rth->dst.lwtstate)) { - rth->dst.lwtstate->orig_output = rth->dst.output; - rth->dst.output = lwtunnel_output; - } - if (lwtunnel_input_redirect(rth->dst.lwtstate)) { - rth->dst.lwtstate->orig_input = rth->dst.input; - rth->dst.input = lwtunnel_input; - } + set_lwt_redirect(rth); skb_dst_set(skb, &rth->dst); out: err = 0; @@ -1919,8 +1925,18 @@ out: return err; rth->dst.error = -err; rth->rt_flags &= ~RTCF_LOCAL; } + if (do_cache) { - if (unlikely(!rt_cache_route(&FIB_RES_NH(*res), rth))) { + struct fib_nh *nh = &FIB_RES_NH(*res); + + rth->dst.lwtstate =
lwtstate_get(nh->nh_lwtstate); + if (lwtunnel_input_redirect(rth->dst.lwtstate)) { + WARN_ON(rth->dst.input == lwtunnel_input); + rth->dst.lwtstate->orig_input = rth->dst.input; + rth->dst.input = lwtunnel_input; + } + + if (unlikely(!rt_cache_route(nh, rth))) { rth->dst.flags |= DST_NOCACHE; rt_add_uncached_list(rth); } @@ -2138,10 +2154,7 @@ static struct rtable *__mkroute_output(const struct fib_result *res, } rt_set_nexthop(rth, fl4->daddr, res, fnhe, fi, type, 0); - if (lwtunnel_output_redirect(rth->dst.lwtstate)) { - rth->dst.lwtstate->orig_output = rth->dst.output; - rth->dst.output = lwtunnel_output; - } + set_lwt_redirect(rth); return rth; } -- 2.7.4
[PATCH net-next 4/4] bpf: Add samples for LWT-BPF
This adds a set of samples demonstrating the use of lwt-bpf combined with a shell script which allows running the samples in the form of a basic selftest. The samples include: - Allowing all packets - Dropping all packets - Printing context information - Access packet data - IPv4 daddr rewrite in dst_output() - L2 MAC header push + redirect in lwt xmit Signed-off-by: Thomas Graf--- samples/bpf/bpf_helpers.h | 4 + samples/bpf/lwt_bpf.c | 210 +++ samples/bpf/test_lwt_bpf.sh | 337 3 files changed, 551 insertions(+) create mode 100644 samples/bpf/lwt_bpf.c create mode 100755 samples/bpf/test_lwt_bpf.sh diff --git a/samples/bpf/bpf_helpers.h b/samples/bpf/bpf_helpers.h index 90f44bd..f34e417 100644 --- a/samples/bpf/bpf_helpers.h +++ b/samples/bpf/bpf_helpers.h @@ -80,6 +80,8 @@ struct bpf_map_def { unsigned int map_flags; }; +static int (*bpf_skb_load_bytes)(void *ctx, int off, void *to, int len) = + (void *) BPF_FUNC_skb_load_bytes; static int (*bpf_skb_store_bytes)(void *ctx, int off, void *from, int len, int flags) = (void *) BPF_FUNC_skb_store_bytes; static int (*bpf_l3_csum_replace)(void *ctx, int off, int from, int to, int flags) = @@ -88,6 +90,8 @@ static int (*bpf_l4_csum_replace)(void *ctx, int off, int from, int to, int flag (void *) BPF_FUNC_l4_csum_replace; static int (*bpf_skb_under_cgroup)(void *ctx, void *map, int index) = (void *) BPF_FUNC_skb_under_cgroup; +static int (*bpf_skb_push)(void *, int len, int flags) = + (void *) BPF_FUNC_skb_push; #if defined(__x86_64__) diff --git a/samples/bpf/lwt_bpf.c b/samples/bpf/lwt_bpf.c new file mode 100644 index 000..05be6ac --- /dev/null +++ b/samples/bpf/lwt_bpf.c @@ -0,0 +1,210 @@ +/* Copyright (c) 2016 Thomas Graf + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of version 2 of the GNU General Public + * License as published by the Free Software Foundation. 
+ * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "bpf_helpers.h" +#include + +# define printk(fmt, ...) \ + ({ \ + char fmt[] = fmt; \ + bpf_trace_printk(fmt, sizeof(fmt), \ +##__VA_ARGS__);\ + }) + +#define CB_MAGIC 1234 + +/* Let all packets pass */ +SEC("nop") +int do_nop(struct __sk_buff *skb) +{ + return BPF_OK; +} + +/* Print some context information per packet to tracing buffer. + */ +SEC("ctx_test") +int do_ctx_test(struct __sk_buff *skb) +{ + skb->cb[0] = CB_MAGIC; + printk("len %d hash %d protocol %d\n", skb->len, skb->hash, + skb->protocol); + printk("cb %d ingress_ifindex %d ifindex %d\n", skb->cb[0], + skb->ingress_ifindex, skb->ifindex); + + return BPF_OK; +} + +/* Print content of skb->cb[] to tracing buffer */ +SEC("print_cb") +int do_print_cb(struct __sk_buff *skb) +{ + printk("cb0: %x cb1: %x cb2: %x\n", skb->cb[0], skb->cb[1], + skb->cb[2]); + printk("cb3: %x cb4: %x\n", skb->cb[3], skb->cb[4]); + + return BPF_OK; +} + +/* Print source and destination IPv4 address to tracing buffer */ +SEC("data_test") +int do_data_test(struct __sk_buff *skb) +{ + void *data = (void *)(long)skb->data; + void *data_end = (void *)(long)skb->data_end; + struct iphdr *iph = data; + + if (data + sizeof(*iph) > data_end) { + printk("packet truncated\n"); + return BPF_DROP; + } + + printk("src: %x dst: %x\n", iph->saddr, iph->daddr); + + return BPF_OK; +} + +#define IP_CSUM_OFF offsetof(struct iphdr, check) +#define IP_DST_OFF offsetof(struct iphdr, daddr) +#define IP_SRC_OFF offsetof(struct iphdr, saddr) +#define IP_PROTO_OFF offsetof(struct iphdr, protocol) +#define TCP_CSUM_OFF offsetof(struct tcphdr, check) +#define UDP_CSUM_OFF offsetof(struct 
udphdr, check) +#define IS_PSEUDO 0x10 + +static inline int rewrite(struct __sk_buff *skb, uint32_t old_ip, + uint32_t new_ip, int rw_daddr) +{ + int ret, off = 0, flags = IS_PSEUDO; + uint8_t proto; + + ret = bpf_skb_load_bytes(skb, IP_PROTO_OFF, &proto, 1); + if (ret < 0) { + printk("bpf_l4_csum_replace failed: %d\n", ret); + return BPF_DROP; + } + + switch (proto) { + case IPPROTO_TCP: +
[PATCH net-next 3/4] bpf: BPF for lightweight tunnel encapsulation
Register two new BPF prog types BPF_PROG_TYPE_LWT_IN and BPF_PROG_TYPE_LWT_OUT which are invoked if a route contains a LWT redirection of type LWTUNNEL_ENCAP_BPF. The separate program types are required because manipulation of packet data is only allowed on the output and transmit path as the subsequent dst_input() call path assumes an IP header validated by ip_rcv(). The BPF programs will be handed an skb with the L3 header attached and may return one of the following return codes: BPF_OK - Continue routing as per nexthop BPF_DROP - Drop skb and return EPERM BPF_REDIRECT - Redirect skb to device as per redirect() helper. (Only valid on lwtunnel_xmit() hook) The return codes are binary compatible with their TC_ACT_ relatives to ease compatibility. A new helper bpf_skb_push() is added which allows to prepend an L2 header in front of the skb, extend the existing L3 header, or both. This allows to address a wide range of issues: - Optimize L2 header construction when L2 information is always static to avoid ARP/NDisc lookup. - Extend IP header to add additional IP options. - Perform simple encapsulation where offload is of no concern. (The existing functionality to attach a tunnel key to the skb and redirect to a tunnel net_device to allow for offload continues to work obviously).
Signed-off-by: Thomas Graf--- include/linux/filter.h| 2 +- include/uapi/linux/bpf.h | 31 +++- include/uapi/linux/lwtunnel.h | 21 +++ kernel/bpf/verifier.c | 16 +- net/core/Makefile | 2 +- net/core/filter.c | 148 - net/core/lwt_bpf.c| 365 ++ net/core/lwtunnel.c | 1 + 8 files changed, 579 insertions(+), 7 deletions(-) create mode 100644 net/core/lwt_bpf.c diff --git a/include/linux/filter.h b/include/linux/filter.h index 1f09c52..aad7f81 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -438,7 +438,7 @@ struct xdp_buff { }; /* compute the linear packet data range [data, data_end) which - * will be accessed by cls_bpf and act_bpf programs + * will be accessed by cls_bpf, act_bpf and lwt programs */ static inline void bpf_compute_data_end(struct sk_buff *skb) { diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index e2f38e0..2ebaa3c 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -96,6 +96,9 @@ enum bpf_prog_type { BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_XDP, BPF_PROG_TYPE_PERF_EVENT, + BPF_PROG_TYPE_LWT_IN, + BPF_PROG_TYPE_LWT_OUT, + BPF_PROG_TYPE_LWT_XMIT, }; #define BPF_PSEUDO_MAP_FD 1 @@ -383,6 +386,16 @@ union bpf_attr { * * int bpf_get_numa_node_id() * Return: Id of current NUMA node. + * + * int bpf_skb_push() + * Add room to beginning of skb and adjusts MAC header offset accordingly. + * Extends/reallocaes for needed skb headeroom automatically. + * May change skb data pointer and will thus invalidate any check done + * for direct packet access. 
+ * @skb: pointer to skb + * @len: length of header to be pushed in front + * @flags: Flags (unused for now) + * Return: 0 on success or negative error */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -427,7 +440,8 @@ union bpf_attr { FN(skb_pull_data), \ FN(csum_update),\ FN(set_hash_invalid), \ - FN(get_numa_node_id), + FN(get_numa_node_id), \ + FN(skb_push), /* integer value in 'imm' field of BPF_CALL instruction selects which helper * function eBPF program intends to call @@ -511,6 +525,21 @@ struct bpf_tunnel_key { __u32 tunnel_label; }; +/* Generic BPF return codes which all BPF program types may support. + * The values are binary compatible with their TC_ACT_* counter-part to + * provide backwards compatibility with existing SCHED_CLS and SCHED_ACT + * programs. + * + * XDP is handled seprately, see XDP_*. + */ +enum bpf_ret_code { + BPF_OK = 0, + /* 1 reserved */ + BPF_DROP = 2, + /* 3-6 reserved */ + BPF_REDIRECT = 7, +}; + /* User return codes for XDP prog type. * A valid XDP program must return one of these defined values. All other * return codes are reserved for future use. Unknown return codes will result diff --git a/include/uapi/linux/lwtunnel.h b/include/uapi/linux/lwtunnel.h index a478fe8..9354d997 100644 --- a/include/uapi/linux/lwtunnel.h +++ b/include/uapi/linux/lwtunnel.h @@ -9,6 +9,7 @@ enum lwtunnel_encap_types { LWTUNNEL_ENCAP_IP, LWTUNNEL_ENCAP_ILA, LWTUNNEL_ENCAP_IP6, + LWTUNNEL_ENCAP_BPF, __LWTUNNEL_ENCAP_MAX, }; @@ -42,4 +43,24 @@ enum lwtunnel_ip6_t { #define LWTUNNEL_IP6_MAX (__LWTUNNEL_IP6_MAX - 1) +enum { + LWT_BPF_PROG_UNSPEC, + LWT_BPF_PROG_FD, +
[PATCH net-next 0/4] BPF for lightweight tunnel encapsulation
This series implements BPF program invocation from dst entries via the lightweight tunnels infrastructure. The BPF program can be attached to lwtunnel_input(), lwtunnel_output() or lwtunnel_xmit() and sees an L3 skb as context. input is read-only, output can write, xmit can write, push headers, and redirect. Motivation for this work: - Restricting outgoing routes beyond what the route tuple supports - Per route accounting beyond realms - Fast attachment of L2 headers where header does not require resolving L2 addresses - ILA-like use cases where L3 addresses are resolved and then routed in an async manner - Fast encapsulation + redirect. For now limited to use cases where not setting inner and outer offset/protocol is OK. A couple of samples on how to use it can be found in patch 04. Thomas Graf (4): route: Set orig_output when redirecting to lwt on locally generated traffic route: Set lwtstate for local traffic and cached input dsts bpf: BPF for lightweight tunnel encapsulation bpf: Add samples for LWT-BPF include/linux/filter.h| 2 +- include/uapi/linux/bpf.h | 31 +++- include/uapi/linux/lwtunnel.h | 21 +++ kernel/bpf/verifier.c | 16 +- net/core/Makefile | 2 +- net/core/filter.c | 148 - net/core/lwt_bpf.c| 365 ++ net/core/lwtunnel.c | 1 + net/ipv4/route.c | 37 +++-- samples/bpf/bpf_helpers.h | 4 + samples/bpf/lwt_bpf.c | 210 samples/bpf/test_lwt_bpf.sh | 337 ++ 12 files changed, 1156 insertions(+), 18 deletions(-) create mode 100644 net/core/lwt_bpf.c create mode 100644 samples/bpf/lwt_bpf.c create mode 100755 samples/bpf/test_lwt_bpf.sh -- 2.7.4
RFC if==else in halbtc8723b1ant.c
Hi! In your commit f5b586909581 ("rtlwifi: btcoexist: Modify driver to support BT coexistence in rtl8723be") you introduced an if/else where both branches are the same, but the comment in the else branch suggests that this might be unintended. From code review only I can't say what the intent is. /drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtc8723b1ant.c:halbtc8723b1ant_action_wifi_connected_bt_acl_busy() 1838 if ((bt_rssi_state == BTC_RSSI_STATE_HIGH) || 1839 (bt_rssi_state == BTC_RSSI_STATE_STAY_HIGH)) { 1840 halbtc8723b1ant_ps_tdma(btcoexist, NORMAL_EXEC, 1841 true, 14); 1842 coex_dm->auto_tdma_adjust = false; 1843 } else { /*for low BT RSSI*/ 1844 halbtc8723b1ant_ps_tdma(btcoexist, NORMAL_EXEC, 1845 true, 14); 1846 coex_dm->auto_tdma_adjust = false; 1847 } Basically the same construct is also in halbtc8723b1ant_run_coexist_mechanism() 2213 if ((wifi_rssi_state == BTC_RSSI_STATE_HIGH) || 2214 (wifi_rssi_state == BTC_RSSI_STATE_STAY_HIGH)) { 2215 halbtc8723b1ant_limited_tx(btcoexist, 2216NORMAL_EXEC, 22171, 1, 1, 1); 2218 } else { 2219 halbtc8723b1ant_limited_tx(btcoexist, 2220NORMAL_EXEC, 22211, 1, 1, 1); } where the if condition is the same, so the else may again only apply to the low BT RSSI case - and the if and else are again the same. Whether this is intended or not is not clear. If this is intended it should have appropriate comments. thx! hofrat
Re: [bnx2] [Regression 4.8] Driver loading fails without firmware
Dear Baoquan,

On Saturday, 2016-10-29 at 10:55 +0800, Baoquan He wrote:
> On 10/27/16 at 03:21pm, Paul Menzel wrote:
> > > > Baoquan, could you please fix this regression? My suggestion is that
> > > > you add the old code back, but check if the firmware has been loaded.
> > > > If it hasn't, load it again.
> > > >
> > > > That way, people can update their Linux kernel, and it continues
> > > > working without changing the initramfs, or anything else.
> > >
> > > I saw your mail but I am also not familiar with the bnx2 driver. As the
> > > commit log says, I just tried to make the bnx2 driver reset itself
> > > earlier.
> > >
> > > So you did a git bisect and found this commit caused the regression,
> > > right? If yes, and the network developers take no action, I will look
> > > into the code and see if I have an idea how to fix it.
> >
> > Well, I looked through the commits and found that one, which would
> > explain the changed behavior.
> >
> > To be sure, and to follow your request, I took Linux 4.8.4 and reverted
> > your commit (attached). Then I deleted the firmware again from the
> > initramfs, and rebooted. The devices showed up just fine as before.
> >
> > So to summarize, the commit is indeed the culprit.
>
> Sorry for this.
>
> Could you tell me the steps to reproduce? I will find a machine with a
> bnx2 NIC and check if there are other ways.

Well, delete the bnx2 firmware files from the initramfs, and start the system.

Did you read my proposal to try to load the firmware twice, that is, to basically revert only the deleted lines of your commit and add an additional check?

Kind regards,

Paul
Re: Let's do P4
On 10/30/16 at 08:44am, Jiri Pirko wrote:
> Sat, Oct 29, 2016 at 06:46:21PM CEST, john.fastab...@gmail.com wrote:
> >On 16-10-29 07:49 AM, Jakub Kicinski wrote:
> >> On Sat, 29 Oct 2016 09:53:28 +0200, Jiri Pirko wrote:
> >>> Hi all.
> >>>
> >>> The network world is divided into 2 general types of hw:
> >>> 1) network ASICs - network-specific silicon, containing things like TCAM
> >>>    These ASICs are suitable to be programmed by P4.
> >>> 2) network processors - basically general purpose CPUs
> >>>    These processors are suitable to be programmed by eBPF.
> >>>
> >>> I believe that by now, most people have come to the conclusion that it
> >>> is very difficult to handle both types by either P4 or eBPF. And since
> >>> eBPF is part of the kernel, I would like to introduce P4 into the
> >>> kernel as well. Here's a plan:
> >>>
> >>> 1) Define a P4 intermediate representation
> >>>    I cannot imagine loading a P4 program (c-like syntax text file) into
> >>>    the kernel as is. That means that as the first step, we need to find
> >>>    some intermediate representation. I can imagine something in the
> >>>    form of an AST, call it "p4ast". I don't really know how to do this
> >>>    exactly though, it's just an idea.
> >>>
> >>>    In the end there would be a userspace precompiler for this:
> >>>    $ makep4ast example.p4 example.ast
> >>
> >> Maybe stating the obvious, but IMHO defining the IR is the hardest part.
> >> eBPF *is* the IR, we can compile C, P4 or even JIT Lua to eBPF. The
> >> AST/IR for switch pipelines should allow for similar flexibility.
> >> Looser coupling would also protect us from changes in the spec of the
> >> high-level language.

My assumption was that a new IR is defined which is easier to parse than eBPF, which is targeted at execution on a CPU and not intended for pattern matching. Just looking at how llvm creates different patterns and reorders instructions, I'm not seeing how eBPF can serve as a general purpose IR if the objective is to allow fairly flexible generation of the bytecode.
Hence the alternative IR serving as additional metadata complementing the eBPF program.

> >Jumping in the middle here. You managed to get an entire thread going
> >before I even woke up :)
> >
> >The problem with eBPF as an IR is that in the universe of eBPF IR
> >programs, the subset that can be offloaded onto standard ASIC-based
> >hardware (non NPU/FPGA/etc) is so small as to be almost meaningless IMO.
> >
> >I tried this for a while and the result is users have to write very
> >targeted eBPF that they "know" will be pattern matched and pushed into
> >an ASIC. It can work but it's very fragile. When I did this I ended up
> >with an eBPF generator for deviceX and an eBPF generator for deviceY,
> >each with a very specific pattern matching engine in the driver to
> >xlate ebpf-deviceX into its asic. Existing ASICs for example usually
> >support only one pipeline, only one parser (or require moving mountains
> >to change the parser via ucode), only one set of tables, and only one
> >deparser/serializer at the end to build the new packet. Next-gen pieces
> >may have some flexibility on the parser side.
> >
> >There is an interesting resource allocation problem we have that could
> >be solved by p4 or devlink, wherein we want to pre-allocate slices of
> >the TCAM for certain match types. I was planning on writing devlink code
> >for this because it's primarily done at initialization once.
>
> There are 2 resource allocation problems in our hw. One is the general
> division of the resources into feature-chunks. That needs to be done
> during the ASIC initialization phase. For that, I also plan to utilize
> the devlink API.
>
> The second one is runtime allocation of tables, and that would be
> handled by p4 just fine.
>
> >
> >I will note one nice thing about using eBPF however is that you have an
> >easy software emulation path via the eBPF engine in the kernel.
> >
> >... And merging threads here with Jiri's email ...
> >
> >> If you do p4>ebpf in userspace, you have 2 apis:
> >> 1) to setup sw (in-kernel) p4 datapath, you push bpf.o to kernel
> >> 2) to setup hw p4 datapath, you push program.p4ast to kernel
> >>
> >> Those are 2 apis. Both wrapped up by TC, but still 2 apis.
> >>
> >> What I believe is correct is to have one api:
> >> 1) to setup sw (in-kernel) p4 datapath, you push program.p4ast to kernel
> >> 2) to setup hw p4 datapath, you push program.p4ast to kernel

I understand what you mean with two APIs now. You want a single IR block and to divide the SW/HW part in the kernel, rather than let llvm or something else do it.

> >Couple comments around this. First, adding yet another IR in the kernel
> >and another JIT engine to map that IR onto eBPF or hardware vendor X
> >doesn't get me excited. It's really much easier to write these as
> >backend objects in LLVM. Not saying it can't be done, just saying it is
> >easier in LLVM. Also we already have the LLVM code for P4 to LLVM-IR to
> >eBPF. In the end this would be a
[PATCH 1/2] rtl8xxxu: Fix for authentication failure
This fix enables the same sequence of init behaviour as the alternative working driver for the wireless rtl8723bu IC at https://github.com/lwfinger/rtl8723bu

For example, rtl8xxxu_init_device is now called each time userspace wpa_supplicant is executed, instead of just once when modprobe is executed. Along with 'Fix for bogus data used to determine macpower', wpa_supplicant now reliably and successfully authenticates.

For the rtl8xxxu-devel branch of git.kernel.org/pub/scm/linux/kernel/git/jes/linux.git

Signed-off-by: John Heenan
---
 drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
index 04141e5..f25b4df 100644
--- a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
+++ b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
@@ -5779,6 +5779,11 @@ static int rtl8xxxu_start(struct ieee80211_hw *hw)
 	ret = 0;
 
+	ret = rtl8xxxu_init_device(hw);
+	if (ret)
+		goto error_out;
+
+
 	init_usb_anchor(&priv->rx_anchor);
 	init_usb_anchor(&priv->tx_anchor);
 	init_usb_anchor(&priv->int_anchor);
@@ -6080,10 +6085,6 @@ static int rtl8xxxu_probe(struct usb_interface *interface,
 		goto exit;
 	}
 
-	ret = rtl8xxxu_init_device(hw);
-	if (ret)
-		goto exit;
-
 	hw->wiphy->max_scan_ssids = 1;
 	hw->wiphy->max_scan_ie_len = IEEE80211_MAX_DATA_LEN;
 	hw->wiphy->interface_modes = BIT(NL80211_IFTYPE_STATION);
-- 
2.10.1
[PATCH 2/2] rtl8xxxu: Fix for bogus data used to determine macpower
Code tests show that the data returned by rtl8xxxu_read8(priv, REG_CR), used to set macpower, is never 0xea. It is only ever 0x01 (the first time after modprobe) using wpa_supplicant, and 0x00 thereafter using wpa_supplicant. These results occur with 'Fix for authentication failure' [PATCH 1/2] in place.

Whatever was returned, code tests always showed that at least rtl8xxxu_init_queue_reserved_page(priv); is always required. It is not called if macpower is set to true.

Please see the cover letter, [PATCH 0/2], for more information from the tests.

For the rtl8xxxu-devel branch of git.kernel.org/pub/scm/linux/kernel/git/jes/linux.git

Signed-off-by: John Heenan
---
 drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
index f25b4df..aae05f3 100644
--- a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
+++ b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
@@ -3904,6 +3904,7 @@ static int rtl8xxxu_init_device(struct ieee80211_hw *hw)
 		macpower = false;
 	else
 		macpower = true;
+	macpower = false; // Code testing shows macpower must always be set to false to avoid failure
 
 	ret = fops->power_on(priv);
 	if (ret < 0) {
-- 
2.10.1
[PATCH 0/2] rtl8xxxu: Fix allows wpa_supplicant to authenticate
With the current kernel release, wpa_supplicant results in an authentication failure with a Cube i9 tablet (a Surface Pro like device):

Successfully initialized wpa_supplicant
wlp0s20f0u7i2: SME: Trying to authenticate with 10:fe:ed:62:7a:78 (SSID='localre' freq=2417 MHz)
wlp0s20f0u7i2: SME: Trying to authenticate with 10:fe:ed:62:7a:78 (SSID='localre' freq=2417 MHz)
wlp0s20f0u7i2: SME: Trying to authenticate with 10:fe:ed:62:7a:78 (SSID='localre' freq=2417 MHz)
wlp0s20f0u7i2: SME: Trying to authenticate with 10:fe:ed:62:7a:78 (SSID='localre' freq=2417 MHz)
wlp0s20f0u7i2: CTRL-EVENT-SSID-TEMP-DISABLED id=0 ssid="localre" auth_failures=1 duration=10 reason=CONN_FAILED

There is a workaround that ONLY works once per invocation of wpa_supplicant:

rmmod rtl8xxxu
modprobe rtl8xxxu

The following two patches result in reliable behaviour of wpa_supplicant, without a workaround:

Successfully initialized wpa_supplicant
wlp0s20f0u7i2: SME: Trying to authenticate with 10:fe:ed:62:7a:78 (SSID='localre' freq=2417 MHz)
wlp0s20f0u7i2: Trying to associate with 10:fe:ed:62:7a:78 (SSID='localre' freq=2417 MHz)
wlp0s20f0u7i2: Associated with 10:fe:ed:62:7a:78
wlp0s20f0u7i2: CTRL-EVENT-SUBNET-STATUS-UPDATE status=0
wlp0s20f0u7i2: WPA: Key negotiation completed with 10:fe:ed:62:7a:78 [PTK=CCMP GTK=CCMP]
wlp0s20f0u7i2: CTRL-EVENT-CONNECTED - Connection to 10:fe:ed:62:7a:78 completed [id=0 id_str=]

The patches are for kernel tree:
git://git.kernel.org/pub/scm/linux/kernel/git/jes/linux.git
branch: rtl8xxxu-devel

The first patch moves init code so that each time wpa_supplicant is invoked there is similar init behaviour as in the alternative working rtl8xxxu driver https://github.com/lwfinger/rtl8723bu

The second patch enables more complete initialisation to occur.

There are three issues:

1. The value returned by "rtl8xxxu_read8(priv, REG_CR);", used to set macpower, is never 0xef. The value is either 0x01 (first time with wpa_supplicant after modprobe) or 0x00 (re-executing wpa_supplicant)
2.
Trying to use the value 0x00 or 0x01 returned to determine the macpower setting always resulted in failure
3. At the very least 'rtl8xxxu_init_queue_reserved_page(priv);' must always be invoked, even if not all of the extra init sequence arising from setting macpower to false is run.

Patched code with a suitable Makefile will be available from https://github.com/johnheenan/rtl8xxxu for testing by Cube i9 owners

John Heenan (2):
  rtl8xxxu: Fix for authentication failure
  rtl8xxxu: Fix for bogus data used to determine macpower

 drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

-- 
2.10.1
Re: [PATCH for-next 00/14][PULL request] Mellanox mlx5 core driver updates 2016-10-25
On Fri, Oct 28, 2016 at 7:53 PM, David Miller wrote:
>
> I really dislike pull requests of this form.
>
> You add lots of data structures and helper functions but no actual
> users of these facilities to the driver.
>
> Do this instead:
>
> 1) Add TSAR infrastructure
> 2) Add use of TSAR facilities to the driver
>
> That's one pull request.
>
> I don't care if this is hard, or if there are entanglements with
> Infiniband or whatever, you must submit changes in this manner.
>

It is not hard, it is just not right. We have lots of IB and ETH features that we would like to submit in the same kernel cycle; with your suggestion I will have to submit almost every feature (core infrastructure and netdev/RDMA usage) to both you and Doug. The same goes for rdma features: you will receive PULL requests for them as well, and I am sure you and the netdev list don't need such noise. Do not forget that this will slow down mlx5 progress, since netdev will block rdma and vice versa.

> I will not accept additions to a driver that don't even get really
> used.

For patches containing logic/helper functions, such as "Add TSAR infrastructure", I agree, and I can find a way to move some code around to avoid future conflicts and remove them from such pull requests. But you need to at least accept hardware-related structure infrastructure patches for shared code such as include/linux/mlx5/mlx5_ifc.h, where we have only hardware definitions and those patches are really minimal.

So bottom line, I will do my best to ensure future PULL requests contain only include/linux/mlx5/*.h hardware-related definitions or fully implemented features. Can we agree on that?

Thanks,
Saeed.
pull-request: wireless-drivers-next 2016-10-30
Hi Dave,

a few fixes for 4.9. I tagged this on the plane over a slow mosh connection while travelling to Plumbers, so I might have done something wrong; please check more carefully than usual. For example, I had to redo the signed tag because of some whitespace damage. Please let me know if there are any problems.

Kalle

The following changes since commit 67f0160fe34ec5391a428603b9832c9f99d8f3a1:

  MAINTAINERS: Update qlogic networking drivers (2016-10-26 23:29:12 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers.git tags/wireless-drivers-for-davem-2016-10-30

for you to fetch changes up to d3532ea6ce4ea501e421d130555e59edc2945f99:

  brcmfmac: avoid maybe-uninitialized warning in brcmf_cfg80211_start_ap (2016-10-27 18:04:54 +0300)

wireless-drivers fixes for 4.9

iwlwifi

* some fixes for suspend/resume with unified FW images
* a fix for a false-positive lockdep report
* a fix for multi-queue that caused an unnecessary 1 second latency
* a fix for an ACPI parsing bug that caused a misleading error message

brcmfmac

* fix a variable-uninitialised warning in brcmf_cfg80211_start_ap()

Arnd Bergmann (1):
  brcmfmac: avoid maybe-uninitialized warning in brcmf_cfg80211_start_ap

Haim Dreyfuss (1):
  iwlwifi: mvm: comply with fw_restart mod param on suspend

Johannes Berg (1):
  iwlwifi: pcie: mark command queue lock with separate lockdep class

Kalle Valo (1):
  Merge tag 'iwlwifi-for-kalle-2015-10-25' of git://git.kernel.org/.../iwlwifi/iwlwifi-fixes

Luca Coelho (4):
  iwlwifi: mvm: use ssize_t for len in iwl_debugfs_mem_read()
  iwlwifi: mvm: fix d3_test with unified D0/D3 images
  iwlwifi: pcie: fix SPLC structure parsing
  iwlwifi: mvm: fix netdetect starting/stopping for unified images

Sara Sharon (1):
  iwlwifi: mvm: wake the wait queue when the RX sync counter is zero

 .../broadcom/brcm80211/brcmfmac/cfg80211.c        |  2 +-
 drivers/net/wireless/intel/iwlwifi/mvm/d3.c       | 49 +---
 drivers/net/wireless/intel/iwlwifi/mvm/debugfs.c  |  4 +-
 drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c |  3 +-
 drivers/net/wireless/intel/iwlwifi/mvm/mvm.h      |  1 +
 drivers/net/wireless/intel/iwlwifi/mvm/ops.c      |  1 +
 drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c     |  3 +-
 drivers/net/wireless/intel/iwlwifi/mvm/scan.c     | 33 ++--
 drivers/net/wireless/intel/iwlwifi/pcie/drv.c     | 79
 drivers/net/wireless/intel/iwlwifi/pcie/tx.c      |  8 ++
 10 files changed, 129 insertions(+), 54 deletions(-)
[patch net] mlxsw: spectrum: Fix incorrect reuse of MID entries
From: Ido Schimmel

In the device, a MID entry represents a group of local ports, which can later be bound to an MDB entry. The lookup of an existing MID entry is currently done using the provided MC MAC address and VID, from the Linux bridge. However, this can result in an incorrect reuse of the same MID index in different VLAN-unaware bridges (same IP MC group and VID 0). Fix this by performing the lookup based on FID instead of VID, which is unique across different bridges.

Fixes: 3a49b4fde2a1 ("mlxsw: Adding layer 2 multicast support")
Signed-off-by: Ido Schimmel
Acked-by: Elad Raz
Signed-off-by: Jiri Pirko
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h           |  2 +-
 drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c | 14 +++---
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index 9b22863..97bbc1d 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -115,7 +115,7 @@ struct mlxsw_sp_rif {
 struct mlxsw_sp_mid {
 	struct list_head list;
 	unsigned char addr[ETH_ALEN];
-	u16 vid;
+	u16 fid;
 	u16 mid;
 	unsigned int ref_count;
 };

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 5e00c79..1e2c8ec 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -929,12 +929,12 @@ static int mlxsw_sp_port_smid_set(struct mlxsw_sp_port *mlxsw_sp_port, u16 mid,
 static struct mlxsw_sp_mid *__mlxsw_sp_mc_get(struct mlxsw_sp *mlxsw_sp,
 					      const unsigned char *addr,
-					      u16 vid)
+					      u16 fid)
 {
 	struct mlxsw_sp_mid *mid;
 
 	list_for_each_entry(mid, &mlxsw_sp->br_mids.list, list) {
-		if (ether_addr_equal(mid->addr, addr) && mid->vid == vid)
+		if (ether_addr_equal(mid->addr, addr) && mid->fid == fid)
 			return mid;
 	}
 	return NULL;
 }
@@ -942,7 +942,7 @@ static struct mlxsw_sp_mid
*__mlxsw_sp_mc_get(struct mlxsw_sp *mlxsw_sp,
 static struct mlxsw_sp_mid *__mlxsw_sp_mc_alloc(struct mlxsw_sp *mlxsw_sp,
 						const unsigned char *addr,
-						u16 vid)
+						u16 fid)
 {
 	struct mlxsw_sp_mid *mid;
 	u16 mid_idx;
@@ -958,7 +958,7 @@ static struct mlxsw_sp_mid *__mlxsw_sp_mc_alloc(struct mlxsw_sp *mlxsw_sp,
 	set_bit(mid_idx, mlxsw_sp->br_mids.mapped);
 	ether_addr_copy(mid->addr, addr);
-	mid->vid = vid;
+	mid->fid = fid;
 	mid->mid = mid_idx;
 	mid->ref_count = 0;
 	list_add_tail(&mid->list, &mlxsw_sp->br_mids.list);
@@ -991,9 +991,9 @@ static int mlxsw_sp_port_mdb_add(struct mlxsw_sp_port *mlxsw_sp_port,
 	if (switchdev_trans_ph_prepare(trans))
 		return 0;
 
-	mid = __mlxsw_sp_mc_get(mlxsw_sp, mdb->addr, mdb->vid);
+	mid = __mlxsw_sp_mc_get(mlxsw_sp, mdb->addr, fid);
 	if (!mid) {
-		mid = __mlxsw_sp_mc_alloc(mlxsw_sp, mdb->addr, mdb->vid);
+		mid = __mlxsw_sp_mc_alloc(mlxsw_sp, mdb->addr, fid);
 		if (!mid) {
 			netdev_err(dev, "Unable to allocate MC group\n");
 			return -ENOMEM;
@@ -1137,7 +1137,7 @@ static int mlxsw_sp_port_mdb_del(struct mlxsw_sp_port *mlxsw_sp_port,
 	u16 mid_idx;
 	int err = 0;
 
-	mid = __mlxsw_sp_mc_get(mlxsw_sp, mdb->addr, mdb->vid);
+	mid = __mlxsw_sp_mc_get(mlxsw_sp, mdb->addr, fid);
 	if (!mid) {
 		netdev_err(dev, "Unable to remove port from MC DB\n");
 		return -EINVAL;
-- 
2.5.5
[PATCH net] qede: Fix statistics' strings for Tx/Rx queues
When an interface is configured to use Tx/Rx-only queues, the length of the statistics would be shortened to accommodate only the statistics required for each queue, and the values would be provided accordingly. However, the strings provided would still contain both Tx and Rx strings for each one of the queues [regardless of its configuration], which might lead to out-of-bound access when filling the buffers, as well as incorrect statistics being presented.

Fixes: 9a4d7e86acf3 ("qede: Add support for Tx/Rx-only queues.")
Signed-off-by: Yuval Mintz
---
Hi Dave,

Please consider applying this to `net'.

Thanks,
Yuval
---
 drivers/net/ethernet/qlogic/qede/qede_ethtool.c | 25 -
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
index d230742..8c2bbb2 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
@@ -177,16 +177,23 @@ static void qede_get_strings_stats(struct qede_dev *edev, u8 *buf)
 	for (i = 0, k = 0; i < QEDE_QUEUE_CNT(edev); i++) {
 		int tc;
 
-		for (j = 0; j < QEDE_NUM_RQSTATS; j++)
-			sprintf(buf + (k + j) * ETH_GSTRING_LEN,
-				"%d: %s", i, qede_rqstats_arr[j].string);
-		k += QEDE_NUM_RQSTATS;
-		for (tc = 0; tc < edev->num_tc; tc++) {
-			for (j = 0; j < QEDE_NUM_TQSTATS; j++)
+		if (edev->fp_array[i].type & QEDE_FASTPATH_RX) {
+			for (j = 0; j < QEDE_NUM_RQSTATS; j++)
 				sprintf(buf + (k + j) * ETH_GSTRING_LEN,
-					"%d.%d: %s", i, tc,
-					qede_tqstats_arr[j].string);
-			k += QEDE_NUM_TQSTATS;
+					"%d: %s", i,
+					qede_rqstats_arr[j].string);
+			k += QEDE_NUM_RQSTATS;
+		}
+
+		if (edev->fp_array[i].type & QEDE_FASTPATH_TX) {
+			for (tc = 0; tc < edev->num_tc; tc++) {
+				for (j = 0; j < QEDE_NUM_TQSTATS; j++)
+					sprintf(buf + (k + j) *
+						ETH_GSTRING_LEN,
+						"%d.%d: %s", i, tc,
+						qede_tqstats_arr[j].string);
+				k += QEDE_NUM_TQSTATS;
+			}
 		}
 	}
-- 
1.9.3
Re: Let's do P4
Sat, Oct 29, 2016 at 06:46:21PM CEST, john.fastab...@gmail.com wrote:
>On 16-10-29 07:49 AM, Jakub Kicinski wrote:
>> On Sat, 29 Oct 2016 09:53:28 +0200, Jiri Pirko wrote:
>>> Hi all.
>>>
>>> The network world is divided into 2 general types of hw:
>>> 1) network ASICs - network-specific silicon, containing things like TCAM
>>>    These ASICs are suitable to be programmed by P4.
>>> 2) network processors - basically general purpose CPUs
>>>    These processors are suitable to be programmed by eBPF.
>>>
>>> I believe that by now, most people have come to the conclusion that it
>>> is very difficult to handle both types by either P4 or eBPF. And since
>>> eBPF is part of the kernel, I would like to introduce P4 into the
>>> kernel as well. Here's a plan:
>>>
>>> 1) Define a P4 intermediate representation
>>>    I cannot imagine loading a P4 program (c-like syntax text file) into
>>>    the kernel as is. That means that as the first step, we need to find
>>>    some intermediate representation. I can imagine something in the
>>>    form of an AST, call it "p4ast". I don't really know how to do this
>>>    exactly though, it's just an idea.
>>>
>>>    In the end there would be a userspace precompiler for this:
>>>    $ makep4ast example.p4 example.ast
>>
>> Maybe stating the obvious, but IMHO defining the IR is the hardest part.
>> eBPF *is* the IR, we can compile C, P4 or even JIT Lua to eBPF. The
>> AST/IR for switch pipelines should allow for similar flexibility.
>> Looser coupling would also protect us from changes in the spec of the
>> high-level language.
>>
>
>Jumping in the middle here. You managed to get an entire thread going
>before I even woke up :)
>
>The problem with eBPF as an IR is that in the universe of eBPF IR
>programs, the subset that can be offloaded onto standard ASIC-based
>hardware (non NPU/FPGA/etc) is so small as to be almost meaningless IMO.
>
>I tried this for a while and the result is users have to write very
>targeted eBPF that they "know" will be pattern matched and pushed into
>an ASIC.
It can work but it's very fragile. When I did this I ended up
>with an eBPF generator for deviceX and an eBPF generator for deviceY,
>each with a very specific pattern matching engine in the driver to
>xlate ebpf-deviceX into its asic. Existing ASICs for example usually
>support only one pipeline, only one parser (or require moving mountains
>to change the parser via ucode), only one set of tables, and only one
>deparser/serializer at the end to build the new packet. Next-gen pieces
>may have some flexibility on the parser side.
>
>There is an interesting resource allocation problem we have that could
>be solved by p4 or devlink, wherein we want to pre-allocate slices of
>the TCAM for certain match types. I was planning on writing devlink code
>for this because it's primarily done at initialization once.

There are 2 resource allocation problems in our hw. One is the general
division of the resources into feature-chunks. That needs to be done
during the ASIC initialization phase. For that, I also plan to utilize
the devlink API.

The second one is runtime allocation of tables, and that would be
handled by p4 just fine.

>
>I will note one nice thing about using eBPF however is that you have an
>easy software emulation path via the eBPF engine in the kernel.
>
>... And merging threads here with Jiri's email ...
>
>> If you do p4>ebpf in userspace, you have 2 apis:
>> 1) to setup sw (in-kernel) p4 datapath, you push bpf.o to kernel
>> 2) to setup hw p4 datapath, you push program.p4ast to kernel
>>
>> Those are 2 apis. Both wrapped up by TC, but still 2 apis.
>>
>> What I believe is correct is to have one api:
>> 1) to setup sw (in-kernel) p4 datapath, you push program.p4ast to kernel
>> 2) to setup hw p4 datapath, you push program.p4ast to kernel
>>
>
>Couple comments around this. First, adding yet another IR in the kernel
>and another JIT engine to map that IR onto eBPF or hardware vendor X
>doesn't get me excited. It's really much easier to write these as
>backend objects in LLVM.
Not saying it can't be done, just saying it is easier
>in LLVM. Also we already have the LLVM code for P4 to LLVM-IR to eBPF.
>In the end this would be a reasonably complex bit of code in
>the kernel only for hardware offload. I have doubts that folks would
>ever use it for software-only cases. I'm happy to admit I'm wrong here
>though.

Well, for hw offload, every driver has to parse the IR (whatever it will be in) and program the HW accordingly. Similar parsing and translation would be needed for the SW path, to translate into eBPF. I don't think it would be more complex than in the drivers. Should be fine.

>
>So yes, using llvm backends creates two paths, a hardware mgmt and sw
>path, but in the hardware + software case typical on the edge, the
>orchestration and management planes have started to manage the hardware
>and software as two blocks of logic for performance SLA logic. Even on
>the edge it seems in most cases folks are selling SR-IOV ports and
>can't fall back