[dpdk-dev] DPDK OVS on Ubuntu 14.04# Issue's Resolved# Inter-VM communication & IP allocation through DHCP issue
Hi Przemek,

Thank you for your response; it really gave us a breakthrough.

After setting up DPDK on the compute node for stable/kilo, we are now trying to set up an OpenStack stable/liberty all-in-one setup. At present we are not able to get IP allocation for the vhost type instances through DHCP. We also tried assigning IPs to them manually, but inter-VM communication is not happening either.

#neutron agent-list

root@nfv-dpdk-devstack:/etc/neutron# neutron agent-list
+--------------------------------------+--------------------+-------------------+-------+----------------+---------------------------+
| id                                   | agent_type         | host              | alive | admin_state_up | binary                    |
+--------------------------------------+--------------------+-------------------+-------+----------------+---------------------------+
| 3b29e93c-3a25-4f7d-bf6c-6bb309db5ec0 | DPDK OVS Agent     | nfv-dpdk-devstack | :-)   | True           | neutron-openvswitch-agent |
| 62593b2c-c10f-4d93-8551-c46ce24895a6 | L3 agent           | nfv-dpdk-devstack | :-)   | True           | neutron-l3-agent          |
| 7cb97af9-cc20-41f8-90fb-aba97d39dfbd | DHCP agent         | nfv-dpdk-devstack | :-)   | True           | neutron-dhcp-agent        |
| b613c654-99b7-437e-9317-20fa651a1310 | Linux bridge agent | nfv-dpdk-devstack | :-)   | True           | neutron-linuxbridge-agent |
| c2dd0384-6517-4b44-9c25-0d2825d23f57 | Metadata agent     | nfv-dpdk-devstack | :-)   | True           | neutron-metadata-agent    |
| f23dde40-7dc0-4f20-8b3e-eb90ddb15e49 | Open vSwitch agent | nfv-dpdk-devstack | xxx   | True           | neutron-openvswitch-agent |
+--------------------------------------+--------------------+-------------------+-------+----------------+---------------------------+

ovs-vsctl show output#

    Bridge br-dpdk
        Port br-dpdk
            Interface br-dpdk
                type: internal
        Port phy-br-dpdk
            Interface phy-br-dpdk
                type: patch
                options: {peer=int-br-dpdk}
    Bridge br-int
        fail_mode: secure
        Port "vhufa41e799-f2"
            tag: 5
            Interface "vhufa41e799-f2"
                type: dpdkvhostuser
        Port int-br-dpdk
            Interface int-br-dpdk
                type: patch
                options: {peer=phy-br-dpdk}
        Port "tap4e19f8e1-59"
            tag: 5
            Interface "tap4e19f8e1-59"
                type: internal
        Port "vhu05734c49-3b"
            tag: 5
            Interface "vhu05734c49-3b"
                type: dpdkvhostuser
        Port "vhu10c06b4d-84"
            tag: 5
            Interface "vhu10c06b4d-84"
                type: dpdkvhostuser
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port "vhue169c581-ef"
            tag: 5
            Interface "vhue169c581-ef"
                type: dpdkvhostuser
        Port br-int
            Interface br-int
                type: internal
    Bridge br-tun
        fail_mode: secure
        Port br-tun
            Interface br-tun
                type: internal
                error: "could not open network device br-tun (Invalid argument)"
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
    ovs_version: "2.4.0"

ovs-ofctl dump-flows br-int#

root@nfv-dpdk-devstack:/etc/neutron# ovs-ofctl dump-flows br-int
NXST_FLOW reply (xid=0x4):
 cookie=0xaaa002bb2bcf827b, duration=2410.012s, table=0, n_packets=0, n_bytes=0, idle_age=2410, priority=10,icmp6,in_port=43,icmp_type=136 actions=resubmit(,24)
 cookie=0xaaa002bb2bcf827b, duration=2409.480s, table=0, n_packets=0, n_bytes=0, idle_age=2409, priority=10,icmp6,in_port=44,icmp_type=136 actions=resubmit(,24)
 cookie=0xaaa002bb2bcf827b, duration=2408.704s, table=0, n_packets=0, n_bytes=0, idle_age=2408, priority=10,icmp6,in_port=45,icmp_type=136 actions=resubmit(,24)
 cookie=0xaaa002bb2bcf827b, duration=2408.155s, table=0, n_packets=0, n_bytes=0, idle_age=2408, priority=10,icmp6,in_port=42,icmp_type=136 actions=resubmit(,24)
 cookie=0xaaa002bb2bcf827b, duration=2409.858s, table=0, n_packets=0, n_bytes=0, idle_age=2409, priority=10,arp,in_port=43 actions=resubmit(,24)
 cookie=0xaaa002bb2bcf827b, duration=2409.314s, table=0, n_packets=0, n_bytes=0, idle_age=2409, priority=10,arp,in_port=44 actions=resubmit(,24)
 cookie=0xaaa002bb2bcf827b, duration=2408.564s, table=0, n_packets=0, n_bytes=0, idle_age=2408, priority=10,arp,in_port=45 actions=resubmit(,24)
 cookie=0xaaa002bb2bcf827b, duration=2408.019s, table=0, n_packets=0, n_bytes=0, idle_age=2408, priority=10,arp,in_port=42 actions=resubmit(,24)
 cookie=0xaaa002bb2bcf827b, duration=2411.53
[dpdk-dev] [PATCH v2 2/2] i40evf: support interrupt based pf reset request
An interrupt based request for PF reset, sent from the PF, is supported by enabling the adminq event process in the VF driver. Users can register a callback for this interrupt event to be informed when a PF reset request is detected, like:
rte_eth_dev_callback_register(portid, RTE_ETH_EVENT_INTR_RESET, reset_event_callback, arg);

Signed-off-by: Jingjing Wu
---
 doc/guides/rel_notes/release_2_3.rst | 1 +
 drivers/net/i40e/i40e_ethdev_vf.c    | 274 +++
 lib/librte_ether/rte_ethdev.h        | 1 +
 3 files changed, 246 insertions(+), 30 deletions(-)

diff --git a/doc/guides/rel_notes/release_2_3.rst b/doc/guides/rel_notes/release_2_3.rst
index 99de186..73d5f76 100644
--- a/doc/guides/rel_notes/release_2_3.rst
+++ b/doc/guides/rel_notes/release_2_3.rst
@@ -4,6 +4,7 @@ DPDK Release 2.3
 New Features
+* **Added pf reset event reported in i40e vf PMD driver.
 Resolved Issues
 ---
diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c
index 64e6957..1ffe64e 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -74,8 +74,6 @@
 #define I40EVF_BUSY_WAIT_DELAY 10
 #define I40EVF_BUSY_WAIT_COUNT 50
 #define MAX_RESET_WAIT_CNT 20
-/*ITR index for NOITR*/
-#define I40E_QINT_RQCTL_MSIX_INDX_NOITR 3
 
 struct i40evf_arq_msg_info {
 	enum i40e_virtchnl_ops ops;
@@ -151,6 +149,9 @@ static int i40evf_dev_rx_queue_intr_enable(struct rte_eth_dev *dev,
 	uint16_t queue_id);
 static int i40evf_dev_rx_queue_intr_disable(struct rte_eth_dev *dev,
 	uint16_t queue_id);
+static void i40evf_handle_pf_event(__rte_unused struct rte_eth_dev *dev,
+	uint8_t *msg,
+	uint16_t msglen);
 
 /* Default hash key buffer for RSS */
 static uint32_t rss_key_default[I40E_VFQF_HKEY_MAX_INDEX + 1];
@@ -357,20 +358,42 @@ i40evf_execute_vf_cmd(struct rte_eth_dev *dev, struct vf_cmd_info *args)
 		return err;
 	}
 
-	do {
-		/* Delay some time first */
-		rte_delay_ms(ASQ_DELAY_MS);
-		ret = i40evf_read_pfmsg(dev, &info);
-		if (ret == I40EVF_MSG_CMD) {
-			err = 0;
-			break;
-		} else if (ret == I40EVF_MSG_ERR) {
-			err = -1;
-			break;
-		}
-		/* If don't read msg or read sys event, continue */
-	} while (i++ < MAX_TRY_TIMES);
-	_clear_cmd(vf);
+	switch (args->ops) {
+	case I40E_VIRTCHNL_OP_RESET_VF:
+		/*no need to process in this function */
+		break;
+	case I40E_VIRTCHNL_OP_VERSION:
+	case I40E_VIRTCHNL_OP_GET_VF_RESOURCES:
+		/* for init adminq commands, need to poll the response */
+		do {
+			/* Delay some time first */
+			rte_delay_ms(ASQ_DELAY_MS);
+			ret = i40evf_read_pfmsg(dev, &info);
+			if (ret == I40EVF_MSG_CMD) {
+				err = 0;
+				break;
+			} else if (ret == I40EVF_MSG_ERR) {
+				err = -1;
+				break;
+			}
+			/* If don't read msg or read sys event, continue */
+		} while (i++ < MAX_TRY_TIMES);
+		_clear_cmd(vf);
+		break;
+
+	default:
+		/* for other adminq in running time, waiting the cmd done flag */
+		do {
+			/* Delay some time first */
+			rte_delay_ms(ASQ_DELAY_MS);
+			if (vf->pend_cmd == I40E_VIRTCHNL_OP_UNKNOWN) {
+				err = 0;
+				break;
+			}
+			/* If don't read msg or read sys event, continue */
+		} while (i++ < MAX_TRY_TIMES);
+		break;
+	}
 
 	return (err | vf->cmd_retval);
 }
@@ -719,7 +742,7 @@ i40evf_config_irq_map(struct rte_eth_dev *dev)
 
 	map_info = (struct i40e_virtchnl_irq_map_info *)cmd_buffer;
 	map_info->num_vectors = 1;
-	map_info->vecmap[0].rxitr_idx = I40E_QINT_RQCTL_MSIX_INDX_NOITR;
+	map_info->vecmap[0].rxitr_idx = I40E_ITR_INDEX_DEFAULT;
 	map_info->vecmap[0].vsi_id = vf->vsi_res->vsi_id;
 	/* Alway use default dynamic MSIX interrupt */
 	map_info->vecmap[0].vector_id = vector_id;
@@ -1093,6 +1116,38 @@ i40evf_dev_atomic_write_link_status(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/* Disable IRQ0 */
+static inline void
+i40evf_disable_irq0(struct i40e_hw *hw)
+{
+	/* Disable all interrupt types */
+	I40E_WRITE_REG(hw, I40E_VFINT_ICR0_ENA1, 0);
+	I40E_WRITE_REG(hw, I40E_VFINT_DYN_CTL01,
+		I40E_VFINT_DYN_CTL01_ITR_I
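
The two ways of waiting for a command to complete that the patch above introduces can be sketched in isolation. This is only an illustration under stated assumptions: `sketch_op`, `sketch_vf`, `sketch_wait_cmd` and the two poll helpers are hypothetical names, not the driver's code; the delay between tries is left out; and returning -1 on timeout is a choice of this sketch, not something taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's opcodes; not the real enum. */
enum sketch_op {
	SKETCH_OP_UNKNOWN = 0,	/* marker for "no command pending" */
	SKETCH_OP_RESET_VF,
	SKETCH_OP_VERSION,	/* stands for an init-time command */
	SKETCH_OP_CONFIG	/* stands for any run-time command */
};

struct sketch_vf {
	volatile uint32_t pend_cmd;	/* cleared by the event handler */
};

/* Returns true once the response to the pending command was read. */
typedef bool (*sketch_poll_fn)(struct sketch_vf *vf);

/* Two trivial poll helpers used to exercise the sketch. */
bool sketch_poll_ready(struct sketch_vf *vf) { (void)vf; return true; }
bool sketch_poll_never(struct sketch_vf *vf) { (void)vf; return false; }

int
sketch_wait_cmd(struct sketch_vf *vf, enum sketch_op op,
		sketch_poll_fn poll, int max_tries)
{
	int i;

	assert(vf != NULL && poll != NULL);

	switch (op) {
	case SKETCH_OP_RESET_VF:
		/* Nothing to wait for. */
		return 0;
	case SKETCH_OP_VERSION:
		/* Init time: no event handler yet, so poll for the reply. */
		for (i = 0; i < max_tries; i++) {
			if (poll(vf))
				break;
		}
		vf->pend_cmd = SKETCH_OP_UNKNOWN;
		return i < max_tries ? 0 : -1;
	default:
		/* Run time: the event handler clears pend_cmd when done. */
		for (i = 0; i < max_tries; i++) {
			if (vf->pend_cmd == SKETCH_OP_UNKNOWN)
				return 0;
		}
		return -1;
	}
}
```

The point of the split is that once the adminq is serviced from the interrupt path, a run-time command must not read the queue itself; it only watches the flag that the handler clears.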
[dpdk-dev] [PATCH v2 1/2] i40evf: allocate virtchnl cmd buffer for each vf
Currently, the i40evf PMD uses a global static buffer to send virtchnl commands to the host driver, and that buffer is shared by multiple VFs. This patch changes it to allocate a virtchnl cmd buffer for each VF.

Signed-off-by: Jingjing Wu
---
 drivers/net/i40e/i40e_ethdev.h    | 2 +
 drivers/net/i40e/i40e_ethdev_vf.c | 181 +++---
 2 files changed, 74 insertions(+), 109 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index 1f9792b..93122ad 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -494,7 +494,9 @@ struct i40e_vf {
 	bool link_up;
 	bool vf_reset;
 	volatile uint32_t pend_cmd; /* pending command not finished yet */
+	uint32_t cmd_retval; /* return value of the cmd response from PF */
 	u16 pend_msg; /* flags indicates events from pf not handled yet */
+	uint8_t *aq_resp; /* buffer to store the adminq response from PF */
 
 	/* VSI info */
 	struct i40e_virtchnl_vf_resource *vf_res; /* All VSIs */
diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c
index 14d2a50..64e6957 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -103,9 +103,6 @@ enum i40evf_aq_result {
 	I40EVF_MSG_CMD, /* Read async command result */
 };
 
-/* A share buffer to store the command result from PF driver */
-static uint8_t cmd_result_buffer[I40E_AQ_BUF_SZ];
-
 static int i40evf_dev_configure(struct rte_eth_dev *dev);
 static int i40evf_dev_start(struct rte_eth_dev *dev);
 static void i40evf_dev_stop(struct rte_eth_dev *dev);
@@ -237,31 +234,39 @@ i40evf_set_mac_type(struct i40e_hw *hw)
 }
 
 /*
- * Parse admin queue message.
- *
- * return value:
- * < 0: meet error
- * 0: read sys msg
- * > 0: read cmd result
+ * Read data in admin queue to get msg from pf driver
  */
 static enum i40evf_aq_result
-i40evf_parse_pfmsg(struct i40e_vf *vf,
-	struct i40e_arq_event_info *event,
-	struct i40evf_arq_msg_info *data)
+i40evf_read_pfmsg(struct rte_eth_dev *dev, struct i40evf_arq_msg_info *data)
 {
-	enum i40e_virtchnl_ops opcode = (enum i40e_virtchnl_ops)\
-		rte_le_to_cpu_32(event->desc.cookie_high);
-	enum i40e_status_code retval = (enum i40e_status_code)\
-		rte_le_to_cpu_32(event->desc.cookie_low);
-	enum i40evf_aq_result ret = I40EVF_MSG_CMD;
+	struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	struct i40e_vf *vf = I40EVF_DEV_PRIVATE_TO_VF(dev->data->dev_private);
+	struct i40e_arq_event_info event;
+	enum i40e_virtchnl_ops opcode;
+	enum i40e_status_code retval;
+	int ret;
+	enum i40evf_aq_result result = I40EVF_MSG_NON;
 
+	event.buf_len = data->buf_len;
+	event.msg_buf = data->msg;
+	ret = i40e_clean_arq_element(hw, &event, NULL);
+	/* Can't read any msg from adminQ */
+	if (ret) {
+		if (ret == I40E_ERR_ADMIN_QUEUE_NO_WORK)
+			result = I40EVF_MSG_NON;
+		else
+			result = I40EVF_MSG_ERR;
+		return result;
+	}
+
+	opcode = (enum i40e_virtchnl_ops)rte_le_to_cpu_32(event.desc.cookie_high);
+	retval = (enum i40e_status_code)rte_le_to_cpu_32(event.desc.cookie_low);
 	/* pf sys event */
 	if (opcode == I40E_VIRTCHNL_OP_EVENT) {
 		struct i40e_virtchnl_pf_event *vpe =
-			(struct i40e_virtchnl_pf_event *)event->msg_buf;
+			(struct i40e_virtchnl_pf_event *)event.msg_buf;
 
-		/* Initialize ret to sys event */
-		ret = I40EVF_MSG_SYS;
+		result = I40EVF_MSG_SYS;
 		switch (vpe->event) {
 		case I40E_VIRTCHNL_EVENT_LINK_CHANGE:
 			vf->link_up =
@@ -286,74 +291,17 @@ i40evf_parse_pfmsg(struct i40e_vf *vf,
 		}
 	} else {
 		/* async reply msg on command issued by vf previously */
-		ret = I40EVF_MSG_CMD;
+		result = I40EVF_MSG_CMD;
 		/* Actual data length read from PF */
-		data->msg_len = event->msg_len;
+		data->msg_len = event.msg_len;
 	}
-	/* fill the ops and result to notify VF */
+
 	data->result = retval;
 	data->ops = opcode;
 
-	return ret;
-}
-
-/*
- * Read data in admin queue to get msg from pf driver
- */
-static enum i40evf_aq_result
-i40evf_read_pfmsg(struct rte_eth_dev *dev, struct i40evf_arq_msg_info *data)
-{
-	struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct i40e_vf *vf = I40EVF_DEV_PRIVATE_TO_VF(dev->data->dev_private);
-	struct i40e_arq_event_info event;
-	int ret;
-	enum i40evf_aq_result result = I40EVF_MSG_NON;
-
-	event.buf_len = data->buf_len;
-	event.msg_buf = data->msg;
-	ret =
[dpdk-dev] [PATCH v2 0/2] i40evf: support interrupt based pf reset request
v2 changes:
  remove the change on vf reset status checking
  add pf event report support in release note

If DPDK is used on a VF while the host is using the Linux kernel driver as PF driver on an FVL NIC, some settings on the PF will trigger a VF reset. The DPDK VF needs to know about the event. This patch set supports the interrupt based request for PF reset, sent from the PF, by enabling the adminq event process in the VF driver. Users can register a callback for this interrupt event to be informed when a PF reset request is detected, like:
rte_eth_dev_callback_register(portid, RTE_ETH_EVENT_INTR_RESET, reset_event_callback, arg);

Jingjing Wu (2):
  i40evf: allocate virtchnl cmd buffer for each vf
  i40evf: support interrupt based pf reset request

 doc/guides/rel_notes/release_2_3.rst | 1 +
 drivers/net/i40e/i40e_ethdev.h       | 2 +
 drivers/net/i40e/i40e_ethdev_vf.c    | 423 +--
 lib/librte_ether/rte_ethdev.h        | 1 +
 4 files changed, 304 insertions(+), 123 deletions(-)

--
2.4.0
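
The register-then-notify flow described in the cover letter can be modelled without any DPDK dependency. The sketch below is a hypothetical miniature: `sketch_callback_register`, `sketch_raise_event` and the `SKETCH_*` names are invented for illustration and are not the ethdev API. The only call taken from the cover letter is `rte_eth_dev_callback_register(portid, RTE_ETH_EVENT_INTR_RESET, reset_event_callback, arg)`, which the sketch imitates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event ids; they only imitate the ethdev event types. */
enum sketch_event {
	SKETCH_EVENT_LINK_CHANGE = 0,
	SKETCH_EVENT_INTR_RESET,
	SKETCH_EVENT_MAX
};

typedef void (*sketch_event_cb)(uint8_t port_id, enum sketch_event event,
				void *arg);

struct sketch_cb_slot {
	sketch_event_cb fn;
	void *arg;
};

#define SKETCH_MAX_PORTS 4

/* One callback per (port, event) pair keeps the sketch small. */
static struct sketch_cb_slot sketch_slots[SKETCH_MAX_PORTS][SKETCH_EVENT_MAX];

/* Application side: remember who wants to hear about an event. */
int
sketch_callback_register(uint8_t port_id, enum sketch_event event,
			 sketch_event_cb fn, void *arg)
{
	if (port_id >= SKETCH_MAX_PORTS || event >= SKETCH_EVENT_MAX ||
	    fn == NULL)
		return -1;
	sketch_slots[port_id][event].fn = fn;
	sketch_slots[port_id][event].arg = arg;
	return 0;
}

/* Driver side: called from the interrupt path when the PF sends an event. */
void
sketch_raise_event(uint8_t port_id, enum sketch_event event)
{
	struct sketch_cb_slot *slot;

	assert(port_id < SKETCH_MAX_PORTS && event < SKETCH_EVENT_MAX);
	slot = &sketch_slots[port_id][event];
	if (slot->fn != NULL)
		slot->fn(port_id, event, slot->arg);
}

/* Example callback: only record that a reset was requested. */
void
sketch_reset_event_callback(uint8_t port_id, enum sketch_event event,
			    void *arg)
{
	int *reset_pending = arg;

	(void)port_id;
	if (event == SKETCH_EVENT_INTR_RESET)
		*reset_pending = 1;
}
```

In this sketch the callback only sets a flag. Deferring the actual port stop and re-initialisation to the application's main loop is a design choice of the sketch, made because the callback runs in the context that delivers the event.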
[dpdk-dev] [PATCH 15/16] fm10k: use default mailbox message handler for pf
> -----Original Message-----
> From: Richardson, Bruce
> Sent: Wednesday, January 27, 2016 4:17 AM
> To: Wang, Xiao W
> Cc: Chen, Jing D ; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 15/16] fm10k: use default mailbox message
> handler for pf
>
> On Mon, Jan 25, 2016 at 02:31:05AM +, Wang, Xiao W wrote:
> > Hi Bruce,
> >
> > > -----Original Message-----
> > > From: Richardson, Bruce
> > > Sent: Saturday, January 23, 2016 5:32 AM
> > > To: Wang, Xiao W
> > > Cc: Chen, Jing D ; dev at dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH 15/16] fm10k: use default mailbox
> > > message handler for pf
> > >
> > > On Thu, Jan 21, 2016 at 06:36:00PM +0800, Wang Xiao W wrote:
> > > > The new share code makes fm10k_msg_update_pvid_pf function static,
> > > > so we can not refer to it now in fm10k_ethdev.c. The registered pf
> > > > handler is almost the same as the default pf handler, removing it
> > > > has no impact on mailbox.
> > > >
> > > > Signed-off-by: Wang Xiao W
> > >
> > > What patch makes the function static, as we need to ensure that the
> > > build is not broken by having this patch in the wrong place in the
> > > patchset?
> > >
> > > Also, it seems strange having this patch in the middle of a series
> > > of base code updates - perhaps it should go first, so that all base
> > > code update patches can go one after the other.
> > >
> > > /Bruce
> >
> > It's the first patch in the patch set that makes the function static.
>
> So does this patch not need to go before patch 1, if we can't refer to the
> function once patch one is applied?
>
> /Bruce

OK, got it, I will revise my patch, thanks a lot for your comment.

Best Regards,
Wang, Xiao
[dpdk-dev] [PATCH v5 01/11] virtio: Introduce config RTE_VIRTIO_INC_VECTOR
Ping?

On Jan 19, 2016 5:16 PM, "Santosh Shukla" wrote:
> - virtio_recv_pkts_vec and other virtio vector friend apis are written for
> sse/avx instructions. For arm64 in particular, virtio vector implementation
> does not exist(todo).
>
> So virtio pmd driver wont build for targets like i686, arm64. By making
> RTE_VIRTIO_INC_VECTOR=n, Driver can build for non-sse/avx targets and will work
> in non-vectored virtio mode.
>
> Disabling RTE_VIRTIO_INC_VECTOR config for :
>
> - i686 arch as i686 target config says:
> config/defconfig_i686-native-linuxapp-gcc says "Vectorized PMD is not
> supported on 32-bit".
>
> - armv7/v8 arch.
>
> Signed-off-by: Santosh Shukla
> ---
> v4--> v5:
> - squashed v4's RTE_VIRTIO_INC_VECTOR patches into one patch.
> - Added ifdefs RTE_xx_xx_INC_VECTOR across _simple_rx_tx flag occurance in code.
>
> config/common_linuxapp |1 +
> config/defconfig_arm-armv7a-linuxapp-gcc |4 +++-
> config/defconfig_arm64-armv8a-linuxapp-gcc |4 +++-
> config/defconfig_i686-native-linuxapp-gcc |1 +
> config/defconfig_i686-native-linuxapp-icc |1 +
> drivers/net/virtio/Makefile|2 +-
> drivers/net/virtio/virtio_rxtx.c | 16 +++-
> drivers/net/virtio/virtio_rxtx.h |2 ++
> 8 files changed, 27 insertions(+), 4 deletions(-)
>
> diff --git a/config/common_linuxapp b/config/common_linuxapp
> index 74bc515..8677697 100644
> --- a/config/common_linuxapp
> +++ b/config/common_linuxapp
> @@ -274,6 +274,7 @@ CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_RX=n
> CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_TX=n
> CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_DRIVER=n
> CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_DUMP=n
> +CONFIG_RTE_VIRTIO_INC_VECTOR=y
>
> #
> # Compile burst-oriented VMXNET3 PMD driver
> diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc
> index cbebd64..9f852ce 100644
> --- a/config/defconfig_arm-armv7a-linuxapp-gcc
> +++ b/config/defconfig_arm-armv7a-linuxapp-gcc
> @@ -43,6 +43,9 @@ CONFIG_RTE_FORCE_INTRINSICS=y
> CONFIG_RTE_TOOLCHAIN="gcc"
> CONFIG_RTE_TOOLCHAIN_GCC=y
>
> +# Disable VIRTIO VECTOR support
> +CONFIG_RTE_VIRTIO_INC_VECTOR=n
> +
> # ARM doesn't have support for vmware TSC map
> CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=n
>
> @@ -70,7 +73,6 @@ CONFIG_RTE_LIBRTE_I40E_PMD=n
> CONFIG_RTE_LIBRTE_IXGBE_PMD=n
> CONFIG_RTE_LIBRTE_MLX4_PMD=n
> CONFIG_RTE_LIBRTE_MPIPE_PMD=n
> -CONFIG_RTE_LIBRTE_VIRTIO_PMD=n
> CONFIG_RTE_LIBRTE_VMXNET3_PMD=n
> CONFIG_RTE_LIBRTE_PMD_XENVIRT=n
> CONFIG_RTE_LIBRTE_PMD_BNX2X=n
> diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc b/config/defconfig_arm64-armv8a-linuxapp-gcc
> index 504f3ed..1a638b3 100644
> --- a/config/defconfig_arm64-armv8a-linuxapp-gcc
> +++ b/config/defconfig_arm64-armv8a-linuxapp-gcc
> @@ -45,8 +45,10 @@ CONFIG_RTE_TOOLCHAIN_GCC=y
>
> CONFIG_RTE_CACHE_LINE_SIZE=64
>
> +# Disable VIRTIO VECTOR support
> +CONFIG_RTE_VIRTIO_INC_VECTOR=n
> +
> CONFIG_RTE_IXGBE_INC_VECTOR=n
> -CONFIG_RTE_LIBRTE_VIRTIO_PMD=n
> CONFIG_RTE_LIBRTE_IVSHMEM=n
> CONFIG_RTE_LIBRTE_FM10K_PMD=n
> CONFIG_RTE_LIBRTE_I40E_PMD=n
> diff --git a/config/defconfig_i686-native-linuxapp-gcc b/config/defconfig_i686-native-linuxapp-gcc
> index a90de9b..a4b1c49 100644
> --- a/config/defconfig_i686-native-linuxapp-gcc
> +++ b/config/defconfig_i686-native-linuxapp-gcc
> @@ -49,3 +49,4 @@ CONFIG_RTE_LIBRTE_KNI=n
> # Vectorized PMD is not supported on 32-bit
> #
> CONFIG_RTE_IXGBE_INC_VECTOR=n
> +CONFIG_RTE_VIRTIO_INC_VECTOR=n
> diff --git a/config/defconfig_i686-native-linuxapp-icc b/config/defconfig_i686-native-linuxapp-icc
> index c021321..f8eb6ad 100644
> --- a/config/defconfig_i686-native-linuxapp-icc
> +++ b/config/defconfig_i686-native-linuxapp-icc
> @@ -49,3 +49,4 @@ CONFIG_RTE_LIBRTE_KNI=n
> # Vectorized PMD is not supported on 32-bit
> #
> CONFIG_RTE_IXGBE_INC_VECTOR=n
> +CONFIG_RTE_VIRTIO_INC_VECTOR=n
> diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
> index 43835ba..25a842d 100644
> --- a/drivers/net/virtio/Makefile
> +++ b/drivers/net/virtio/Makefile
> @@ -50,7 +50,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtqueue.c
> SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_pci.c
> SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx.c
> SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_ethdev.c
> -SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple.c
> +SRCS-$(CONFIG_RTE_VIRTIO_INC_VECTOR) += virtio_rxtx_simple.c
>
> # this lib depends upon:
> DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_eal lib/librte_ether
> diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
> index 41a1366..d8169d1 100644
> --- a/drivers/net/virtio/virtio_rxtx.c
> +++ b/drivers/net/virtio/virtio_rxtx.c
> @@ -67,7 +67,9 @@
> #define VIRTIO_SIMPLE_FLAGS ((uint32_t)ETH_TXQ_FLAGS_NOMULTSEGS | \
> ETH_TXQ_FLAGS_NOOFFLOADS)
>
> +#ifdef RTE_VIRTIO_INC_VECTOR
> static int use_simple_rx
[dpdk-dev] [PATCH v5 03/11] linuxapp/vfio: ignore mapping for ioport region
Ping.

On Jan 19, 2016 5:16 PM, "Santosh Shukla" wrote:
> vfio_pci_mmap() try to map all pci bars. ioport region are not mapped in
> vfio/kernel so ignore mmaping for ioport.
>
> Signed-off-by: Santosh Shukla
> ---
> lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 20
> 1 file changed, 20 insertions(+)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> index 74f91ba..abde779 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> @@ -573,6 +573,7 @@ pci_vfio_map_resource(struct rte_pci_device *dev)
> 	struct pci_map *maps;
> 	uint32_t msix_table_offset = 0;
> 	uint32_t msix_table_size = 0;
> +	uint32_t ioport_bar;
>
> 	dev->intr_handle.fd = -1;
> 	dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
> @@ -760,6 +761,25 @@ pci_vfio_map_resource(struct rte_pci_device *dev)
> 			return -1;
> 		}
>
> +		/* chk for io port region */
> +		ret = pread64(vfio_dev_fd, &ioport_bar, sizeof(ioport_bar),
> +			VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX)
> +			+ PCI_BASE_ADDRESS_0 + i*4);
> +
> +		if (ret != sizeof(ioport_bar)) {
> +			RTE_LOG(ERR, EAL,
> +				"Cannot read command (%x) from config space!\n",
> +				PCI_BASE_ADDRESS_0 + i*4);
> +			return -1;
> +		}
> +
> +		if (ioport_bar & PCI_BASE_ADDRESS_SPACE_IO) {
> +			RTE_LOG(INFO, EAL,
> +				"Ignore mapping IO port bar(%d) addr: %x\n",
> +				i, ioport_bar);
> +			continue;
> +		}
> +
> 		/* skip non-mmapable BARs */
> 		if ((reg.flags & VFIO_REGION_INFO_FLAG_MMAP) == 0)
> 			continue;
> --
> 1.7.9.5
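
The BAR test that the quoted patch relies on can be shown on its own. In PCI configuration space, BAR0 of a type 0 header sits at offset 0x10, each of the six BARs is 4 bytes wide, and bit 0 of a BAR is set when it describes I/O port space rather than memory. The sketch below is a minimal illustration: the `SKETCH_*` macros and both function names are made up here and only mirror `PCI_BASE_ADDRESS_0` and `PCI_BASE_ADDRESS_SPACE_IO` from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the two well-known config-space constants. */
#define SKETCH_PCI_BASE_ADDRESS_0        0x10u	/* offset of BAR0 */
#define SKETCH_PCI_BASE_ADDRESS_SPACE_IO 0x01u	/* bit 0: I/O space */
#define SKETCH_PCI_NUM_BARS              6u

/* Config-space offset of BAR number bar_index (0..5). */
uint32_t
sketch_bar_offset(unsigned int bar_index)
{
	assert(bar_index < SKETCH_PCI_NUM_BARS);
	return SKETCH_PCI_BASE_ADDRESS_0 + bar_index * 4u;
}

/* True when the raw BAR value describes an I/O port region. */
bool
sketch_bar_is_ioport(uint32_t bar_value)
{
	return (bar_value & SKETCH_PCI_BASE_ADDRESS_SPACE_IO) != 0;
}
```

A mapping loop would read the 32-bit value at `sketch_bar_offset(i)` and skip the BAR when `sketch_bar_is_ioport()` returns true, which is what the `continue` in the patch does.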
[dpdk-dev] [PATCH v5 04/11] virtio_pci.h: build fix for sys/io.h for non-x86 arch
Ping

On Jan 19, 2016 5:16 PM, "Santosh Shukla" wrote:
> make sure sys/io.h used only for x86 archs. This fixes build error
> arm64/arm case.
>
> Signed-off-by: Santosh Shukla
> ---
> drivers/net/virtio/virtio_pci.h |2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
> index 99572a0..f550d22 100644
> --- a/drivers/net/virtio/virtio_pci.h
> +++ b/drivers/net/virtio/virtio_pci.h
> @@ -40,8 +40,10 @@
> #include
> #include
> #else
> +#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_I686)
> #include
> #endif
> +#endif
>
> #include
>
> --
> 1.7.9.5
[dpdk-dev] [PATCH v5 01/11] virtio: Introduce config RTE_VIRTIO_INC_VECTOR
On Wed, Jan 27, 2016 at 07:53:21AM +0530, Santosh Shukla wrote:
> Ping?

I was on vacation late last week, and I have been quite busy since coming back. So, sorry that I still won't have time to do more detailed reviews in the next 1 or 2 days. Hopefully I can make it by this Friday.

BTW, I had a quick glance at this patchset. Overall it looks much better now, except for the EAL changes (I'm not the maintainer) and the virtio io port read/write stuff: Tetsuya suggested adding another set of access wrappers, but I have a few concerns about that. Anyway, I don't have time for deeper thoughts right now, and I will re-think it later.

	--yliu
[dpdk-dev] [PATCH 4/5] vhost: do not use rte_memcpy for virtio_hdr copy
On 12/3/2015 2:03 PM, Yuanhan Liu wrote: > + if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) { > + *(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr; > + } else { > + *(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr; > + } Thanks! We might simplify this further. Just reset the first two fields flags and gso_type.
[dpdk-dev] [PATCH 4/4] virtio/vdev: add a new vdev named eth_cvio
On 1/11/2016 2:43 AM, Tan, Jianfeng wrote: > Add a new virtual device named eth_cvio, it can be used just like > eth_ring, eth_null, etc. > > Configured parameters include: > - rx (optional, 1 by default): number of rx, only allowed to be > 1 for now. > - tx (optional, 1 by default): number of tx, only allowed to be > 1 for now. From the APP side, virtio is a piece of HW; in your implementation, rx/tx is the maximum number of queues virtio supports. Does that make sense? Why would the user need to tell the HW how many queues it supports? We'd better make it non-configurable, only let users query it as with real HW, and then decide how many queues they need to enable. > - cq (optional, 0 by default): if ctrl queue is enabled, not > supported for now. > - mac (optional): mac address, random value will be given if not > specified. > - queue_num (optional, 256 by default): size of virtqueue. Better to change it to queue_size. Thanks, Michael > - path (mandatory): path of vhost, depends on the file type: > vhost-user is used if the given path points to > a unix socket; vhost-net is used if the given > path points to a char device. > > The major difference with original virtio for vm is that, here we > use virtual address instead of physical address for vhost to > calculate relative address. > > When enable CONFIG_RTE_VIRTIO_VDEV (enabled by default), the compiled > library can be used in both VM and container environment. > > Examples: > a. Use vhost-net as a backend > sudo numactl -N 1 -m 1 ./examples/l2fwd/build/l2fwd -c 0x10 -n 4 \ > -m 1024 --no-pci --single-file --file-prefix=l2fwd \ > --vdev=eth_cvio0,mac=00:01:02:03:04:05,path=/dev/vhost-net \ > -- -p 0x1 > > b. Use vhost-user as a backend > numactl -N 1 -m 1 ./examples/l2fwd/build/l2fwd -c 0x10 -n 4 -m 1024 \ > --no-pci --single-file --file-prefix=l2fwd \ > --vdev=eth_cvio0,mac=00:01:02:03:04:05,path= \ > -- -p 0x1 > > Signed-off-by: Huawei Xie > Signed-off-by: Jianfeng Tan > --- >
[dpdk-dev] [PATCH 4/5] vhost: do not use rte_memcpy for virtio_hdr copy
On Wed, Jan 27, 2016 at 02:46:39AM +, Xie, Huawei wrote: > On 12/3/2015 2:03 PM, Yuanhan Liu wrote: > > + if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) { > > + *(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr; > > + } else { > > + *(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr; > > + } > > Thanks! > We might simplify this further. Just reset the first two fields flags > and gso_type. What's this "simplification" for? Not to mention that we will add TSO support, which modifies a few more fields, such as csum_start: resetting only the first two fields is wrong here. --yliu
[dpdk-dev] [PATCH 1/5] vhost: refactor rte_vhost_dequeue_burst
On Tue, Jan 26, 2016 at 10:30:12AM +, Xie, Huawei wrote: > On 12/3/2015 2:03 PM, Yuanhan Liu wrote: > > Signed-off-by: Yuanhan Liu > > --- > > lib/librte_vhost/vhost_rxtx.c | 287 > > +- > > 1 file changed, 113 insertions(+), 174 deletions(-) > > Prefer to unroll copy_mbuf_to_desc and your COPY macro. It prevents us I'm okay to unroll COPY macro. But for copy_mbuf_to_desc, I prefer not to do that, unless it has a good reason. > processing descriptors in a burst way in future. So, do you have a plan? --yliu
[dpdk-dev] [PATCH 2/5] vhost: refactor virtio_dev_rx
On Thu, Jan 21, 2016 at 02:50:01PM +0100, Jérôme Jutteau wrote: > Hi Yuanhan, > > 2015-12-14 2:47 GMT+01:00 Yuanhan Liu : > > Right, I should move it in the beginning of this function. > > Any news about this refactoring ? Hi Jérôme, Thanks for showing interest in this patch set; I was waiting for Huawei's comments. And fortunately, he has started making comments. --yliu
[dpdk-dev] [PATCH v5 8/9] virtio: add 1.0 support
On Thu, Jan 21, 2016 at 12:49:10PM +0100, Thomas Monjalon wrote: > 2016-01-19 16:12, Yuanhan Liu: > > int > > vtpci_init(struct rte_pci_device *dev, struct virtio_hw *hw) > > { > > - hw->vtpci_ops = &legacy_ops; > > + hw->dev = dev; > > + > > + /* > > +* Try if we can succeed reading virtio pci caps, which exists > > +* only on modern pci device. If failed, we fallback to legacy > > +* virtio handling. > > +*/ > > + if (virtio_read_caps(dev, hw) == 0) { > > + PMD_INIT_LOG(INFO, "modern virtio pci detected."); > > + hw->vtpci_ops = &modern_ops; > > + hw->modern= 1; > > + dev->driver->drv_flags |= RTE_PCI_DRV_INTR_LSC; > > + return 0; > > + } > > RTE_PCI_DRV_INTR_LSC is already set by virtio_resource_init_by_uio(). We don't go that far here. Here we just detect if it's a modern virtio device. And if yes, we do some modern initiations, and return. virtio_resource_init_by_uio() is invoked when virtio_read_caps() fails. > Do you mean interrupt was not supported with legacy virtio? Nope. this patch set changes nothing on legacy virtio support. --yliu
[dpdk-dev] [PATCH v5 8/9] virtio: add 1.0 support
On Thu, Jan 21, 2016 at 12:37:42PM +0100, Thomas Monjalon wrote: > 2016-01-19 16:12, Yuanhan Liu: > > +#define IO_READ_DEF(nr_bits, type) \ > > +static inline type \ > > +io_read##nr_bits(type *addr) \ > > +{ \ > > + return *(volatile type *)addr; \ > > +} > > + > > +#define IO_WRITE_DEF(nr_bits, type)\ > > +static inline void \ > > +io_write##nr_bits(type val, type *addr)\ > > +{ \ > > + *(volatile type *)addr = val; \ > > +} > > + > > +IO_READ_DEF (8, uint8_t) > > +IO_WRITE_DEF(8, uint8_t) > > + > > +IO_READ_DEF (16, uint16_t) > > +IO_WRITE_DEF(16, uint16_t) > > + > > +IO_READ_DEF (32, uint32_t) > > +IO_WRITE_DEF(32, uint32_t) > > Yes you can do this. > But not sure you should. > > > +static inline void > > +io_write64_twopart(uint64_t val, uint32_t *lo, uint32_t *hi) > > +{ > > + io_write32(val & ((1ULL << 32) - 1), lo); > > + io_write32(val >> 32,hi); > > +} > > When debugging this code, how GDB behave? > How to find the definition of io_write32() with grep or simple editors? Okay, I will unfold them. --yliu
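As a rough illustration of what "unfolding" the macros could look like, the sketch below writes the 32-bit accessors out by hand so that grep and GDB can find them. The function names are taken from the quoted patch; the unfolded bodies are an assumption about the follow-up version, not a copy of it.

```c
#include <stdint.h>

/* 32-bit read through a volatile pointer, written out instead of being
 * generated by IO_READ_DEF(32, uint32_t). */
static inline uint32_t io_read32(uint32_t *addr)
{
	return *(volatile uint32_t *)addr;
}

/* 32-bit write through a volatile pointer, written out instead of being
 * generated by IO_WRITE_DEF(32, uint32_t). */
static inline void io_write32(uint32_t val, uint32_t *addr)
{
	*(volatile uint32_t *)addr = val;
}

/* Write a 64-bit value as two 32-bit halves: low word first, then high. */
static inline void io_write64_twopart(uint64_t val, uint32_t *lo, uint32_t *hi)
{
	io_write32((uint32_t)(val & ((1ULL << 32) - 1)), lo);
	io_write32((uint32_t)(val >> 32), hi);
}
```

The 8-bit and 16-bit variants would follow the same pattern with uint8_t and uint16_t.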
[dpdk-dev] [PATCH v2 00/16] fm10k: update shared code
v2: * Put the two extra fix patches ahead of the base code patches. Wang Xiao W (16): fm10k: use default mailbox message handler for pf fm10k/base: add macro definitions that are needed fm10k/base: cleanup namespace pollution and correct typecast fm10k/base: use bitshift for itr_scale fm10k/base: reset max_queues on init_hw_vf failure fm10k/base: document ITR scale workaround in VF TDLEN register fm10k/base: fix checkpatch warning fm10k/base: use BIT macro instead of open-coded bit-shifting of 1 fm10k/base: do not use CamelCase fm10k/base: use memcpy for mac addr copy fm10k/base: allow removal of is_slot_appropriate function fm10k/base: consistently use VLAN ID when referencing vid variables fm10k/base: fix comment per upstream review changes fm10k/base: TLV structures must be 4byte aligned, not 1byte aligned fm10k/base: move constants to the right of binary operators fm10k/base: minor cleanups drivers/net/fm10k/base/fm10k_api.c | 2 + drivers/net/fm10k/base/fm10k_api.h | 2 + drivers/net/fm10k/base/fm10k_mbx.c | 63 +++- drivers/net/fm10k/base/fm10k_mbx.h | 11 +-- drivers/net/fm10k/base/fm10k_osdep.h | 30 ++ drivers/net/fm10k/base/fm10k_pf.c| 88 + drivers/net/fm10k/base/fm10k_pf.h| 18 ++-- drivers/net/fm10k/base/fm10k_tlv.c | 40 drivers/net/fm10k/base/fm10k_tlv.h | 9 +- drivers/net/fm10k/base/fm10k_type.h | 182 +++ drivers/net/fm10k/base/fm10k_vf.c| 32 -- drivers/net/fm10k/fm10k_ethdev.c | 41 +++- 12 files changed, 220 insertions(+), 298 deletions(-) -- 1.9.3
[dpdk-dev] [PATCH v2 01/16] fm10k: use default mailbox message handler for pf
The new share code makes fm10k_msg_update_pvid_pf function static, so we can not refer to it now in fm10k_ethdev.c. The registered pf handler is almost the same as the default pf handler, removing it has no impact on mailbox. Signed-off-by: Wang Xiao W --- drivers/net/fm10k/fm10k_ethdev.c | 17 ++--- 1 file changed, 2 insertions(+), 15 deletions(-) diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c index e4aed94..2c38ce9 100644 --- a/drivers/net/fm10k/fm10k_ethdev.c +++ b/drivers/net/fm10k/fm10k_ethdev.c @@ -2367,29 +2367,16 @@ static const struct fm10k_msg_data fm10k_msgdata_vf[] = { FM10K_TLV_MSG_ERROR_HANDLER(fm10k_tlv_msg_error), }; -/* Mailbox message handler in PF */ -static const struct fm10k_msg_data fm10k_msgdata_pf[] = { - FM10K_PF_MSG_ERR_HANDLER(XCAST_MODES, fm10k_msg_err_pf), - FM10K_PF_MSG_ERR_HANDLER(UPDATE_MAC_FWD_RULE, fm10k_msg_err_pf), - FM10K_PF_MSG_LPORT_MAP_HANDLER(fm10k_msg_lport_map_pf), - FM10K_PF_MSG_ERR_HANDLER(LPORT_CREATE, fm10k_msg_err_pf), - FM10K_PF_MSG_ERR_HANDLER(LPORT_DELETE, fm10k_msg_err_pf), - FM10K_PF_MSG_UPDATE_PVID_HANDLER(fm10k_msg_update_pvid_pf), - FM10K_TLV_MSG_ERROR_HANDLER(fm10k_tlv_msg_error), -}; - static int fm10k_setup_mbx_service(struct fm10k_hw *hw) { - int err; + int err = 0; /* Initialize mailbox lock */ fm10k_mbx_initlock(hw); /* Replace default message handler with new ones */ - if (hw->mac.type == fm10k_mac_pf) - err = hw->mbx.ops.register_handlers(&hw->mbx, fm10k_msgdata_pf); - else + if (hw->mac.type == fm10k_mac_vf) err = hw->mbx.ops.register_handlers(&hw->mbx, fm10k_msgdata_vf); if (err) { -- 1.9.3
[dpdk-dev] [PATCH v2 02/16] fm10k/base: add macro definitions that are needed
Some macros such as FM10K_RXINT_TIMER_SHIFT are removed in the share code drop, but they are needed in dpdk/fm10k. This patch put all these necessary macros into fm10k_osdep.h Signed-off-by: Wang Xiao W --- drivers/net/fm10k/base/fm10k_osdep.h | 30 ++ 1 file changed, 30 insertions(+) diff --git a/drivers/net/fm10k/base/fm10k_osdep.h b/drivers/net/fm10k/base/fm10k_osdep.h index 6852ef0..869af1b 100644 --- a/drivers/net/fm10k/base/fm10k_osdep.h +++ b/drivers/net/fm10k/base/fm10k_osdep.h @@ -150,6 +150,36 @@ typedef intbool; #define fm10k_read_reg FM10K_READ_REG #endif +#define FM10K_INTEL_VENDOR_ID 0x8086 +#define FM10K_DMA_CTRL_MINMSS_SHIFT9 +#define FM10K_EICR_PCA_FAULT 0x0001 +#define FM10K_EICR_THI_FAULT 0x0004 +#define FM10K_EICR_FUM_FAULT 0x0020 +#define FM10K_EICR_SRAMERROR 0x0400 +#define FM10K_SRAM_IP 0x13003 +#define FM10K_RXINT_TIMER_SHIFT8 +#define FM10K_TXINT_TIMER_SHIFT8 +#define FM10K_RXD_PKTTYPE_MASK 0x03F0 +#define FM10K_RXD_PKTTYPE_SHIFT4 +enum fm10k_rdesc_pkt_type { + /* L3 type */ + FM10K_PKTTYPE_OTHER = 0x00, + FM10K_PKTTYPE_IPV4 = 0x01, + FM10K_PKTTYPE_IPV4_EX = 0x02, + FM10K_PKTTYPE_IPV6 = 0x03, + FM10K_PKTTYPE_IPV6_EX = 0x04, + + /* L4 type */ + FM10K_PKTTYPE_TCP = 0x08, + FM10K_PKTTYPE_UDP = 0x10, + FM10K_PKTTYPE_GRE = 0x18, + FM10K_PKTTYPE_VXLAN = 0x20, + FM10K_PKTTYPE_NVGRE = 0x28, + FM10K_PKTTYPE_GENEVE= 0x30 +}; +#define FM10K_RXD_STATUS_IPCS 0x0008 /* Indicates IPv4 csum */ +#define FM10K_RXD_STATUS_HBO 0x0400 /* header buffer overrun */ + #define FM10K_TSO_MINMSS \ (FM10K_DMA_CTRL_MINMSS_64 >> FM10K_DMA_CTRL_MINMSS_SHIFT) #define FM10K_TSO_MIN_HEADERLEN54 -- 1.9.3
[dpdk-dev] [PATCH v2 03/16] fm10k/base: cleanup namespace pollution and correct typecast
Correct typecast in fm10k_update_xc_addr_pf. Make functions that are only referenced locally static. And fix the function header comment for fm10k_tlv_attr_nest_stop() while we're at it. Wrap fm10k_msg_data fm10k_iov_msg_data_pf[] in the new ifndef NO_DEFAULT_SRIOV_MSG_HANDLERS so that drivers with custom SR-IOV message handlers can strip it. remove unused struct element in struct fm10k_mac_ops. Signed-off-by: Wang Xiao W --- drivers/net/fm10k/base/fm10k_pf.c | 10 ++ drivers/net/fm10k/base/fm10k_pf.h | 4 ++-- drivers/net/fm10k/base/fm10k_tlv.c | 16 drivers/net/fm10k/base/fm10k_tlv.h | 5 - drivers/net/fm10k/base/fm10k_type.h | 1 - drivers/net/fm10k/base/fm10k_vf.c | 2 -- 6 files changed, 16 insertions(+), 22 deletions(-) diff --git a/drivers/net/fm10k/base/fm10k_pf.c b/drivers/net/fm10k/base/fm10k_pf.c index 6e6d71e..5b8c039 100644 --- a/drivers/net/fm10k/base/fm10k_pf.c +++ b/drivers/net/fm10k/base/fm10k_pf.c @@ -379,8 +379,8 @@ STATIC s32 fm10k_update_xc_addr_pf(struct fm10k_hw *hw, u16 glort, ((u32)mac[3] << 16) | ((u32)mac[4] << 8) | ((u32)mac[5])); - mac_update.mac_upper = FM10K_CPU_TO_LE16(((u32)mac[0] << 8) | -((u32)mac[1])); + mac_update.mac_upper = FM10K_CPU_TO_LE16(((u16)mac[0] << 8) | + ((u16)mac[1])); mac_update.vlan = FM10K_CPU_TO_LE16(vid); mac_update.glort = FM10K_CPU_TO_LE16(glort); mac_update.action = add ? 
0 : 1; @@ -1457,6 +1457,7 @@ s32 fm10k_iov_msg_lport_state_pf(struct fm10k_hw *hw, u32 **results, return err; } +#ifndef NO_DEFAULT_SRIOV_MSG_HANDLERS const struct fm10k_msg_data fm10k_iov_msg_data_pf[] = { FM10K_TLV_MSG_TEST_HANDLER(fm10k_tlv_msg_test), FM10K_VF_MSG_MSIX_HANDLER(fm10k_iov_msg_msix_pf), @@ -1465,6 +1466,7 @@ const struct fm10k_msg_data fm10k_iov_msg_data_pf[] = { FM10K_TLV_MSG_ERROR_HANDLER(fm10k_tlv_msg_error), }; +#endif /** * fm10k_update_stats_hw_pf - Updates hardware related statistics of PF * @hw: pointer to hardware structure @@ -1754,8 +1756,8 @@ const struct fm10k_tlv_attr fm10k_update_pvid_msg_attr[] = { * * This handler configures the default VLAN for the PF **/ -s32 fm10k_msg_update_pvid_pf(struct fm10k_hw *hw, u32 **results, -struct fm10k_mbx_info *mbx) +static s32 fm10k_msg_update_pvid_pf(struct fm10k_hw *hw, u32 **results, + struct fm10k_mbx_info *mbx) { u16 glort, pvid; u32 pvid_update; diff --git a/drivers/net/fm10k/base/fm10k_pf.h b/drivers/net/fm10k/base/fm10k_pf.h index 44bd193..92e2962 100644 --- a/drivers/net/fm10k/base/fm10k_pf.h +++ b/drivers/net/fm10k/base/fm10k_pf.h @@ -149,8 +149,6 @@ extern const struct fm10k_tlv_attr fm10k_lport_map_msg_attr[]; #define FM10K_PF_MSG_LPORT_MAP_HANDLER(func) \ FM10K_MSG_HANDLER(FM10K_PF_MSG_ID_LPORT_MAP, \ fm10k_lport_map_msg_attr, func) -s32 fm10k_msg_update_pvid_pf(struct fm10k_hw *, u32 **, -struct fm10k_mbx_info *); extern const struct fm10k_tlv_attr fm10k_update_pvid_msg_attr[]; #define FM10K_PF_MSG_UPDATE_PVID_HANDLER(func) \ FM10K_MSG_HANDLER(FM10K_PF_MSG_ID_UPDATE_PVID, \ @@ -183,7 +181,9 @@ s32 fm10k_iov_msg_mac_vlan_pf(struct fm10k_hw *, u32 **, struct fm10k_mbx_info *); s32 fm10k_iov_msg_lport_state_pf(struct fm10k_hw *, u32 **, struct fm10k_mbx_info *); +#ifndef NO_DEFAULT_SRIOV_MSG_HANDLERS extern const struct fm10k_msg_data fm10k_iov_msg_data_pf[]; +#endif s32 fm10k_init_ops_pf(struct fm10k_hw *hw); #endif /* _FM10K_PF_H */ diff --git a/drivers/net/fm10k/base/fm10k_tlv.c 
b/drivers/net/fm10k/base/fm10k_tlv.c index 1d9d7d8..ade87d1 100644 --- a/drivers/net/fm10k/base/fm10k_tlv.c +++ b/drivers/net/fm10k/base/fm10k_tlv.c @@ -63,8 +63,8 @@ s32 fm10k_tlv_msg_init(u32 *msg, u16 msg_id) * the attribute buffer. It will return success if provided with a valid * pointers. **/ -s32 fm10k_tlv_attr_put_null_string(u32 *msg, u16 attr_id, - const unsigned char *string) +static s32 fm10k_tlv_attr_put_null_string(u32 *msg, u16 attr_id, + const unsigned char *string) { u32 attr_data = 0, len = 0; u32 *attr; @@ -115,7 +115,7 @@ s32 fm10k_tlv_attr_put_null_string(u32 *msg, u16 attr_id, * it in the array pointed by by string. It will return success if provided * with a valid pointers. **/ -s32 fm10k_tlv_attr_get_null_string(u32 *attr, unsigned char *string) +static s32 fm10k_tlv_attr_get_null_string(u32 *attr, unsigned char *string) { u32 len; @@ -386,7 +386,7 @@ s32 fm10k_tlv_attr_get_le_struct(u32 *attr, void
[dpdk-dev] [PATCH v2 04/16] fm10k/base: use bitshift for itr_scale
Upstream community wishes us to use bitshift instead of a divisor, because this is faster, and prevents any need for a '0' check. In our case, this even works out because default Gen3 will be 0. Because of this, we are also able to remove the check for non-zero value in the vf code path since that will already be the default Gen3 case. Signed-off-by: Wang Xiao W --- drivers/net/fm10k/base/fm10k_type.h | 6 +++--- drivers/net/fm10k/base/fm10k_vf.c | 4 2 files changed, 3 insertions(+), 7 deletions(-) diff --git a/drivers/net/fm10k/base/fm10k_type.h b/drivers/net/fm10k/base/fm10k_type.h index 62fa73f..44187b1 100644 --- a/drivers/net/fm10k/base/fm10k_type.h +++ b/drivers/net/fm10k/base/fm10k_type.h @@ -352,9 +352,9 @@ struct fm10k_hw; #define FM10K_TDLEN(_n)((0x40 * (_n)) + 0x8002) #define FM10K_TDLEN_ITR_SCALE_SHIFT9 #define FM10K_TDLEN_ITR_SCALE_MASK 0x0E00 -#define FM10K_TDLEN_ITR_SCALE_GEN1 4 -#define FM10K_TDLEN_ITR_SCALE_GEN2 2 -#define FM10K_TDLEN_ITR_SCALE_GEN3 1 +#define FM10K_TDLEN_ITR_SCALE_GEN1 2 +#define FM10K_TDLEN_ITR_SCALE_GEN2 1 +#define FM10K_TDLEN_ITR_SCALE_GEN3 0 #define FM10K_TPH_TXCTRL(_n) ((0x40 * (_n)) + 0x8003) #define FM10K_TPH_TXCTRL_DESC_TPHEN0x0020 #define FM10K_TPH_TXCTRL_DESC_RROEN0x0200 diff --git a/drivers/net/fm10k/base/fm10k_vf.c b/drivers/net/fm10k/base/fm10k_vf.c index 7822ab6..39bc927 100644 --- a/drivers/net/fm10k/base/fm10k_vf.c +++ b/drivers/net/fm10k/base/fm10k_vf.c @@ -159,10 +159,6 @@ STATIC s32 fm10k_init_hw_vf(struct fm10k_hw *hw) FM10K_TDLEN_ITR_SCALE_MASK) >> FM10K_TDLEN_ITR_SCALE_SHIFT; - /* ensure a non-zero itr scale */ - if (!hw->mac.itr_scale) - hw->mac.itr_scale = FM10K_TDLEN_ITR_SCALE_GEN3; - return FM10K_SUCCESS; } -- 1.9.3
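To make the divisor-versus-shift argument concrete, here is a small sketch. It only assumes what the commit message states: the old scale values 4/2/1 acted as divisors and the new values 2/1/0 act as shift counts. The helper names are hypothetical, and how the driver applies the scale to the ITR value is not shown in this patch.

```c
#include <stdint.h>

/* Old encoding: the scale is a divisor (Gen1 = 4, Gen2 = 2, Gen3 = 1).
 * A zero divisor has to be guarded against explicitly. */
static inline uint32_t sketch_scale_by_divisor(uint32_t value, uint32_t divisor)
{
	return divisor ? value / divisor : value;
}

/* New encoding: the scale is a shift count (Gen1 = 2, Gen2 = 1, Gen3 = 0).
 * A shift of 0 is simply the identity, so no zero check is needed. */
static inline uint32_t sketch_scale_by_shift(uint32_t value, uint32_t shift)
{
	return value >> shift;
}
```

Because a register that still holds 0 now decodes to the Gen3 default, the "ensure a non-zero itr scale" fallback in the VF path becomes redundant, which is the removal shown in the diff.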
[dpdk-dev] [PATCH v2 05/16] fm10k/base: reset max_queues on init_hw_vf failure
VF drivers must detect how many queues are available. Previously, the driver assumed that each VF has at minimum 1 queue. This assumption is incorrect, since it is possible that the PF has not yet assigned the queues to the VF by the time the VF checks. To resolve this, we added a check first to ensure that the first queue is infact owned by the VF at init_hw_vf time. However, the code flow did not reset hw->mac.max_queues to 0. In some cases, such as during reinit flows, we call init_hw_vf without clearing the previous value of hw->mac.max_queues. Due to this, when init_hw_vf errors out, if its error code is not properly handled the VF driver may still believe it has queues which no longer belong to it. Fix this by clearing the hw->mac.max_queues on exit due to errors. Signed-off-by: Wang Xiao W --- drivers/net/fm10k/base/fm10k_vf.c | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/drivers/net/fm10k/base/fm10k_vf.c b/drivers/net/fm10k/base/fm10k_vf.c index 39bc927..9b10ee4 100644 --- a/drivers/net/fm10k/base/fm10k_vf.c +++ b/drivers/net/fm10k/base/fm10k_vf.c @@ -128,8 +128,10 @@ STATIC s32 fm10k_init_hw_vf(struct fm10k_hw *hw) /* verify we have at least 1 queue */ if (!~FM10K_READ_REG(hw, FM10K_TXQCTL(0)) || - !~FM10K_READ_REG(hw, FM10K_RXQCTL(0))) - return FM10K_ERR_NO_RESOURCES; + !~FM10K_READ_REG(hw, FM10K_RXQCTL(0))) { + err = FM10K_ERR_NO_RESOURCES; + goto reset_max_queues; + } /* determine how many queues we have */ for (i = 1; tqdloc0 && (i < FM10K_MAX_QUEUES_POOL); i++) { @@ -147,7 +149,7 @@ STATIC s32 fm10k_init_hw_vf(struct fm10k_hw *hw) /* shut down queues we own and reset DMA configuration */ err = fm10k_disable_queues_generic(hw, i); if (err) - return err; + goto reset_max_queues; /* record maximum queue count */ hw->mac.max_queues = i; @@ -160,6 +162,11 @@ STATIC s32 fm10k_init_hw_vf(struct fm10k_hw *hw) FM10K_TDLEN_ITR_SCALE_SHIFT; return FM10K_SUCCESS; + +reset_max_queues: + hw->mac.max_queues = 0; + + return err; } /** -- 
1.9.3
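The error-path pattern introduced by this patch can be sketched in isolation as follows. The struct, constants, and parameters are hypothetical stand-ins (the real function reads hardware registers); only the single-exit-label shape is meant to match the diff.

```c
#include <stdint.h>

#define SKETCH_SUCCESS          0
#define SKETCH_ERR_NO_RESOURCES (-1)

/* Stand-in for the part of hw->mac that the patch touches. */
struct sketch_mac {
	uint16_t max_queues;
};

/* queues_found and disable_err replace the register reads and the
 * fm10k_disable_queues_generic() call of the real function. */
static int sketch_init_hw_vf(struct sketch_mac *mac, uint16_t queues_found,
			     int disable_err)
{
	int err;

	/* verify we have at least 1 queue */
	if (queues_found == 0) {
		err = SKETCH_ERR_NO_RESOURCES;
		goto reset_max_queues;
	}

	/* shutting down the queues we own may fail */
	err = disable_err;
	if (err)
		goto reset_max_queues;

	/* record maximum queue count */
	mac->max_queues = queues_found;
	return SKETCH_SUCCESS;

reset_max_queues:
	/* never leave a stale count behind from an earlier successful init */
	mac->max_queues = 0;
	return err;
}
```

Funnelling every failure through one label means a re-init that fails cannot leave the previous max_queues value in place, which is the scenario the commit message describes.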
[dpdk-dev] [PATCH v2 06/16] fm10k/base: document ITR scale workaround in VF TDLEN register
Add comments which properly explain the undocumented use of bits in TDLEN register prior to VF initializing it to the correct value. Note that the mechanism is entirely software-defined and explain its purpose to help reduce confusion in the future. Signed-off-by: Wang Xiao W --- drivers/net/fm10k/base/fm10k_pf.c | 6 +- drivers/net/fm10k/base/fm10k_type.h | 9 + drivers/net/fm10k/base/fm10k_vf.c | 9 + 3 files changed, 23 insertions(+), 1 deletion(-) diff --git a/drivers/net/fm10k/base/fm10k_pf.c b/drivers/net/fm10k/base/fm10k_pf.c index 5b8c039..6de679e 100644 --- a/drivers/net/fm10k/base/fm10k_pf.c +++ b/drivers/net/fm10k/base/fm10k_pf.c @@ -958,7 +958,8 @@ STATIC s32 fm10k_iov_assign_default_mac_vlan_pf(struct fm10k_hw *hw, FM10K_WRITE_REG(hw, FM10K_TDBAH(vf_q_idx), tdbah); /* Provide the VF the ITR scale, using software-defined fields in TDLEN -* to pass the information during VF initialization +* to pass the information during VF initialization. See definition of +* FM10K_TDLEN_ITR_SCALE_SHIFT for more details. */ FM10K_WRITE_REG(hw, FM10K_TDLEN(vf_q_idx), hw->mac.itr_scale << FM10K_TDLEN_ITR_SCALE_SHIFT); @@ -1095,6 +1096,9 @@ STATIC s32 fm10k_iov_reset_resources_pf(struct fm10k_hw *hw, for (i = queues_per_pool; i--;) { FM10K_WRITE_REG(hw, FM10K_TDBAL(vf_q_idx + i), tdbal); FM10K_WRITE_REG(hw, FM10K_TDBAH(vf_q_idx + i), tdbah); + /* See definition of FM10K_TDLEN_ITR_SCALE_SHIFT for an +* explanation of how TDLEN is used. 
+*/ FM10K_WRITE_REG(hw, FM10K_TDLEN(vf_q_idx + i), hw->mac.itr_scale << FM10K_TDLEN_ITR_SCALE_SHIFT); diff --git a/drivers/net/fm10k/base/fm10k_type.h b/drivers/net/fm10k/base/fm10k_type.h index 44187b1..5db6345 100644 --- a/drivers/net/fm10k/base/fm10k_type.h +++ b/drivers/net/fm10k/base/fm10k_type.h @@ -350,6 +350,15 @@ struct fm10k_hw; #define FM10K_TDBAL(_n)((0x40 * (_n)) + 0x8000) #define FM10K_TDBAH(_n)((0x40 * (_n)) + 0x8001) #define FM10K_TDLEN(_n)((0x40 * (_n)) + 0x8002) +/* When fist initialized, VFs need to know the Interrupt Throttle Rate (ITR) + * scale which is based on the PCIe speed but the speed information in the PCI + * configuration space may not be accurate. The PF already knows the ITR scale + * but there is no defined method to pass that information from the PF to the + * VF. This is accomplished during VF initialization by temporarily co-opting + * the yet-to-be-used TDLEN register to have the PF store the ITR shift for + * the VF to retrieve before the VF needs to use the TDLEN register for its + * intended purpose, i.e. before the Tx resources are allocated. + */ #define FM10K_TDLEN_ITR_SCALE_SHIFT9 #define FM10K_TDLEN_ITR_SCALE_MASK 0x0E00 #define FM10K_TDLEN_ITR_SCALE_GEN1 2 diff --git a/drivers/net/fm10k/base/fm10k_vf.c b/drivers/net/fm10k/base/fm10k_vf.c index 9b10ee4..43eb081 100644 --- a/drivers/net/fm10k/base/fm10k_vf.c +++ b/drivers/net/fm10k/base/fm10k_vf.c @@ -74,6 +74,11 @@ STATIC s32 fm10k_stop_hw_vf(struct fm10k_hw *hw) FM10K_WRITE_REG(hw, FM10K_TDBAH(i), bah); FM10K_WRITE_REG(hw, FM10K_RDBAL(i), bal); FM10K_WRITE_REG(hw, FM10K_RDBAH(i), bah); + /* Restore ITR scale in software-defined mechanism in TDLEN +* for next VF initialization. See definition of +* FM10K_TDLEN_ITR_SCALE_SHIFT for more details on the use of +* TDLEN here. 
+*/ FM10K_WRITE_REG(hw, FM10K_TDLEN(i), tdlen); } @@ -157,6 +162,10 @@ STATIC s32 fm10k_init_hw_vf(struct fm10k_hw *hw) /* fetch default VLAN and ITR scale */ hw->mac.default_vid = (FM10K_READ_REG(hw, FM10K_TXQCTL(0)) & FM10K_TXQCTL_VID_MASK) >> FM10K_TXQCTL_VID_SHIFT; + /* Read the ITR scale from TDLEN. See the definition of +* FM10K_TDLEN_ITR_SCALE_SHIFT for more information about how TDLEN is +* used here. +*/ hw->mac.itr_scale = (FM10K_READ_REG(hw, FM10K_TDLEN(0)) & FM10K_TDLEN_ITR_SCALE_MASK) >> FM10K_TDLEN_ITR_SCALE_SHIFT; -- 1.9.3
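The software-defined TDLEN field described in the comments above boils down to a mask-and-shift round trip, sketched here. The shift (9) and mask (0x0E00) are the values shown in the quoted diff; the helper names are hypothetical, and masking on the PF side is an extra precaution in this sketch that the PF code in the patch does not perform.

```c
#include <stdint.h>

#define SKETCH_TDLEN_ITR_SCALE_SHIFT 9
#define SKETCH_TDLEN_ITR_SCALE_MASK  0x0E00u

/* PF side: place the ITR scale into the 3-bit field of the TDLEN value. */
static inline uint32_t sketch_tdlen_pack_itr_scale(uint32_t itr_scale)
{
	return (itr_scale << SKETCH_TDLEN_ITR_SCALE_SHIFT) &
	       SKETCH_TDLEN_ITR_SCALE_MASK;
}

/* VF side: recover the ITR scale from the value read back from TDLEN. */
static inline uint32_t sketch_tdlen_unpack_itr_scale(uint32_t tdlen)
{
	return (tdlen & SKETCH_TDLEN_ITR_SCALE_MASK) >>
	       SKETCH_TDLEN_ITR_SCALE_SHIFT;
}
```

The field is three bits wide, so only scale values 0 through 7 survive the round trip.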
[dpdk-dev] [PATCH v2 07/16] fm10k/base: fix checkpatch warning
Cleanup lines over 80 characters. Cleanup useless else, checkpatch warns that else is not generally useful after a break or return. Signed-off-by: Wang Xiao W --- drivers/net/fm10k/base/fm10k_mbx.c | 2 +- drivers/net/fm10k/base/fm10k_pf.c | 19 ++- 2 files changed, 11 insertions(+), 10 deletions(-) diff --git a/drivers/net/fm10k/base/fm10k_mbx.c b/drivers/net/fm10k/base/fm10k_mbx.c index 3c9ab3a..7d03704 100644 --- a/drivers/net/fm10k/base/fm10k_mbx.c +++ b/drivers/net/fm10k/base/fm10k_mbx.c @@ -930,7 +930,7 @@ STATIC void fm10k_mbx_create_disconnect_hdr(struct fm10k_mbx_info *mbx) } /** - * fm10k_mbx_create_fake_disconnect_hdr - Generate a false disconnect mailbox header + * fm10k_mbx_create_fake_disconnect_hdr - Generate a false disconnect mbox hdr * @mbx: pointer to mailbox * * This function creates a fake disconnect header for loading into remote diff --git a/drivers/net/fm10k/base/fm10k_pf.c b/drivers/net/fm10k/base/fm10k_pf.c index 6de679e..3ee88b6 100644 --- a/drivers/net/fm10k/base/fm10k_pf.c +++ b/drivers/net/fm10k/base/fm10k_pf.c @@ -1278,8 +1278,8 @@ s32 fm10k_iov_msg_mac_vlan_pf(struct fm10k_hw *hw, u32 **results, err = fm10k_iov_select_vid(vf_info, (u16)vid); if (err < 0) return err; - else - vid = err; + + vid = err; /* update VSI info for VF in regards to VLAN table */ err = hw->mac.ops.update_vlan(hw, vid, vf_info->vsi, set); @@ -1304,8 +1304,8 @@ s32 fm10k_iov_msg_mac_vlan_pf(struct fm10k_hw *hw, u32 **results, err = fm10k_iov_select_vid(vf_info, vlan); if (err < 0) return err; - else - vlan = (u16)err; + + vlan = (u16)err; /* notify switch of request for new unicast address */ err = hw->mac.ops.update_uc_addr(hw, vf_info->glort, @@ -1330,8 +1330,8 @@ s32 fm10k_iov_msg_mac_vlan_pf(struct fm10k_hw *hw, u32 **results, err = fm10k_iov_select_vid(vf_info, vlan); if (err < 0) return err; - else - vlan = (u16)err; + + vlan = (u16)err; /* notify switch of request for new multicast address */ err = hw->mac.ops.update_mc_addr(hw, vf_info->glort, @@ -1500,9 
+1500,10 @@ STATIC void fm10k_update_hw_stats_pf(struct fm10k_hw *hw, xec = fm10k_read_hw_stats_32b(hw, FM10K_STATS_XEC, &stats->xec); vlan_drop = fm10k_read_hw_stats_32b(hw, FM10K_STATS_VLAN_DROP, &stats->vlan_drop); - loopback_drop = fm10k_read_hw_stats_32b(hw, - FM10K_STATS_LOOPBACK_DROP, - &stats->loopback_drop); + loopback_drop = + fm10k_read_hw_stats_32b(hw, + FM10K_STATS_LOOPBACK_DROP, + &stats->loopback_drop); nodesc_drop = fm10k_read_hw_stats_32b(hw, FM10K_STATS_NODESC_DROP, &stats->nodesc_drop); -- 1.9.3
[dpdk-dev] [PATCH v2 08/16] fm10k/base: use BIT macro instead of open-coded bit-shifting of 1
The upstream Linux kernel community prefers using the BIT macro over bit-shifting a 1. Similar to how this is handled in the i40e shared code, define a macro for OSes that do not already have it and wrap all that in LINUX_MACROS so that it can be stripped from the Linux driver. The upstream Linux kernel community prefers avoiding CamelCase in variables, function names, etc. Signed-off-by: Wang Xiao W --- drivers/net/fm10k/base/fm10k_pf.c | 12 ++-- drivers/net/fm10k/base/fm10k_tlv.c | 24 drivers/net/fm10k/base/fm10k_type.h | 18 -- 3 files changed, 30 insertions(+), 24 deletions(-) diff --git a/drivers/net/fm10k/base/fm10k_pf.c b/drivers/net/fm10k/base/fm10k_pf.c index 3ee88b6..7d48210 100644 --- a/drivers/net/fm10k/base/fm10k_pf.c +++ b/drivers/net/fm10k/base/fm10k_pf.c @@ -576,8 +576,8 @@ STATIC s32 fm10k_configure_dglort_map_pf(struct fm10k_hw *hw, return FM10K_ERR_PARAM; /* determine count of VSIs and queues */ - queue_count = 1 << (dglort->rss_l + dglort->pc_l); - vsi_count = 1 << (dglort->vsi_l + dglort->queue_l); + queue_count = BIT(dglort->rss_l + dglort->pc_l); + vsi_count = BIT(dglort->vsi_l + dglort->queue_l); glort = dglort->glort; q_idx = dglort->queue_b; @@ -593,8 +593,8 @@ STATIC s32 fm10k_configure_dglort_map_pf(struct fm10k_hw *hw, } /* determine count of PCs and queues */ - queue_count = 1 << (dglort->queue_l + dglort->rss_l + dglort->vsi_l); - pc_count = 1 << dglort->pc_l; + queue_count = BIT(dglort->queue_l + dglort->rss_l + dglort->vsi_l); + pc_count = BIT(dglort->pc_l); /* configure PC for Tx queues */ for (pc = 0; pc < pc_count; pc++) { @@ -1001,7 +1001,7 @@ STATIC s32 fm10k_iov_reset_resources_pf(struct fm10k_hw *hw, return FM10K_ERR_PARAM; /* clear event notification of VF FLR */ - FM10K_WRITE_REG(hw, FM10K_PFVFLREC(vf_idx / 32), 1 << (vf_idx % 32)); + FM10K_WRITE_REG(hw, FM10K_PFVFLREC(vf_idx / 32), BIT(vf_idx % 32)); /* force timeout and then disconnect the mailbox */ vf_info->mbx.timeout = 0; @@ -1417,7 +1417,7 @@ s32 
fm10k_iov_msg_lport_state_pf(struct fm10k_hw *hw, u32 **results, mode = fm10k_iov_supported_xcast_mode_pf(vf_info, mode); /* if mode is not currently enabled, enable it */ - if (!(FM10K_VF_FLAG_ENABLED(vf_info) & (1 << mode))) + if (!(FM10K_VF_FLAG_ENABLED(vf_info) & BIT(mode))) fm10k_update_xcast_mode_pf(hw, vf_info->glort, mode); /* swap mode back to a bit flag */ diff --git a/drivers/net/fm10k/base/fm10k_tlv.c b/drivers/net/fm10k/base/fm10k_tlv.c index ade87d1..e6150c1 100644 --- a/drivers/net/fm10k/base/fm10k_tlv.c +++ b/drivers/net/fm10k/base/fm10k_tlv.c @@ -249,7 +249,7 @@ s32 fm10k_tlv_attr_put_value(u32 *msg, u16 attr_id, s64 value, u32 len) attr = &msg[FM10K_TLV_DWORD_LEN(*msg)]; if (len < 4) { - attr[1] = (u32)value & ((0x1ul << (8 * len)) - 1); + attr[1] = (u32)value & (BIT(8 * len) - 1); } else { attr[1] = (u32)value; if (len > 4) @@ -699,29 +699,29 @@ STATIC void fm10k_tlv_msg_test_generate_data(u32 *msg, u32 attr_flags) { DEBUGFUNC("fm10k_tlv_msg_test_generate_data"); - if (attr_flags & (1 << FM10K_TEST_MSG_STRING)) + if (attr_flags & BIT(FM10K_TEST_MSG_STRING)) fm10k_tlv_attr_put_null_string(msg, FM10K_TEST_MSG_STRING, test_str); - if (attr_flags & (1 << FM10K_TEST_MSG_MAC_ADDR)) + if (attr_flags & BIT(FM10K_TEST_MSG_MAC_ADDR)) fm10k_tlv_attr_put_mac_vlan(msg, FM10K_TEST_MSG_MAC_ADDR, test_mac, test_vlan); - if (attr_flags & (1 << FM10K_TEST_MSG_U8)) + if (attr_flags & BIT(FM10K_TEST_MSG_U8)) fm10k_tlv_attr_put_u8(msg, FM10K_TEST_MSG_U8, test_u8); - if (attr_flags & (1 << FM10K_TEST_MSG_U16)) + if (attr_flags & BIT(FM10K_TEST_MSG_U16)) fm10k_tlv_attr_put_u16(msg, FM10K_TEST_MSG_U16, test_u16); - if (attr_flags & (1 << FM10K_TEST_MSG_U32)) + if (attr_flags & BIT(FM10K_TEST_MSG_U32)) fm10k_tlv_attr_put_u32(msg, FM10K_TEST_MSG_U32, test_u32); - if (attr_flags & (1 << FM10K_TEST_MSG_U64)) + if (attr_flags & BIT(FM10K_TEST_MSG_U64)) fm10k_tlv_attr_put_u64(msg, FM10K_TEST_MSG_U64, test_u64); - if (attr_flags & (1 << FM10K_TEST_MSG_S8)) + if (attr_flags & 
BIT(FM10K_TEST_MSG_S8)) fm10k_tlv_attr_put_s8(msg, FM10K_TEST_MSG_S8, test_s8); - if (attr_flags & (1 << FM10K_TEST_MSG_S16)) + if (attr_flags & BIT(FM10K_TEST_MSG_S16)) fm10k_tlv_attr_put_s16(msg, FM10K_TEST_MSG_S16, test_s16); - if (attr_flags & (1 << FM
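A minimal sketch of the BIT() conversion follows. The macro definition is an assumption (the fm10k_type.h hunk that defines it is not visible in this excerpt), modelled on the usual kernel-style form, and the helper name is hypothetical.

```c
#include <stdint.h>

/* Assumed definition; the real one lives in fm10k_type.h and is wrapped
 * so that OSes which already provide BIT() keep their own. */
#ifndef BIT
#define BIT(n) (1UL << (n))
#endif

/* Mask covering the low `len` bytes, for len in 1..3, in the same style
 * as the converted fm10k_tlv_attr_put_value() line: BIT(8 * len) - 1. */
static inline uint32_t sketch_low_bytes_mask(unsigned int len)
{
	return (uint32_t)(BIT(8 * len) - 1);
}
```

Flag tests such as "attr_flags & (1 << FM10K_TEST_MSG_U8)" become "attr_flags & BIT(FM10K_TEST_MSG_U8)" with the same result.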
[dpdk-dev] [PATCH v2 09/16] fm10k/base: do not use CamelCase
The upstream Linux kernel community prefers avoiding CamelCase in variables, function names, etc. Signed-off-by: Wang Xiao W --- drivers/net/fm10k/base/fm10k_type.h | 14 +++--- drivers/net/fm10k/fm10k_ethdev.c| 24 2 files changed, 19 insertions(+), 19 deletions(-) diff --git a/drivers/net/fm10k/base/fm10k_type.h b/drivers/net/fm10k/base/fm10k_type.h index 387d25b..c9885a1 100644 --- a/drivers/net/fm10k/base/fm10k_type.h +++ b/drivers/net/fm10k/base/fm10k_type.h @@ -531,13 +531,13 @@ struct fm10k_hw; #endif enum fm10k_int_source { - fm10k_int_Mailbox = 0, - fm10k_int_PCIeFault = 1, - fm10k_int_SwitchUpDown = 2, - fm10k_int_SwitchEvent = 3, - fm10k_int_SRAM = 4, - fm10k_int_VFLR = 5, - fm10k_int_MaxHoldTime = 6, + fm10k_int_mailbox = 0, + fm10k_int_pcie_fault= 1, + fm10k_int_switch_up_down= 2, + fm10k_int_switch_event = 3, + fm10k_int_sram = 4, + fm10k_int_vflr = 5, + fm10k_int_max_hold_time = 6, fm10k_int_sources_max_pf }; diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c index 2c38ce9..a118cf4 100644 --- a/drivers/net/fm10k/fm10k_ethdev.c +++ b/drivers/net/fm10k/fm10k_ethdev.c @@ -2074,12 +2074,12 @@ fm10k_dev_enable_intr_pf(struct rte_eth_dev *dev) /* Bind all local non-queue interrupt to vector 0 */ int_map |= 0; - FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_Mailbox), int_map); - FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_PCIeFault), int_map); - FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_SwitchUpDown), int_map); - FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_SwitchEvent), int_map); - FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_SRAM), int_map); - FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_VFLR), int_map); + FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_mailbox), int_map); + FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_pcie_fault), int_map); + FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_switch_up_down), int_map); + FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_switch_event), int_map); + FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_sram), 
int_map); + FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_vflr), int_map); /* Enable misc causes */ FM10K_WRITE_REG(hw, FM10K_EIMR, FM10K_EIMR_ENABLE(PCA_FAULT) | @@ -2105,12 +2105,12 @@ fm10k_dev_disable_intr_pf(struct rte_eth_dev *dev) int_map |= 0; - FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_Mailbox), int_map); - FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_PCIeFault), int_map); - FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_SwitchUpDown), int_map); - FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_SwitchEvent), int_map); - FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_SRAM), int_map); - FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_VFLR), int_map); + FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_mailbox), int_map); + FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_pcie_fault), int_map); + FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_switch_up_down), int_map); + FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_switch_event), int_map); + FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_sram), int_map); + FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_vflr), int_map); /* Disable misc causes */ FM10K_WRITE_REG(hw, FM10K_EIMR, FM10K_EIMR_DISABLE(PCA_FAULT) | -- 1.9.3
[dpdk-dev] [PATCH v2 10/16] fm10k/base: use memcpy for mac addr copy
Use memcpy instead of copying MAC address byte-by-byte.

Signed-off-by: Wang Xiao W
---
 drivers/net/fm10k/base/fm10k_pf.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/net/fm10k/base/fm10k_pf.c b/drivers/net/fm10k/base/fm10k_pf.c
index 7d48210..a1469aa 100644
--- a/drivers/net/fm10k/base/fm10k_pf.c
+++ b/drivers/net/fm10k/base/fm10k_pf.c
@@ -300,7 +300,6 @@ STATIC s32 fm10k_read_mac_addr_pf(struct fm10k_hw *hw)
 {
 	u8 perm_addr[ETH_ALEN];
 	u32 serial_num;
-	int i;
 
 	DEBUGFUNC("fm10k_read_mac_addr_pf");
 
@@ -324,10 +323,8 @@ STATIC s32 fm10k_read_mac_addr_pf(struct fm10k_hw *hw)
 	perm_addr[4] = (u8)(serial_num >> 8);
 	perm_addr[5] = (u8)(serial_num);
 
-	for (i = 0; i < ETH_ALEN; i++) {
-		hw->mac.perm_addr[i] = perm_addr[i];
-		hw->mac.addr[i] = perm_addr[i];
-	}
+	memcpy(hw->mac.perm_addr, perm_addr, ETH_ALEN);
+	memcpy(hw->mac.addr, perm_addr, ETH_ALEN);
 
 	return FM10K_SUCCESS;
 }
-- 
1.9.3
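As a rough illustration of the change in the patch above, the sketch below puts the byte-by-byte loop next to the two memcpy calls and checks that both leave the same bytes behind. The demo_mac structure, DEMO_ETH_ALEN and all demo_* function names are hypothetical stand-ins, not the fm10k definitions; only memcpy, memset and memcmp are real C library calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ETH_ALEN 6 /* assumed MAC length, stands in for ETH_ALEN */

/* Hypothetical stand-in for the two address arrays kept in hw->mac. */
struct demo_mac {
	uint8_t perm_addr[DEMO_ETH_ALEN];
	uint8_t addr[DEMO_ETH_ALEN];
};

/* Old style: copy one byte per loop iteration into both arrays. */
static void demo_copy_loop(struct demo_mac *mac, const uint8_t *src)
{
	int i;

	for (i = 0; i < DEMO_ETH_ALEN; i++) {
		mac->perm_addr[i] = src[i];
		mac->addr[i] = src[i];
	}
}

/* New style: one memcpy per destination array. */
static void demo_copy_memcpy(struct demo_mac *mac, const uint8_t *src)
{
	memcpy(mac->perm_addr, src, DEMO_ETH_ALEN);
	memcpy(mac->addr, src, DEMO_ETH_ALEN);
}

/* Returns 1 when both styles produce identical structure contents. */
static int demo_copies_match(const uint8_t *src)
{
	struct demo_mac a, b;

	assert(src != NULL);
	memset(&a, 0, sizeof(a));
	memset(&b, 0, sizeof(b));
	demo_copy_loop(&a, src);
	demo_copy_memcpy(&b, src);
	return memcmp(&a, &b, sizeof(a)) == 0;
}
```

Calling demo_copies_match() with any 6-byte source array should report a match; the memcpy form simply drops the loop counter, as the patch does.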
[dpdk-dev] [PATCH v2 11/16] fm10k/base: allow removal of is_slot_appropriate function
The Linux Kernel provides the OS a call "pcie_get_minimum_link" which can crawl the PCIe tree and determine the actual minimum link speed of a device which is a more general check than provided by is_slot_appropriate. Thus, the upstream driver does not use or want the is_slot_appropriate function call. Add a NO_IS_SLOT_APPROPRIATE_CHECK definition which can be defined during strip process to remove the code. If left undefined (the default) then the code will all be active and no driver changes should be necessary. Signed-off-by: Wang Xiao W --- drivers/net/fm10k/base/fm10k_api.c | 2 ++ drivers/net/fm10k/base/fm10k_api.h | 2 ++ drivers/net/fm10k/base/fm10k_pf.c | 4 drivers/net/fm10k/base/fm10k_type.h | 2 ++ drivers/net/fm10k/base/fm10k_vf.c | 4 5 files changed, 14 insertions(+) diff --git a/drivers/net/fm10k/base/fm10k_api.c b/drivers/net/fm10k/base/fm10k_api.c index eb5bdaa..c49d20d 100644 --- a/drivers/net/fm10k/base/fm10k_api.c +++ b/drivers/net/fm10k/base/fm10k_api.c @@ -181,6 +181,7 @@ s32 fm10k_get_bus_info(struct fm10k_hw *hw) FM10K_NOT_IMPLEMENTED); } +#ifndef NO_IS_SLOT_APPROPRIATE_CHECK /** * fm10k_is_slot_appropriate - Indicate appropriate slot for this SKU * @hw: pointer to hardware structure @@ -195,6 +196,7 @@ bool fm10k_is_slot_appropriate(struct fm10k_hw *hw) return true; } +#endif /** * fm10k_update_vlan - Clear VLAN ID to VLAN filter table * @hw: pointer to hardware structure diff --git a/drivers/net/fm10k/base/fm10k_api.h b/drivers/net/fm10k/base/fm10k_api.h index 113aef5..2ab3149 100644 --- a/drivers/net/fm10k/base/fm10k_api.h +++ b/drivers/net/fm10k/base/fm10k_api.h @@ -44,7 +44,9 @@ s32 fm10k_stop_hw(struct fm10k_hw *hw); s32 fm10k_start_hw(struct fm10k_hw *hw); s32 fm10k_init_shared_code(struct fm10k_hw *hw); s32 fm10k_get_bus_info(struct fm10k_hw *hw); +#ifndef NO_IS_SLOT_APPROPRIATE_CHECK bool fm10k_is_slot_appropriate(struct fm10k_hw *hw); +#endif s32 fm10k_update_vlan(struct fm10k_hw *hw, u32 vid, u8 idx, bool set); s32 
fm10k_read_mac_addr(struct fm10k_hw *hw); void fm10k_update_hw_stats(struct fm10k_hw *hw, struct fm10k_hw_stats *stats); diff --git a/drivers/net/fm10k/base/fm10k_pf.c b/drivers/net/fm10k/base/fm10k_pf.c index a1469aa..f5cbda4 100644 --- a/drivers/net/fm10k/base/fm10k_pf.c +++ b/drivers/net/fm10k/base/fm10k_pf.c @@ -216,6 +216,7 @@ STATIC s32 fm10k_init_hw_pf(struct fm10k_hw *hw) return FM10K_SUCCESS; } +#ifndef NO_IS_SLOT_APPROPRIATE_CHECK /** * fm10k_is_slot_appropriate_pf - Indicate appropriate slot for this SKU * @hw: pointer to hardware structure @@ -231,6 +232,7 @@ STATIC bool fm10k_is_slot_appropriate_pf(struct fm10k_hw *hw) (hw->bus.width == hw->bus_caps.width); } +#endif /** * fm10k_update_vlan_pf - Update status of VLAN ID in VLAN filter table * @hw: pointer to hardware structure @@ -2064,7 +2066,9 @@ s32 fm10k_init_ops_pf(struct fm10k_hw *hw) mac->ops.init_hw = &fm10k_init_hw_pf; mac->ops.start_hw = &fm10k_start_hw_generic; mac->ops.stop_hw = &fm10k_stop_hw_generic; +#ifndef NO_IS_SLOT_APPROPRIATE_CHECK mac->ops.is_slot_appropriate = &fm10k_is_slot_appropriate_pf; +#endif mac->ops.update_vlan = &fm10k_update_vlan_pf; mac->ops.read_mac_addr = &fm10k_read_mac_addr_pf; mac->ops.update_uc_addr = &fm10k_update_uc_addr_pf; diff --git a/drivers/net/fm10k/base/fm10k_type.h b/drivers/net/fm10k/base/fm10k_type.h index c9885a1..ba0a184 100644 --- a/drivers/net/fm10k/base/fm10k_type.h +++ b/drivers/net/fm10k/base/fm10k_type.h @@ -679,7 +679,9 @@ struct fm10k_mac_ops { s32 (*stop_hw)(struct fm10k_hw *); s32 (*get_bus_info)(struct fm10k_hw *); s32 (*get_host_state)(struct fm10k_hw *, bool *); +#ifndef NO_IS_SLOT_APPROPRIATE_CHECK bool (*is_slot_appropriate)(struct fm10k_hw *); +#endif s32 (*update_vlan)(struct fm10k_hw *, u32, u8, bool); s32 (*read_mac_addr)(struct fm10k_hw *); s32 (*update_uc_addr)(struct fm10k_hw *, u16, const u8 *, diff --git a/drivers/net/fm10k/base/fm10k_vf.c b/drivers/net/fm10k/base/fm10k_vf.c index 43eb081..efbdbd1 100644 --- 
a/drivers/net/fm10k/base/fm10k_vf.c +++ b/drivers/net/fm10k/base/fm10k_vf.c @@ -178,6 +178,7 @@ reset_max_queues: return err; } +#ifndef NO_IS_SLOT_APPROPRIATE_CHECK /** * fm10k_is_slot_appropriate_vf - Indicate appropriate slot for this SKU * @hw: pointer to hardware structure @@ -194,6 +195,7 @@ STATIC bool fm10k_is_slot_appropriate_vf(struct fm10k_hw *hw) return TRUE; } +#endif /* This structure defines the attibutes to be parsed below */ const struct fm10k_tlv_attr fm10k_mac_vlan_msg_attr[] = { FM10K_TLV_ATTR_U32(FM10K_MAC_VLAN_MSG_VLAN), @@ -648,7 +650,9 @@ s32 fm10k_init_ops_vf(struct fm10k_hw *hw) mac->ops.init_hw = &fm10k_init_hw_vf; mac->ops.start_hw = &fm10k_start_hw_generic; mac->ops.stop_hw = &fm10k_stop_hw
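The guard pattern used throughout this patch can be sketched in isolation as follows. Everything here is hypothetical: DEMO_NO_SLOT_CHECK plays the role of NO_IS_SLOT_APPROPRIATE_CHECK, and the demo_* structures and functions are simplified stand-ins rather than the fm10k code. The point it tries to show is that the ops-table member, the function body, the assignment and every caller must sit behind the same guard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Define DEMO_NO_SLOT_CHECK before this point to strip the check.
 * Left undefined (the default), all of the code below stays active. */

/* Hypothetical hardware description with only the fields the check needs. */
struct demo_hw {
	int bus_width;
	int caps_width;
};

/* Hypothetical ops table; the optional member is guarded. */
struct demo_mac_ops {
	int (*start_hw)(struct demo_hw *);
#ifndef DEMO_NO_SLOT_CHECK
	bool (*is_slot_appropriate)(struct demo_hw *);
#endif
};

static int demo_start_hw(struct demo_hw *hw)
{
	(void)hw;
	return 0;
}

#ifndef DEMO_NO_SLOT_CHECK
/* Simplified check: the slot is fine when the widths agree. */
static bool demo_is_slot_appropriate(struct demo_hw *hw)
{
	return hw->bus_width == hw->caps_width;
}
#endif

static void demo_init_ops(struct demo_mac_ops *ops)
{
	assert(ops != NULL);
	ops->start_hw = demo_start_hw;
#ifndef DEMO_NO_SLOT_CHECK
	ops->is_slot_appropriate = demo_is_slot_appropriate;
#endif
}

/* Callers need the same guard; with the check stripped this reports true. */
static bool demo_slot_ok(const struct demo_mac_ops *ops, struct demo_hw *hw)
{
#ifndef DEMO_NO_SLOT_CHECK
	if (ops->is_slot_appropriate != NULL)
		return ops->is_slot_appropriate(hw);
#else
	(void)ops;
	(void)hw;
#endif
	return true;
}
```

Building the same file with -DDEMO_NO_SLOT_CHECK removes the member and the function together, which is the effect the commit message describes for the strip process.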
[dpdk-dev] [PATCH v2 12/16] fm10k/base: consistently use VLAN ID when referencing vid variables
The vid variable name is shorthand for VLAN ID, so we should use this in
comments explaining what is happening.

Signed-off-by: Wang Xiao W
---
 drivers/net/fm10k/base/fm10k_pf.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/fm10k/base/fm10k_pf.c b/drivers/net/fm10k/base/fm10k_pf.c
index f5cbda4..716d7f1 100644
--- a/drivers/net/fm10k/base/fm10k_pf.c
+++ b/drivers/net/fm10k/base/fm10k_pf.c
@@ -970,7 +970,7 @@ err_out:
 	txqctl |= (vf_idx << FM10K_TXQCTL_TC_SHIFT) |
 		  FM10K_TXQCTL_VF | vf_idx;
 
-	/* assign VID */
+	/* assign VLAN ID */
 	for (i = 0; i < queues_per_pool; i++)
 		FM10K_WRITE_REG(hw, FM10K_TXQCTL(vf_q_idx + i), txqctl);
 
@@ -1215,12 +1215,12 @@ s32 fm10k_iov_msg_msix_pf(struct fm10k_hw *hw, u32 **results,
 }
 
 /**
- * fm10k_iov_select_vid - Select correct default vid
+ * fm10k_iov_select_vid - Select correct default VLAN ID
  * @hw: Pointer to hardware structure
- * @vid: vid to correct
+ * @vid: VLAN ID to correct
  *
- * Will report an error if vid is out of range. For vid = 0, it will return
- * either the pf_vid or sw_vid depending on which one is set.
+ * Will report an error if the VLAN ID is out of range. For VID = 0, it will
+ * return either the pf_vid or sw_vid depending on which one is set.
  */
 STATIC s32 fm10k_iov_select_vid(struct fm10k_vf_info *vf_info, u16 vid)
 {
@@ -1783,7 +1783,7 @@ static s32 fm10k_msg_update_pvid_pf(struct fm10k_hw *hw, u32 **results,
 	if (!fm10k_glort_valid_pf(hw, glort))
 		return FM10K_ERR_PARAM;
 
-	/* verify VID is valid */
+	/* verify VLAN ID is valid */
 	if (pvid >= FM10K_VLAN_TABLE_VID_MAX)
 		return FM10K_ERR_PARAM;
 
-- 
1.9.3
[dpdk-dev] [PATCH v2 13/16] fm10k/base: fix comment per upstream review changes
The comment here was changed during review of upstream patch, and the new wording is slightly more clear. Re-write the comment in SHARED code based on this new wording. Fix a number of mailbox comment issues with function header comments, lower-case acronyms (i.e. FIFO, TLV), incorrect function names in DEBUGFUNC(), duplicate comments and a stubbed-out header comment for fm10k_sm_mbx_init. Signed-off-by: Wang Xiao W --- drivers/net/fm10k/base/fm10k_mbx.c | 61 ++ drivers/net/fm10k/base/fm10k_mbx.h | 4 +-- drivers/net/fm10k/base/fm10k_pf.c | 12 drivers/net/fm10k/base/fm10k_tlv.h | 4 +-- 4 files changed, 46 insertions(+), 35 deletions(-) diff --git a/drivers/net/fm10k/base/fm10k_mbx.c b/drivers/net/fm10k/base/fm10k_mbx.c index 7d03704..2e70434 100644 --- a/drivers/net/fm10k/base/fm10k_mbx.c +++ b/drivers/net/fm10k/base/fm10k_mbx.c @@ -70,7 +70,7 @@ STATIC u16 fm10k_fifo_unused(struct fm10k_mbx_fifo *fifo) } /** - * fm10k_fifo_empty - Test to verify if fifo is empty + * fm10k_fifo_empty - Test to verify if FIFO is empty * @fifo: pointer to FIFO * * This function returns true if the FIFO is empty, else false @@ -85,7 +85,7 @@ STATIC bool fm10k_fifo_empty(struct fm10k_mbx_fifo *fifo) * @fifo: pointer to FIFO * @offset: offset to add to head * - * This function returns the indices into the fifo based on head + offset + * This function returns the indices into the FIFO based on head + offset **/ STATIC u16 fm10k_fifo_head_offset(struct fm10k_mbx_fifo *fifo, u16 offset) { @@ -97,7 +97,7 @@ STATIC u16 fm10k_fifo_head_offset(struct fm10k_mbx_fifo *fifo, u16 offset) * @fifo: pointer to FIFO * @offset: offset to add to tail * - * This function returns the indices into the fifo based on tail + offset + * This function returns the indices into the FIFO based on tail + offset **/ STATIC u16 fm10k_fifo_tail_offset(struct fm10k_mbx_fifo *fifo, u16 offset) { @@ -173,7 +173,7 @@ STATIC u16 fm10k_mbx_index_len(struct fm10k_mbx_info *mbx, u16 head, u16 tail) /** * fm10k_mbx_tail_add - 
Determine new tail value with added offset * @mbx: pointer to mailbox - * @offset: length to add to head offset + * @offset: length to add to tail offset * * This function takes the local tail index and recomputes it for * a given length added as an offset. @@ -189,7 +189,7 @@ STATIC u16 fm10k_mbx_tail_add(struct fm10k_mbx_info *mbx, u16 offset) /** * fm10k_mbx_tail_sub - Determine new tail value with subtracted offset * @mbx: pointer to mailbox - * @offset: length to add to head offset + * @offset: length to add to tail offset * * This function takes the local tail index and recomputes it for * a given length added as an offset. @@ -253,7 +253,7 @@ STATIC u16 fm10k_mbx_pushed_tail_len(struct fm10k_mbx_info *mbx) } /** - * fm10k_fifo_write_copy - pulls data off of msg and places it in fifo + * fm10k_fifo_write_copy - pulls data off of msg and places it in FIFO * @fifo: pointer to FIFO * @msg: message array to populate * @tail_offset: additional offset to add to tail pointer @@ -331,7 +331,7 @@ STATIC u16 fm10k_mbx_validate_msg_size(struct fm10k_mbx_info *mbx, u16 len) u16 total_len = 0, msg_len; u32 *msg; - DEBUGFUNC("fm10k_mbx_validate_msg"); + DEBUGFUNC("fm10k_mbx_validate_msg_size"); /* length should include previous amounts pushed */ len += mbx->pushed; @@ -353,6 +353,7 @@ STATIC u16 fm10k_mbx_validate_msg_size(struct fm10k_mbx_info *mbx, u16 len) /** * fm10k_mbx_write_copy - pulls data off of Tx FIFO and places it in mbmem + * @hw: pointer to hardware structure * @mbx: pointer to mailbox * * This function will take a section of the Tx FIFO and copy it into the @@ -734,7 +735,7 @@ STATIC bool fm10k_mbx_tx_complete(struct fm10k_mbx_info *mbx) * @hw: pointer to hardware structure * @mbx: pointer to mailbox * - * This function dequeues messages and hands them off to the tlv parser. + * This function dequeues messages and hands them off to the TLV parser. * It will return the number of messages processed when called. 
**/ STATIC u16 fm10k_mbx_dequeue_rx(struct fm10k_hw *hw, @@ -951,7 +952,7 @@ STATIC void fm10k_mbx_create_fake_disconnect_hdr(struct fm10k_mbx_info *mbx) } /** - * fm10k_mbx_create_error_msg - Generate a error message + * fm10k_mbx_create_error_msg - Generate an error message * @mbx: pointer to mailbox * @err: local error encountered * @@ -984,7 +985,6 @@ STATIC void fm10k_mbx_create_error_msg(struct fm10k_mbx_info *mbx, s32 err) /** * fm10k_mbx_validate_msg_hdr - Validate common fields in the message header * @mbx: pointer to mailbox - * @msg: message array to read * * This function will parse up the fields in the mailbox header and return * an error if the header contains any of a number of invalid configurations @@ -1050,11 +1050,12 @@ STATIC s32 fm10k_mbx_validate_msg_hdr(st
[dpdk-dev] [PATCH v2 14/16] fm10k/base: TLV structures must be 4byte aligned, not 1byte aligned
Per comments from an upstream patch, and looking at how TLV LE_STRUCT
code works, we actually want these structures to be 4byte aligned, not
1byte aligned. In practice, 1byte alignment has worked so far because
all our structures end up being a multiple of 4. But if a future TLV
structure were added that had a u8 or similar sticking on the end things
would break. Fix this by using 4byte alignment which will prevent the
TLV LE_STRUCT code from breaking. Update the comment explaining that we
need 4byte alignment of our structures.

Signed-off-by: Wang Xiao W
---
 drivers/net/fm10k/base/fm10k_pf.h | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/fm10k/base/fm10k_pf.h b/drivers/net/fm10k/base/fm10k_pf.h
index 92e2962..ee8527a 100644
--- a/drivers/net/fm10k/base/fm10k_pf.h
+++ b/drivers/net/fm10k/base/fm10k_pf.h
@@ -91,14 +91,14 @@ enum fm10k_pf_tlv_attr_id_v1 {
 #define FM10K_MSG_UPDATE_PVID_PVID_SHIFT	16
 #define FM10K_MSG_UPDATE_PVID_PVID_SIZE		16
 
-/* The following data structures are overlayed specifically to TLV mailbox
- * messages, and must not have gaps between their values. They must line up
- * correctly to the TLV definition.
+/* The following data structures are overlayed directly onto TLV mailbox
+ * messages, and must not break 4 byte alignment. Ensure the structures line
+ * up correctly as per their TLV definition.
  */
 #ifdef C99
-#pragma pack(push, 1)
+#pragma pack(push, 4)
 #else
-#pragma pack(1)
+#pragma pack(4)
 #endif /* C99 */
 
 struct fm10k_mac_update {
-- 
1.9.3
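To make the "u8 sticking on the end" case concrete, here is a minimal sketch with a made-up structure declared once under 1-byte packing and once under 4-byte packing. The demo_* names are hypothetical and are not fm10k structures. The sizes in the comments assume a compiler that honours #pragma pack(push, n) (GCC, Clang and MSVC do) and a 4-byte-aligned uint32_t. That the TLV code moves data in whole 32-bit words is my reading of the commit message, not something checked against the LE_STRUCT implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical TLV payload with a trailing u8, packed to 1 byte. */
#pragma pack(push, 1)
struct demo_tlv_pack1 {
	uint32_t value;
	uint8_t  flag;
};
#pragma pack(pop)

/* The same members packed to 4 bytes: padding rounds the size up. */
#pragma pack(push, 4)
struct demo_tlv_pack4 {
	uint32_t value;
	uint8_t  flag;
};
#pragma pack(pop)

/* Number of whole 32-bit words covered by a structure of this size. */
static size_t demo_whole_dwords(size_t size)
{
	return size / sizeof(uint32_t);
}

/* Bytes left over after the last whole 32-bit word. */
static size_t demo_tail_bytes(size_t size)
{
	return size % sizeof(uint32_t);
}
```

Under those assumptions the 1-byte-packed variant is 5 bytes long and leaves one byte past the last whole word, while the 4-byte-packed variant is 8 bytes long and leaves none. Note that pack(4) only caps alignment at 4; a structure made solely of u8 members would still not be padded.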
[dpdk-dev] [PATCH v2 16/16] fm10k/base: minor cleanups
Some cleanups to better reflect the code that was actually pushed out to the upstream Linux community. Signed-off-by: Wang Xiao W --- drivers/net/fm10k/base/fm10k_mbx.h | 7 -- drivers/net/fm10k/base/fm10k_pf.h | 4 -- drivers/net/fm10k/base/fm10k_type.h | 132 3 files changed, 143 deletions(-) diff --git a/drivers/net/fm10k/base/fm10k_mbx.h b/drivers/net/fm10k/base/fm10k_mbx.h index e642c2f..edc57df 100644 --- a/drivers/net/fm10k/base/fm10k_mbx.h +++ b/drivers/net/fm10k/base/fm10k_mbx.h @@ -48,7 +48,6 @@ struct fm10k_mbx_info; /* XOR provides means of switching from Tx to Rx FIFO */ #define FM10K_MBMEM_PF_XOR (FM10K_MBMEM_SM(0) ^ FM10K_MBMEM_PF(0)) #define FM10K_MBX(_n) ((_n) + 0x18800) -#define FM10K_MBX_OWNER0x0001 #define FM10K_MBX_REQ 0x0002 #define FM10K_MBX_ACK 0x0004 #define FM10K_MBX_REQ_INTERRUPT0x0008 @@ -213,7 +212,6 @@ enum fm10k_msg_type { /* version number for switch manager mailboxes */ #define FM10K_SM_MBX_VERSION 1 #define FM10K_SM_MBX_FIFO_LEN (FM10K_MBMEM_PF_XOR - 1) -#define FM10K_SM_MBX_FIFO_HDR_LEN 1 /* offsets shared between all SM FIFO headers */ #define FM10K_MSG_SM_TAIL_SHIFT0 @@ -233,18 +231,13 @@ enum fm10k_msg_type { */ #define FM10K_MBX_ERR(_n) ((_n) - 512) #define FM10K_MBX_ERR_NO_MBX FM10K_MBX_ERR(0x01) -#define FM10K_MBX_ERR_NO_MSG FM10K_MBX_ERR(0x02) #define FM10K_MBX_ERR_NO_SPACE FM10K_MBX_ERR(0x03) -#define FM10K_MBX_ERR_LOCK FM10K_MBX_ERR(0x04) #define FM10K_MBX_ERR_TAIL FM10K_MBX_ERR(0x05) #define FM10K_MBX_ERR_HEAD FM10K_MBX_ERR(0x06) -#define FM10K_MBX_ERR_DST FM10K_MBX_ERR(0x07) #define FM10K_MBX_ERR_SRC FM10K_MBX_ERR(0x08) #define FM10K_MBX_ERR_TYPE FM10K_MBX_ERR(0x09) -#define FM10K_MBX_ERR_LEN FM10K_MBX_ERR(0x0A) #define FM10K_MBX_ERR_SIZE FM10K_MBX_ERR(0x0B) #define FM10K_MBX_ERR_BUSY FM10K_MBX_ERR(0x0C) -#define FM10K_MBX_ERR_VALUEFM10K_MBX_ERR(0x0D) #define FM10K_MBX_ERR_RSVD0FM10K_MBX_ERR(0x0E) #define FM10K_MBX_ERR_CRC FM10K_MBX_ERR(0x0F) diff --git a/drivers/net/fm10k/base/fm10k_pf.h 
b/drivers/net/fm10k/base/fm10k_pf.h index ee8527a..c84b1bc 100644 --- a/drivers/net/fm10k/base/fm10k_pf.h +++ b/drivers/net/fm10k/base/fm10k_pf.h @@ -140,10 +140,6 @@ struct fm10k_swapi_1588_clock_owner { #pragma pack() #endif /* C99 */ -#define FM10K_PF_MSG_LPORT_CREATE_HANDLER(func) \ - FM10K_MSG_HANDLER(FM10K_PF_MSG_ID_LPORT_CREATE, NULL, func) -#define FM10K_PF_MSG_LPORT_DELETE_HANDLER(func) \ - FM10K_MSG_HANDLER(FM10K_PF_MSG_ID_LPORT_DELETE, NULL, func) s32 fm10k_msg_lport_map_pf(struct fm10k_hw *, u32 **, struct fm10k_mbx_info *); extern const struct fm10k_tlv_attr fm10k_lport_map_msg_attr[]; #define FM10K_PF_MSG_LPORT_MAP_HANDLER(func) \ diff --git a/drivers/net/fm10k/base/fm10k_type.h b/drivers/net/fm10k/base/fm10k_type.h index ba0a184..3fc8f13 100644 --- a/drivers/net/fm10k/base/fm10k_type.h +++ b/drivers/net/fm10k/base/fm10k_type.h @@ -40,7 +40,6 @@ struct fm10k_hw; #include "fm10k_osdep.h" #include "fm10k_mbx.h" -#define FM10K_INTEL_VENDOR_ID 0x8086 #define FM10K_DEV_ID_PF0x15A4 #define FM10K_DEV_ID_VF0x15A5 #ifdef BOULDER_RAPIDS_HW @@ -121,28 +120,16 @@ struct fm10k_hw; #define FM10K_CTRL_BAR4_ALLOWED0x0004 #define FM10K_CTRL_EXT 0x0001 -#define FM10K_CTRL_EXT_NS_DIS 0x0001 -#define FM10K_CTRL_EXT_RO_DIS 0x0002 -#define FM10K_CTRL_EXT_SWITCH_LOOPBACK 0x0004 -#define FM10K_EXVET0x0002 -#define FM10K_EXVET_ETHERTYPE_MASK 0x00FF -#define FM10K_EXVET_TAG_SIZE_SHIFT 16 -#define FM10K_EXVET_AFTER_VLAN 0x0004 #define FM10K_GCR 0x0003 -#define FM10K_FACTPS 0x0004 #define FM10K_GCR_EXT 0x0005 /* Interrupt control registers */ #define FM10K_EICR 0x0006 -#define FM10K_EICR_PCA_FAULT 0x0001 -#define FM10K_EICR_THI_FAULT 0x0004 -#define FM10K_EICR_FUM_FAULT 0x0020 #define FM10K_EICR_FAULT_MASK 0x003F #define FM10K_EICR_MAILBOX 0x0040 #define FM10K_EICR_SWITCHREADY 0x0080 #define FM10K_EICR_SWITCHNOTREADY 0x0100 #define FM10K_EICR_SWITCHINTERRUPT 0x0200 -#define FM10K_EICR_SRAMERROR 0x0400 #define FM10K_EICR_VFLR0x0800 #define FM10K_EICR_MAXHOLDTIME 0x1000 #define 
FM10K_EIMR 0x0007 @@ -196,7 +183,6 @@ struct fm10k_hw
[dpdk-dev] [PATCH v2 15/16] fm10k/base: move constants to the right of binary operators
The upstream Linux kernel community prefers that constants be placed to
the right of binary operators.

Signed-off-by: Wang Xiao W
---
 drivers/net/fm10k/base/fm10k_pf.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/fm10k/base/fm10k_pf.c b/drivers/net/fm10k/base/fm10k_pf.c
index 456fe64..105babf 100644
--- a/drivers/net/fm10k/base/fm10k_pf.c
+++ b/drivers/net/fm10k/base/fm10k_pf.c
@@ -759,8 +759,8 @@ STATIC s32 fm10k_iov_assign_resources_pf(struct fm10k_hw *hw, u16 num_vfs,
 				FM10K_RXDCTL_WRITE_BACK_MIN_DELAY |
 				FM10K_RXDCTL_DROP_ON_EMPTY);
 		FM10K_WRITE_REG(hw, FM10K_RXQCTL(vf_q_idx),
-				FM10K_RXQCTL_VF |
-				(i << FM10K_RXQCTL_VF_SHIFT));
+				(i << FM10K_RXQCTL_VF_SHIFT) |
+				FM10K_RXQCTL_VF);
 
 		/* map queue pair to VF */
 		FM10K_WRITE_REG(hw, FM10K_TQMAP(qmap_idx), vf_q_idx);
@@ -1035,7 +1035,7 @@ STATIC s32 fm10k_iov_reset_resources_pf(struct fm10k_hw *hw,
 	txqctl = ((u32)vf_vid << FM10K_TXQCTL_VID_SHIFT) |
 		 (vf_idx << FM10K_TXQCTL_TC_SHIFT) |
 		 FM10K_TXQCTL_VF | vf_idx;
-	rxqctl = FM10K_RXQCTL_VF | (vf_idx << FM10K_RXQCTL_VF_SHIFT);
+	rxqctl = (vf_idx << FM10K_RXQCTL_VF_SHIFT) | FM10K_RXQCTL_VF;
 
 	/* stop further DMA and reset queue ownership back to VF */
 	for (i = vf_q_idx; i < (queues_per_pool + vf_q_idx); i++) {
-- 
1.9.3
[dpdk-dev] [PATCH 4/5] vhost: do not use rte_memcpy for virtio_hdr copy
On 1/27/2016 11:22 AM, Yuanhan Liu wrote:
> On Wed, Jan 27, 2016 at 02:46:39AM +, Xie, Huawei wrote:
>> On 12/3/2015 2:03 PM, Yuanhan Liu wrote:
>>> +	if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) {
>>> +		*(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr;
>>> +	} else {
>>> +		*(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr;
>>> +	}
>> Thanks!
>> We might simplify this further. Just reset the first two fields flags
>> and gso_type.
> What's this "simplification" for? Don't even to say that we will add
> TSO support, which modifies a few more fields, such as csum_start: resetting
> the first two fields only is wrong here.

I know TSO before commenting, but at least in this implementation and
this specific patch, I guess zeroing two fields is enough.

What is wrong resetting only two fields?

>
> --yliu
>
[dpdk-dev] [PATCH 4/5] vhost: do not use rte_memcpy for virtio_hdr copy
On Wed, Jan 27, 2016 at 05:56:56AM +, Xie, Huawei wrote:
> On 1/27/2016 11:22 AM, Yuanhan Liu wrote:
> > On Wed, Jan 27, 2016 at 02:46:39AM +, Xie, Huawei wrote:
> >> On 12/3/2015 2:03 PM, Yuanhan Liu wrote:
> >>> +	if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) {
> >>> +		*(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr;
> >>> +	} else {
> >>> +		*(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr;
> >>> +	}
> >> Thanks!
> >> We might simplify this further. Just reset the first two fields flags
> >> and gso_type.
> > What's this "simplification" for? Don't even to say that we will add
> > TSO support, which modifies a few more fields, such as csum_start: resetting
> > the first two fields only is wrong here.
>
> I know TSO before commenting, but at least in this implementation and
> this specific patch, I guess zeroing two fields is enough.
>
> What is wrong resetting only two fields?

I then have to ask "What is the benefit of resetting only two fields"?
If doing so, we have to change it back for TSO. My proposal requires no
extra change when adding TSO support.

	--yliu
[dpdk-dev] [PATCH 1/5] vhost: refactor rte_vhost_dequeue_burst
On 1/27/2016 11:26 AM, Yuanhan Liu wrote:
> On Tue, Jan 26, 2016 at 10:30:12AM +, Xie, Huawei wrote:
>> On 12/3/2015 2:03 PM, Yuanhan Liu wrote:
>>> Signed-off-by: Yuanhan Liu
>>> ---
>>> lib/librte_vhost/vhost_rxtx.c | 287
>>> +-
>>> 1 file changed, 113 insertions(+), 174 deletions(-)
>> Prefer to unroll copy_mbuf_to_desc and your COPY macro. It prevents us
> I'm okay to unroll COPY macro. But for copy_mbuf_to_desc, I prefer not
> to do that, unless it has a good reason.
>
>> processing descriptors in a burst way in future.
> So, do you have a plan?

I think it is OK. If we need unroll in future, we could do that then. I
am open to this. Just my preference. I understand that wrapping makes
code more readable.

>
> --yliu
>
[dpdk-dev] [PATCH 1/5] vhost: refactor rte_vhost_dequeue_burst
On Wed, Jan 27, 2016 at 06:12:22AM +, Xie, Huawei wrote:
> On 1/27/2016 11:26 AM, Yuanhan Liu wrote:
> > On Tue, Jan 26, 2016 at 10:30:12AM +, Xie, Huawei wrote:
> >> On 12/3/2015 2:03 PM, Yuanhan Liu wrote:
> >>> Signed-off-by: Yuanhan Liu
> >>> ---
> >>> lib/librte_vhost/vhost_rxtx.c | 287
> >>> +-
> >>> 1 file changed, 113 insertions(+), 174 deletions(-)
> >> Prefer to unroll copy_mbuf_to_desc and your COPY macro. It prevents us
> > I'm okay to unroll COPY macro. But for copy_mbuf_to_desc, I prefer not
> > to do that, unless it has a good reason.
> >
> >> processing descriptors in a burst way in future.
> > So, do you have a plan?
>
> I think it is OK. If we need unroll in future, we could do that then. I
> am open to this. Just my preference. I understand that wrapping makes
> code more readable.

Okay, let's consider it then: unroll would be easy after all.

	--yliu
[dpdk-dev] [PATCH 4/5] vhost: do not use rte_memcpy for virtio_hdr copy
On 1/27/2016 2:02 PM, Yuanhan Liu wrote:
> On Wed, Jan 27, 2016 at 05:56:56AM +, Xie, Huawei wrote:
>> On 1/27/2016 11:22 AM, Yuanhan Liu wrote:
>>> On Wed, Jan 27, 2016 at 02:46:39AM +, Xie, Huawei wrote:
>>>> On 12/3/2015 2:03 PM, Yuanhan Liu wrote:
>>>>> +	if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) {
>>>>> +		*(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr;
>>>>> +	} else {
>>>>> +		*(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr;
>>>>> +	}
>>>> Thanks!
>>>> We might simplify this further. Just reset the first two fields flags
>>>> and gso_type.
>>> What's this "simplification" for? Don't even to say that we will add
>>> TSO support, which modifies a few more fields, such as csum_start: resetting
>>> the first two fields only is wrong here.
>> I know TSO before commenting, but at least in this implementation and
>> this specific patch, I guess zeroing two fields is enough.
>>
>> What is wrong resetting only two fields?
> I then have to ask "What is the benefit of resetting only two fields"?
> If doing so, we have to change it back for TSO. My proposal requires no
> extra change when adding TSO support.

Benefit is we save four unnecessary stores.

>
> --yliu
>
[dpdk-dev] [PATCH 4/5] vhost: do not use rte_memcpy for virtio_hdr copy
On Wed, Jan 27, 2016 at 06:16:37AM +, Xie, Huawei wrote:
> On 1/27/2016 2:02 PM, Yuanhan Liu wrote:
> > On Wed, Jan 27, 2016 at 05:56:56AM +, Xie, Huawei wrote:
> >> On 1/27/2016 11:22 AM, Yuanhan Liu wrote:
> >>> On Wed, Jan 27, 2016 at 02:46:39AM +, Xie, Huawei wrote:
> >>>> On 12/3/2015 2:03 PM, Yuanhan Liu wrote:
> >>>>> +	if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) {
> >>>>> +		*(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr;
> >>>>> +	} else {
> >>>>> +		*(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr;
> >>>>> +	}
> >>>> Thanks!
> >>>> We might simplify this further. Just reset the first two fields flags
> >>>> and gso_type.
> >>> What's this "simplification" for? Don't even to say that we will add
> >>> TSO support, which modifies a few more fields, such as csum_start: resetting
> >>> the first two fields only is wrong here.
> >> I know TSO before commenting, but at least in this implementation and
> >> this specific patch, I guess zeroing two fields is enough.
> >>
> >> What is wrong resetting only two fields?
> > I then have to ask "What is the benefit of resetting only two fields"?
> > If doing so, we have to change it back for TSO. My proposal requires no
> > extra change when adding TSO support.
>
> Benefit is we save four unnecessary stores.

Hmm..., the hdr size is 12 bytes at most. I mean, does it really matter,
copying 3 bytes, or copying 12 bytes in a row?

	--yliu
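The two options argued over in this thread can be put side by side in a small sketch. The demo_* structures below are local mirrors of the virtio-net header layout (10 bytes, or 12 with num_buffers), declared here only to keep the example self-contained; they and the demo_* functions are hypothetical and are not the DPDK vhost code. Real code would take the definitions from the virtio headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the basic virtio-net header layout (assumed). */
struct demo_virtio_net_hdr {
	uint8_t  flags;
	uint8_t  gso_type;
	uint16_t hdr_len;
	uint16_t gso_size;
	uint16_t csum_start;
	uint16_t csum_offset;
};

/* Local mirror of the mergeable-rx-buffer variant (assumed). */
struct demo_virtio_net_hdr_mrg_rxbuf {
	struct demo_virtio_net_hdr hdr;
	uint16_t num_buffers;
};

/* Option A (the patch): write the whole header by struct assignment,
 * picking the variant from the negotiated header length. */
static void demo_write_full(void *desc, size_t hlen,
			    const struct demo_virtio_net_hdr_mrg_rxbuf *hdr)
{
	assert(desc != NULL && hdr != NULL);
	if (hlen == sizeof(struct demo_virtio_net_hdr_mrg_rxbuf))
		*(struct demo_virtio_net_hdr_mrg_rxbuf *)desc = *hdr;
	else
		*(struct demo_virtio_net_hdr *)desc = hdr->hdr;
}

/* Option B (the review suggestion): reset only flags and gso_type and
 * leave every other field in the descriptor buffer untouched. */
static void demo_reset_two_fields(void *desc)
{
	struct demo_virtio_net_hdr *h = desc;

	assert(desc != NULL);
	h->flags = 0;
	h->gso_type = 0;
}
```

With option B, whatever was previously in fields such as csum_start stays in the buffer. Whether that matters depends on which fields the receiver actually reads, which is the point the thread leaves open until TSO support is added.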
[dpdk-dev] Errors Rx count increasing while pktgen doing nothing on Intel 82598EB 10G
Laurent, have you resolved this problem? I'm using the same NIC as yours (i.e. Intel 82598EB 10G NIC) and faced the same problem as you. Here is parts of my log and it says that PMD cannot enable RX queue for my NIC. I'm using DPDK 2.2.0 and used 'null' for the 4th parameter in calling rte_eth_rx_queue_setup(). (i.e. 'null' parameter provides the default rx_conf value.) Thanks. APP: initialising port 0 ... PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x7f5f27258040 sw_sc_ring=0x7f5f27257b00 hw_ring=0x7f5f27258580 dma_addr=0x41f458580 PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x7f5f27245940 hw_ring=0x7f5f27247980 dma_addr=0x41f447980 PMD: ixgbe_set_tx_function(): Using simple tx code path PMD: ixgbe_set_tx_function(): Vector tx enabled. PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x7f5f272337c0 hw_ring=0x7f5f27235800 dma_addr=0x41f435800 PMD: ixgbe_set_tx_function(): Using simple tx code path PMD: ixgbe_set_tx_function(): Vector tx enabled. PMD: ixgbe_set_rx_function(): Vector rx enabled, please make sure RX burst size no less than 4 (port=0). *PMD: ixgbe_dev_rx_queue_start(): Could not enable Rx Queue 0* APP: port 0 has started APP: port 0 has entered in promiscuous mode APP: port 0 initialization is done. KNI: pci: 09:00:00 8086:10c7 APP: kni allocation is done for port 0. APP: initialising port 1 ... PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x7f5f27222dc0 sw_sc_ring=0x7f5f27222880 hw_ring=0x7f5f27223300 dma_addr=0x41f423300 PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x7f5f272106c0 hw_ring=0x7f5f27212700 dma_addr=0x41f412700 PMD: ixgbe_set_tx_function(): Using simple tx code path PMD: ixgbe_set_tx_function(): Vector tx enabled. PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x7f5f271fe540 hw_ring=0x7f5f27200580 dma_addr=0x41f400580 PMD: ixgbe_set_tx_function(): Using simple tx code path PMD: ixgbe_set_tx_function(): Vector tx enabled. PMD: ixgbe_set_rx_function(): Vector rx enabled, please make sure RX burst size no less than 4 (port=1). 
*PMD: ixgbe_dev_rx_queue_start(): Could not enable Rx Queue 0* APP: port 1 has started APP: port 1 has entered in promiscuous mode APP: port 1 initialization is done. KNI: pci: 0a:00:00 8086:10c7 APP: kni allocation is done for port 1. checking link status .done Port 0 Link Up - speed 1 Mbps - full-duplex Port 1 Link Up - speed 1 Mbps - full-duplex On Mon, Dec 28, 2015 at 5:28 AM, Wiles, Keith wrote: > On 12/27/15, 2:09 PM, "Laurent GUERBY" wrote: > > >On Sun, 2015-12-27 at 19:43 +, Wiles, Keith wrote: > >> On 12/27/15, 12:31 PM, "dev on behalf of Laurent GUERBY" < > dev-bounces at dpdk.org on behalf of laurent at guerby.net> wrote: > >> > >> >Hi, > >> > > >> >I reported today an issue when using Pktgen-DPDK: > >> >https://github.com/pktgen/Pktgen-DPDK/issues/52 > >> > > >> >But I think it's more in DPDK than pktgen > >> > > >> >two identical machines with SFP+ DA cable between them > >> >DPDK 2.2.0 from tarball > >> >Pktgen-DPDK from git > >> >two identical machines: > >> >core i7 2600 (sandy bridge 4C/8T), HT disabled in the BIOS > >> >ASUS P8H67-M PRO BIOS 3904 (latest available) > >> >Ethernet controller: Intel Corporation 82598EB 10-Gigabit AF Dual Port > >> >Network Connection (rev 01) > >> >01:00.0 0200: 8086:10f1 (rev 01) > >> >Subsystem: 8086:a21f > >> >boot kernel 3.16 unbutu 14.04 with isolcpus=2,3,4 > >> > > >> >When launching pktgen even with no TX asked the Errors RX counters > keeps > >> >going up by about 7.4 millions per second: > >> > > >> >Errors Rx/Tx : 7471857054/0 > >> > > >> >In the log I get "Could not enable Rx Queue", might be the > >> >source of the issue? > >> > > >> >PMD: ixgbe_dev_rx_queue_start(): Could not enable Rx Queue 0 > >> >PMD: ixgbe_dev_rx_queue_start(): Could not enable Rx Queue 1 > >> > > >> >When sending traffic single UDP src/dst/IP/MAC the setup > >> >reaches 14204188 pps 64 bytes, the error counter is also > >> >increasing. > >> > > >> >Any idea what to look for? 
> >> > >> One more suggestion is to run test_pmd on one machine and something > >> like iperf on the other to verify the DPDK is working correct, which I > >> assume will be true. Not sure the RX errors are reported in the > >> test_pmd or you could use the l3fwd application too. > > > >Ok, I will check the test_pmd documentation and try to do this test: I'm > >just starting on DPDK :). > > > >> Please also send me the 'lspci | grep Ethernet? output. > > > >I included one line in my original email above (plus extract of lspci > >-vn), here is the full output of the command: > > > >01:00.0 Ethernet controller: Intel Corporation 82598EB 10-Gigabit AF > >Dual Port Network Connection (rev 01) > >01:00.1 Ethernet controller: Intel Corporation 82598EB 10-Gigabit AF > >Dual Port Network Connection (rev 01) > >05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. > >RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06) > > > >(The realtek is used only for internet connectivity). > > > >> Also send me the command line. > > > >On the first machine
[dpdk-dev] Errors Rx count increasing while pktgen doing nothing on Intel 82598EB 10G
On Wed, 2016-01-27 at 15:50 +0900, Moon-Sang Lee wrote: > > > Laurent, have you resolved this problem? > I'm using the same NIC as yours (i.e. Intel 82598EB 10G NIC) and faced > the same problem as you. > Here is parts of my log and it says that PMD cannot enable RX queue > for my NIC. > I'm using DPDK 2.2.0 and used 'null' for the 4th parameter in calling > rte_eth_rx_queue_setup(). > (i.e. 'null' parameter provides the default rx_conf value.) Hi, I had to reuse my DPDK machines for another task, I will go back to it after FOSDEM. The error you get is the same as mine. Sincerely, Laurent > > Thanks. > > > > > > APP: initialising port 0 ... > PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x7f5f27258040 > sw_sc_ring=0x7f5f27257b00 hw_ring=0x7f5f27258580 dma_addr=0x41f458580 > PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x7f5f27245940 > hw_ring=0x7f5f27247980 dma_addr=0x41f447980 > PMD: ixgbe_set_tx_function(): Using simple tx code path > PMD: ixgbe_set_tx_function(): Vector tx enabled. > PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x7f5f272337c0 > hw_ring=0x7f5f27235800 dma_addr=0x41f435800 > PMD: ixgbe_set_tx_function(): Using simple tx code path > PMD: ixgbe_set_tx_function(): Vector tx enabled. > PMD: ixgbe_set_rx_function(): Vector rx enabled, please make sure RX > burst size no less than 4 (port=0). > PMD: ixgbe_dev_rx_queue_start(): Could not enable Rx Queue 0 > APP: port 0 has started > APP: port 0 has entered in promiscuous mode > APP: port 0 initialization is done. > KNI: pci: 09:00:00 8086:10c7 > APP: kni allocation is done for port 0. > APP: initialising port 1 ... > PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x7f5f27222dc0 > sw_sc_ring=0x7f5f27222880 hw_ring=0x7f5f27223300 dma_addr=0x41f423300 > PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x7f5f272106c0 > hw_ring=0x7f5f27212700 dma_addr=0x41f412700 > PMD: ixgbe_set_tx_function(): Using simple tx code path > PMD: ixgbe_set_tx_function(): Vector tx enabled. 
> PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x7f5f271fe540 > hw_ring=0x7f5f27200580 dma_addr=0x41f400580 > PMD: ixgbe_set_tx_function(): Using simple tx code path > PMD: ixgbe_set_tx_function(): Vector tx enabled. > PMD: ixgbe_set_rx_function(): Vector rx enabled, please make sure RX > burst size no less than 4 (port=1). > PMD: ixgbe_dev_rx_queue_start(): Could not enable Rx Queue 0 > APP: port 1 has started > APP: port 1 has entered in promiscuous mode > APP: port 1 initialization is done. > KNI: pci: 0a:00:00 8086:10c7 > APP: kni allocation is done for port 1. > > > checking link status > .done > Port 0 Link Up - speed 1 Mbps - full-duplex > Port 1 Link Up - speed 1 Mbps - full-duplex > > > > On Mon, Dec 28, 2015 at 5:28 AM, Wiles, Keith > wrote: > On 12/27/15, 2:09 PM, "Laurent GUERBY" > wrote: > > >On Sun, 2015-12-27 at 19:43 +, Wiles, Keith wrote: > >> On 12/27/15, 12:31 PM, "dev on behalf of Laurent GUERBY" > wrote: > >> > >> >Hi, > >> > > >> >I reported today an issue when using Pktgen-DPDK: > >> >https://github.com/pktgen/Pktgen-DPDK/issues/52 > >> > > >> >But I think it's more in DPDK than pktgen > >> > > >> >two identical machines with SFP+ DA cable between them > >> >DPDK 2.2.0 from tarball > >> >Pktgen-DPDK from git > >> >two identical machines: > >> >core i7 2600 (sandy bridge 4C/8T), HT disabled in the BIOS > >> >ASUS P8H67-M PRO BIOS 3904 (latest available) > >> >Ethernet controller: Intel Corporation 82598EB 10-Gigabit > AF Dual Port > >> >Network Connection (rev 01) > >> >01:00.0 0200: 8086:10f1 (rev 01) > >> >Subsystem: 8086:a21f > >> >boot kernel 3.16 unbutu 14.04 with isolcpus=2,3,4 > >> > > >> >When launching pktgen even with no TX asked the Errors RX > counters keeps > >> >going up by about 7.4 millions per second: > >> > > >> >Errors Rx/Tx : 7471857054/0 > >> > > >> >In the log I get "Could not enable Rx Queue", might be the > >> >source of the issue? 
> >> > > >> >PMD: ixgbe_dev_rx_queue_start(): Could not enable Rx Queue > 0 > >> >PMD: ixgbe_dev_rx_queue_start(): Could not enable Rx Queue > 1 > >> > > >> >When sending traffic single UDP src/dst/IP/MAC the setup > >> >reaches 14204188 pps 64 bytes, the error counter is also > >> >increasing. > >> > > >> >Any idea what to look for? > >> > >> One more suggestion is to run test_pmd on one machine and > something > >> like iperf on the other to verify the DPDK is working > correct, which I > >> assume will be true. Not sure the RX errors are reported in > the > >> test_pmd or you could use the l3fwd application too. > > > >Ok, I will check the test_pmd documentat
[dpdk-dev] Errors Rx count increasing while pktgen doing nothing on Intel 82598EB 10G
Moon-Sang,

Were you using pktgen or another application?
Could you share the detailed steps to reproduce the issue? We will find time to look into it soon. Thanks!

Regards,
Helin

-----Original Message-----
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Laurent GUERBY
Sent: Wednesday, January 27, 2016 3:16 PM
To: Moon-Sang Lee
Cc: dev at dpdk.org
Subject: Re: [dpdk-dev] Errors Rx count increasing while pktgen doing nothing on Intel 82598EB 10G

On Wed, 2016-01-27 at 15:50 +0900, Moon-Sang Lee wrote:
> Laurent, have you resolved this problem?
> I'm using the same NIC as yours (i.e. Intel 82598EB 10G NIC) and faced
> the same problem as you.
> Here is parts of my log and it says that PMD cannot enable RX queue
> for my NIC.
> I'm using DPDK 2.2.0 and used 'null' for the 4th parameter in calling
> rte_eth_rx_queue_setup().
> (i.e. 'null' parameter provides the default rx_conf value.)

Hi,

I had to reuse my DPDK machines for another task, I will go back to it after FOSDEM.

The error you get is the same as mine.

Sincerely,

Laurent
[dpdk-dev] bnx2x driver and 57800 versus 57810
> I have two practically identical systems, same hypervisor on each (CentOS
> 7.x). In one, I have a 57800 card which works fine with DPDK with
> SRIOV. In the other, I have a 57810 card which doesn't work with SRIOV.
>
> For the 57810 I have tracked this down to the status block in the VF
> failing to be updated. The Linux driver works fine but it appears to
> use a slightly different scheme -- writing some sort of fastpath status
> block generation per interrupt.
>
> Does anyone have any suggestions or a programming guide for this device?

What is not working with the 57810? Is it link related or traffic? Please provide the details.

Attached is the SW programming guide for 577xx/578xx. I'm not sure if it has details pertaining to the specific issue that you have.

Thanks,
Harish

FYI: I had replied to your email earlier with the doc attached, but it has not gone through yet due to size restrictions:

    Your mail to 'dev' with the subject
        Re: [dpdk-dev] bnx2x driver and 57800 versus 57810
    is being held until the list moderator can review it for approval.
    The reason it is being held:
        Message body is too big: 1350322 bytes with a limit of 300 KB
[dpdk-dev] [PATCH v5 8/9] virtio: add 1.0 support
2016-01-27 11:46, Yuanhan Liu:
> On Thu, Jan 21, 2016 at 12:49:10PM +0100, Thomas Monjalon wrote:
> > 2016-01-19 16:12, Yuanhan Liu:
> > >  int
> > >  vtpci_init(struct rte_pci_device *dev, struct virtio_hw *hw)
> > >  {
> > > -	hw->vtpci_ops = &legacy_ops;
> > > +	hw->dev = dev;
> > > +
> > > +	/*
> > > +	 * Try if we can succeed reading virtio pci caps, which exists
> > > +	 * only on modern pci device. If failed, we fallback to legacy
> > > +	 * virtio handling.
> > > +	 */
> > > +	if (virtio_read_caps(dev, hw) == 0) {
> > > +		PMD_INIT_LOG(INFO, "modern virtio pci detected.");
> > > +		hw->vtpci_ops = &modern_ops;
> > > +		hw->modern = 1;
> > > +		dev->driver->drv_flags |= RTE_PCI_DRV_INTR_LSC;
> > > +		return 0;
> > > +	}
> >
> > RTE_PCI_DRV_INTR_LSC is already set by virtio_resource_init_by_uio().
>
> We don't go that far here. Here we just detect if it's a modern virtio
> device. And if yes, we do some modern initiations, and return.
>
> virtio_resource_init_by_uio() is invoked when virtio_read_caps() fails.
>
> > Do you mean interrupt was not supported with legacy virtio?
>
> Nope. this patch set changes nothing on legacy virtio support.

Oh yes. I guess I had not seen the return.
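The point settled in this exchange (the modern path returns early, so the legacy setup is only reached when the capability probe fails) can be illustrated with a small self-contained sketch. It assumes nothing about the real virtio PMD beyond the quoted hunk; `struct hw`, `read_caps`, `hw_init` and `has_modern_caps` are hypothetical stand-ins, not DPDK symbols.

```c
#include <stddef.h>

/* Hypothetical ops table; the real driver holds function pointers here. */
struct pci_ops {
	const char *name;
};

static const struct pci_ops legacy_ops = { "legacy" };
static const struct pci_ops modern_ops = { "modern" };

struct hw {
	const struct pci_ops *ops;
	int modern;
	int has_modern_caps; /* hypothetical: what the capability probe would find */
};

/* Hypothetical probe: returns 0 when modern capabilities are present. */
static int read_caps(const struct hw *hw)
{
	return hw->has_modern_caps ? 0 : -1;
}

static int hw_init(struct hw *hw)
{
	if (read_caps(hw) == 0) {
		hw->ops = &modern_ops;
		hw->modern = 1;
		return 0; /* early return: the legacy setup below is never reached */
	}

	/* Probe failed: fall back to legacy handling. */
	hw->ops = &legacy_ops;
	hw->modern = 0;
	return 0;
}
```

With this shape, any flag set inside the modern branch is not duplicated by the legacy path, because the two branches are mutually exclusive.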
[dpdk-dev] [PATCH v2 2/2] i40evf: support interrupt based pf reset request
Hello Jingjing,

On Wed, Jan 27, 2016 at 2:49 AM, Jingjing Wu wrote:
> Interrupt based request of PF reset from PF is supported by
> enabling the adminq event process in VF driver.
> Users can register a callback for this interrupt event to get
> informed, when a PF reset request detected like:
>   rte_eth_dev_callback_register(portid,
>                 RTE_ETH_EVENT_INTR_RESET,
>                 reset_event_callback,
>                 arg);
>
> Signed-off-by: Jingjing Wu

Just adding my previous comment in this thread.

Having this infrastructure is one thing, but the initial problem was that the driver did not recover from this reset event. The Linux i40e VF driver handles this kind of event itself. Could we have something similar?

Thanks.

--
David Marchand
[dpdk-dev] [PATCH] ethdev: fix byte order inconsistence between fdir flow and mask
Fixed issue of byte order in ethdev library that the structure for setting fdir's mask and flow entry is inconsist and made inputs of mask be in big endian. fixes: 76c6f89e80d4 ("ixgbe: support new flow director masks") 2d4c1a9ea2ac ("ethdev: add new flow director masks") Reported-by: Yaacov Hazan Signed-off-by: Jingjing Wu --- app/test-pmd/cmdline.c | 6 ++--- doc/guides/rel_notes/release_2_3.rst | 6 + drivers/net/ixgbe/ixgbe_fdir.c | 47 ++-- lib/librte_ether/rte_eth_ctrl.h | 7 -- 4 files changed, 43 insertions(+), 23 deletions(-) diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index 73298c9..13194c9 100644 --- a/app/test-pmd/cmdline.c +++ b/app/test-pmd/cmdline.c @@ -8687,13 +8687,13 @@ cmd_flow_director_mask_parsed(void *parsed_result, return; } - mask->vlan_tci_mask = res->vlan_mask; + mask->vlan_tci_mask = rte_cpu_to_be_16(res->vlan_mask); IPV4_ADDR_TO_UINT(res->ipv4_src, mask->ipv4_mask.src_ip); IPV4_ADDR_TO_UINT(res->ipv4_dst, mask->ipv4_mask.dst_ip); IPV6_ADDR_TO_ARRAY(res->ipv6_src, mask->ipv6_mask.src_ip); IPV6_ADDR_TO_ARRAY(res->ipv6_dst, mask->ipv6_mask.dst_ip); - mask->src_port_mask = res->port_src; - mask->dst_port_mask = res->port_dst; + mask->src_port_mask = rte_cpu_to_be_16(res->port_src); + mask->dst_port_mask = rte_cpu_to_be_16(res->port_dst); } cmd_reconfig_device_queue(res->port_id, 1, 1); diff --git a/doc/guides/rel_notes/release_2_3.rst b/doc/guides/rel_notes/release_2_3.rst index 99de186..28d0f27 100644 --- a/doc/guides/rel_notes/release_2_3.rst +++ b/doc/guides/rel_notes/release_2_3.rst @@ -19,6 +19,10 @@ Drivers Libraries ~ +* ** fix byte order inconsistence between fdir flow and mask ** + + Fixed issue in ethdev library that the structure for setting + fdir's mask and flow entry is inconsist in byte order. Examples @@ -39,6 +43,8 @@ API Changes ABI Changes --- +* The fields in The ethdev structures ``rte_eth_fdir_masks`` were + changed to be in big endian. 
Shared Library Versions --- diff --git a/drivers/net/ixgbe/ixgbe_fdir.c b/drivers/net/ixgbe/ixgbe_fdir.c index e03219b..7423b2d 100644 --- a/drivers/net/ixgbe/ixgbe_fdir.c +++ b/drivers/net/ixgbe/ixgbe_fdir.c @@ -309,6 +309,7 @@ fdir_set_input_mask_82599(struct rte_eth_dev *dev, uint32_t fdiripv6m; /* IPv6 source and destination masks. */ uint16_t dst_ipv6m = 0; uint16_t src_ipv6m = 0; + volatile uint32_t *reg; PMD_INIT_FUNC_TRACE(); @@ -322,16 +323,16 @@ fdir_set_input_mask_82599(struct rte_eth_dev *dev, /* use the L4 protocol mask for raw IPv4/IPv6 traffic */ fdirm |= IXGBE_FDIRM_L4P; - if (input_mask->vlan_tci_mask == 0x0FFF) + if (input_mask->vlan_tci_mask == rte_cpu_to_be_16(0x0FFF)) /* mask VLAN Priority */ fdirm |= IXGBE_FDIRM_VLANP; - else if (input_mask->vlan_tci_mask == 0xE000) + else if (input_mask->vlan_tci_mask == rte_cpu_to_be_16(0xE000)) /* mask VLAN ID */ fdirm |= IXGBE_FDIRM_VLANID; else if (input_mask->vlan_tci_mask == 0) /* mask VLAN ID and Priority */ fdirm |= IXGBE_FDIRM_VLANID | IXGBE_FDIRM_VLANP; - else if (input_mask->vlan_tci_mask != 0xEFFF) { + else if (input_mask->vlan_tci_mask != rte_cpu_to_be_16(0xEFFF)) { PMD_INIT_LOG(ERR, "invalid vlan_tci_mask"); return -EINVAL; } @@ -340,19 +341,26 @@ fdir_set_input_mask_82599(struct rte_eth_dev *dev, IXGBE_WRITE_REG(hw, IXGBE_FDIRM, fdirm); /* store the TCP/UDP port masks, bit reversed from port layout */ - fdirtcpm = reverse_fdir_bitmasks(input_mask->dst_port_mask, -input_mask->src_port_mask); + fdirtcpm = reverse_fdir_bitmasks( + rte_be_to_cpu_16(input_mask->dst_port_mask), + rte_be_to_cpu_16(input_mask->src_port_mask)); - /* write all the same so that UDP, TCP and SCTP use the same mask */ + /* write all the same so that UDP, TCP and SCTP use the same mask +* (little-endian) + */ IXGBE_WRITE_REG(hw, IXGBE_FDIRTCPM, ~fdirtcpm); IXGBE_WRITE_REG(hw, IXGBE_FDIRUDPM, ~fdirtcpm); IXGBE_WRITE_REG(hw, IXGBE_FDIRSCTPM, ~fdirtcpm); info->mask.src_port_mask = input_mask->src_port_mask; 
info->mask.dst_port_mask = input_mask->dst_port_mask; - /* Store source and destination IPv4 masks (big-endian) */ - IXGBE_WRITE_REG(hw, IXGBE_FDIRSIP4M, ~(input_mask->ipv4_mask.src_ip)); - IXGBE_WRITE_REG(hw, IXGBE_FDIRDIP4M, ~(input_mask->ipv4_mask.dst_ip)); + /* Store source an
[dpdk-dev] [PATCH v6 0/2] provide rte_pktmbuf_alloc_bulk API and call it in vhost dequeue
v6 changes:
 reflect the changes in release notes and library version map file
 revise our duff's code style a bit to make it more readable

v5 changes:
 add comment about duff's device and our variant implementation

v4 changes:
 fix a silly typo in error handling when rte_pktmbuf_alloc fails

v3 changes:
 move while after case 0
 add context about duff's device and why we use while loop in the commit
 message

v2 changes:
 unroll the loop in rte_pktmbuf_alloc_bulk to help the performance

For symmetric rte_pktmbuf_free_bulk, if the app knows in its scenarios
their mbufs are all simple mbufs, i.e meet the following requirements:
 * no multiple segments
 * not indirect mbuf
 * refcnt is 1
 * belong to the same mbuf memory pool,
it could directly call rte_mempool_put to free the bulk of mbufs,
otherwise rte_pktmbuf_free_bulk has to call rte_pktmbuf_free to free
the mbuf one by one.
This patchset will not provide this symmetric implementation.

Huawei Xie (2):
  mbuf: provide rte_pktmbuf_alloc_bulk API
  vhost: call rte_pktmbuf_alloc_bulk in vhost dequeue

 doc/guides/rel_notes/release_2_3.rst |  3 ++
 lib/librte_mbuf/rte_mbuf.h           | 55
 lib/librte_mbuf/rte_mbuf_version.map |  7 +
 lib/librte_vhost/vhost_rxtx.c        | 35 ++-
 4 files changed, 87 insertions(+), 13 deletions(-)

--
1.8.1.4
[dpdk-dev] [PATCH v6 1/2] mbuf: provide rte_pktmbuf_alloc_bulk API
v6 changes: reflect the changes in release notes and library version map file revise our duff's code style a bit to make it more readable v5 changes: add comment about duff's device and our variant implementation v3 changes: move while after case 0 add context about duff's device and why we use while loop in the commit message v2 changes: unroll the loop a bit to help the performance rte_pktmbuf_alloc_bulk allocates a bulk of packet mbufs. There is related thread about this bulk API. http://dpdk.org/dev/patchwork/patch/4718/ Thanks to Konstantin's loop unrolling. Attached the wiki page about duff's device. It explains the performance optimization through loop unwinding, and also the most dramatic use of case label fall-through. https://en.wikipedia.org/wiki/Duff%27s_device In our implementation, we use while() loop rather than do{} while() loop because we could not assume count is strictly positive. Using while() loop saves one line of check if count is zero. Signed-off-by: Gerald Rogers Signed-off-by: Huawei Xie Acked-by: Konstantin Ananyev --- doc/guides/rel_notes/release_2_3.rst | 3 ++ lib/librte_mbuf/rte_mbuf.h | 55 lib/librte_mbuf/rte_mbuf_version.map | 7 + 3 files changed, 65 insertions(+) diff --git a/doc/guides/rel_notes/release_2_3.rst b/doc/guides/rel_notes/release_2_3.rst index 99de186..a52cba3 100644 --- a/doc/guides/rel_notes/release_2_3.rst +++ b/doc/guides/rel_notes/release_2_3.rst @@ -4,6 +4,9 @@ DPDK Release 2.3 New Features +* **Enable bulk allocation of mbufs. ** + A new function ``rte_pktmbuf_alloc_bulk()`` has been added to allow the user + to allocate a bulk of mbufs. Resolved Issues --- diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h index f234ac9..b2ed479 100644 --- a/lib/librte_mbuf/rte_mbuf.h +++ b/lib/librte_mbuf/rte_mbuf.h @@ -1336,6 +1336,61 @@ static inline struct rte_mbuf *rte_pktmbuf_alloc(struct rte_mempool *mp) } /** + * Allocate a bulk of mbufs, initialize refcnt and reset the fields to default + * values. 
+ * + * @param pool + *The mempool from which mbufs are allocated. + * @param mbufs + *Array of pointers to mbufs + * @param count + *Array size + * @return + * - 0: Success + */ +static inline int rte_pktmbuf_alloc_bulk(struct rte_mempool *pool, +struct rte_mbuf **mbufs, unsigned count) +{ + unsigned idx = 0; + int rc; + + rc = rte_mempool_get_bulk(pool, (void **)mbufs, count); + if (unlikely(rc)) + return rc; + + /* To understand duff's device on loop unwinding optimization, see +* https://en.wikipedia.org/wiki/Duff's_device. +* Here while() loop is used rather than do() while{} to avoid extra +* check if count is zero. +*/ + switch (count % 4) { + case 0: + while (idx != count) { + RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0); + rte_mbuf_refcnt_set(mbufs[idx], 1); + rte_pktmbuf_reset(mbufs[idx]); + idx++; + case 3: + RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0); + rte_mbuf_refcnt_set(mbufs[idx], 1); + rte_pktmbuf_reset(mbufs[idx]); + idx++; + case 2: + RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0); + rte_mbuf_refcnt_set(mbufs[idx], 1); + rte_pktmbuf_reset(mbufs[idx]); + idx++; + case 1: + RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0); + rte_mbuf_refcnt_set(mbufs[idx], 1); + rte_pktmbuf_reset(mbufs[idx]); + idx++; + } + } + return 0; +} + +/** * Attach packet mbuf to another packet mbuf. * * After attachment we refer the mbuf we attached as 'indirect', diff --git a/lib/librte_mbuf/rte_mbuf_version.map b/lib/librte_mbuf/rte_mbuf_version.map index e10f6bd..257c65a 100644 --- a/lib/librte_mbuf/rte_mbuf_version.map +++ b/lib/librte_mbuf/rte_mbuf_version.map @@ -18,3 +18,10 @@ DPDK_2.1 { rte_pktmbuf_pool_create; } DPDK_2.0; + +DPDK_2.3 { + global: + + rte_pktmbuf_alloc_bulk; + +} DPDK_2.1; -- 1.8.1.4
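For readers unfamiliar with the construct, here is a minimal standalone sketch of the Duff's device variant the commit message describes: a `switch` on `count % 4` jumps into the body of a `while` loop, so the leftover elements are handled first and every later pass handles four. The helpers `init_one` and `init_bulk` are hypothetical stand-ins operating on plain ints, not the mbuf code from the patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-element initialisation. */
static void init_one(int *slot)
{
	*slot = 1;
}

/*
 * Initialise `count` slots, four per full loop iteration.
 * The switch jumps into the middle of the loop body so the
 * count % 4 leftover elements are processed on the first pass.
 * Using while() instead of do {} while() means count == 0
 * enters at case 0, fails the loop test, and touches nothing.
 */
static void init_bulk(int *slots, size_t count)
{
	size_t idx = 0;

	switch (count % 4) {
	case 0:
		while (idx != count) {
			init_one(&slots[idx++]);
			/* fall through */
	case 3:
			init_one(&slots[idx++]);
			/* fall through */
	case 2:
			init_one(&slots[idx++]);
			/* fall through */
	case 1:
			init_one(&slots[idx++]);
		}
	}
}
```

Calling `init_bulk(a, 5)` enters at `case 1`, initialises one element, then runs one full pass of four; calling it with a count of 0 does no work at all, which is the case a `do {} while` form would need an extra check for.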
[dpdk-dev] [PATCH v6 2/2] vhost: call rte_pktmbuf_alloc_bulk in vhost dequeue
v4 changes: fix a silly typo in error handling when rte_pktmbuf_alloc fails reported by haifeng pre-allocate a bulk of mbufs instead of allocating one mbuf a time on demand Signed-off-by: Gerald Rogers Signed-off-by: Huawei Xie Acked-by: Konstantin Ananyev Acked-by: Yuanhan Liu Tested-by: Yuanhan Liu --- lib/librte_vhost/vhost_rxtx.c | 35 ++- 1 file changed, 22 insertions(+), 13 deletions(-) diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c index bbf3fac..f10d534 100644 --- a/lib/librte_vhost/vhost_rxtx.c +++ b/lib/librte_vhost/vhost_rxtx.c @@ -576,6 +576,8 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, uint32_t i; uint16_t free_entries, entry_success = 0; uint16_t avail_idx; + uint8_t alloc_err = 0; + uint8_t seg_num; if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->virt_qp_nb))) { RTE_LOG(ERR, VHOST_DATA, @@ -609,6 +611,14 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, LOG_DEBUG(VHOST_DATA, "(%"PRIu64") Buffers available %d\n", dev->device_fh, free_entries); + + if (unlikely(rte_pktmbuf_alloc_bulk(mbuf_pool, + pkts, free_entries)) < 0) { + RTE_LOG(ERR, VHOST_DATA, + "Failed to bulk allocating %d mbufs\n", free_entries); + return 0; + } + /* Retrieve all of the head indexes first to avoid caching issues. */ for (i = 0; i < free_entries; i++) head[i] = vq->avail->ring[(vq->last_used_idx + i) & (vq->size - 1)]; @@ -621,9 +631,9 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, uint32_t vb_avail, vb_offset; uint32_t seg_avail, seg_offset; uint32_t cpy_len; - uint32_t seg_num = 0; + seg_num = 0; struct rte_mbuf *cur; - uint8_t alloc_err = 0; + desc = &vq->desc[head[entry_success]]; @@ -654,13 +664,7 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, vq->used->ring[used_idx].id = head[entry_success]; vq->used->ring[used_idx].len = 0; - /* Allocate an mbuf and populate the structure. 
*/ - m = rte_pktmbuf_alloc(mbuf_pool); - if (unlikely(m == NULL)) { - RTE_LOG(ERR, VHOST_DATA, - "Failed to allocate memory for mbuf.\n"); - break; - } + prev = cur = m = pkts[entry_success]; seg_offset = 0; seg_avail = m->buf_len - RTE_PKTMBUF_HEADROOM; cpy_len = RTE_MIN(vb_avail, seg_avail); @@ -668,8 +672,6 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, PRINT_PACKET(dev, (uintptr_t)vb_addr, desc->len, 0); seg_num++; - cur = m; - prev = m; while (cpy_len != 0) { rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *, seg_offset), (void *)((uintptr_t)(vb_addr + vb_offset)), @@ -761,16 +763,23 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, cpy_len = RTE_MIN(vb_avail, seg_avail); } - if (unlikely(alloc_err == 1)) + if (unlikely(alloc_err)) break; m->nb_segs = seg_num; - pkts[entry_success] = m; vq->last_used_idx++; entry_success++; } + if (unlikely(alloc_err)) { + uint16_t i = entry_success; + + m->nb_segs = seg_num; + for (; i < free_entries; i++) + rte_pktmbuf_free(pkts[i]); + } + rte_compiler_barrier(); vq->used->idx += entry_success; /* Kick guest if required. */ -- 1.8.1.4
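The pattern this patch introduces (grab all buffers up front, then hand back the ones that were not used if the burst stops early) can be sketched in isolation. The following is a toy model under stated assumptions: `struct toy_pool`, `pool_get_bulk`, `pool_put` and `dequeue_burst` are hypothetical names invented for the illustration, and `fail_at` merely simulates a mid-burst failure; none of this is the vhost code itself.

```c
#include <stddef.h>

#define POOL_SIZE 8

/* Toy buffer pool: in_use[i] is 1 while buffer i is handed out. */
struct toy_pool {
	int in_use[POOL_SIZE];
};

/* All-or-nothing bulk get: 0 on success, -1 if fewer than n buffers are free. */
static int pool_get_bulk(struct toy_pool *p, int *out, size_t n)
{
	size_t i, got = 0, free_cnt = 0;

	for (i = 0; i < POOL_SIZE; i++)
		if (!p->in_use[i])
			free_cnt++;
	if (free_cnt < n)
		return -1;

	for (i = 0; i < POOL_SIZE && got < n; i++) {
		if (!p->in_use[i]) {
			p->in_use[i] = 1;
			out[got++] = (int)i;
		}
	}
	return 0;
}

static void pool_put(struct toy_pool *p, int idx)
{
	p->in_use[idx] = 0;
}

/*
 * Pre-allocate n buffers, "fill" them one by one, and stop early when
 * the simulated failure point is reached. Buffers that were allocated
 * but never filled are returned to the pool before leaving.
 * Returns the number of buffers delivered to the caller.
 */
static size_t dequeue_burst(struct toy_pool *p, int *pkts, size_t n,
			    size_t fail_at)
{
	size_t done = 0, i;

	if (pool_get_bulk(p, pkts, n) < 0)
		return 0;

	while (done < n) {
		if (done == fail_at)
			break; /* simulated mid-burst failure */
		done++;
	}

	for (i = done; i < n; i++)
		pool_put(p, pkts[i]); /* give back the unused tail */

	return done;
}
```

The trade-off is one pool operation per burst instead of one per packet, at the cost of a cleanup loop on the early-exit path.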
[dpdk-dev] [PATCH v2] vfio: Support for no-IOMMU mode
Hi Anatoly,

A few small comments.
The comments "function pointer typedef" or "structure to hold" don't bring new information. Please keep it short.

2016-01-13 12:36, Anatoly Burakov:
> +/* function pointer typedef for DMA mapping functions */

-> DMA mapping function type
It would be relevant to describe the return and the parameter.

> +typedef int (*vfio_dma_func_t)(int);
> +
> +/* Structure to hold supported IOMMU types */

This comment seems useless.

> +struct vfio_iommu_type {
[...]
> +/* function prototypes for different IOMMU types */

idem

> +int vfio_iommu_type1_dma_map(int container_fd);
> +int vfio_iommu_noiommu_dma_map(int container_fd);
> +
> +/* IOMMU types we support */
> +static const struct vfio_iommu_type iommu_types[] = {
> +	/* x86 IOMMU, otherwise known as type 1 */
> +	{ VFIO_TYPE1_IOMMU, "Type 1", &vfio_iommu_type1_dma_map},
> +	/* IOMMU-less mode */
> +	{ VFIO_NOIOMMU_IOMMU, "No-IOMMU", &vfio_iommu_noiommu_dma_map},
> +};
[...]
> --- /dev/null
> +++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio_dma.c

Why a new file for these functions?
[dpdk-dev] [PATCH] ethdev: fix byte order inconsistence between fdir flow and mask
2016-01-27 16:37, Jingjing Wu:
> Fixed issue of byte order in ethdev library that the structure
> for setting fdir's mask and flow entry is inconsist and made
> inputs of mask be in big endian.

Please be more precise. Which one is big endian?
Wasn't it tested before?

> fixes: 76c6f89e80d4 ("ixgbe: support new flow director masks")
>        2d4c1a9ea2ac ("ethdev: add new flow director masks")

Please put Fixes: on the two lines.

> --- a/doc/guides/rel_notes/release_2_3.rst
> +++ b/doc/guides/rel_notes/release_2_3.rst
> @@ -19,6 +19,10 @@ Drivers
>  Libraries
>  ~
>
> +* ** fix byte order inconsistence between fdir flow and mask **
> +
> +  Fixed issue in ethdev library that the structure for setting
> +  fdir's mask and flow entry is inconsist in byte order.

John, comment on release notes formatting?
It's important to have the first items well formatted.

> @@ -39,6 +43,8 @@ API Changes
>  ABI Changes
>  ---
>
> +* The fields in The ethdev structures ``rte_eth_fdir_masks`` were
> +  changed to be in big endian.

Please take care of the uppercase typo here.

> -	/* write all the same so that UDP, TCP and SCTP use the same mask */
> +	/* write all the same so that UDP, TCP and SCTP use the same mask
> +	* (little-endian)
> +	 */

Spacing typo here.
Sorry for the nits ;)

> -	uint8_t mac_addr_byte_mask;  /** Per byte MAC address mask */
> +	uint8_t mac_addr_byte_mask;  /** Bit mask for associated byte */
>  	uint32_t tunnel_id_mask;  /** tunnel ID mask */
> -	uint8_t tunnel_type_mask;
> +	uint8_t tunnel_type_mask; /**< 1 - Match tunnel type,
> +	                               0 - Ignore tunnel type. */

These changes seem unrelated to the patch.
It's good to improve the doc of this API but it's maybe not enough.
Example:
	uint8_t mac_addr_byte_mask;  /** Bit mask for associated byte */
Are we sure everybody understands how to fill it?
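On the question of which side is big endian: in the patch under review, the mask fields are stored in network (big-endian) order, and the driver compares them against constants converted with `rte_cpu_to_be_16()`. The sketch below illustrates that comparison without DPDK headers. `cpu_to_be16` is a local, portable stand-in for `rte_cpu_to_be_16()`, and `classify_tci_mask` with its enum is a hypothetical helper that mirrors the constants in the quoted ixgbe hunk; it is not the driver's code.

```c
#include <stdint.h>

/*
 * Portable host-to-big-endian conversion for a 16-bit value.
 * Local stand-in for DPDK's rte_cpu_to_be_16(): the result, viewed as
 * bytes in memory, always has the most significant byte first.
 */
static uint16_t cpu_to_be16(uint16_t v)
{
	union {
		uint16_t u16;
		uint8_t b[2];
	} out;

	out.b[0] = (uint8_t)(v >> 8);
	out.b[1] = (uint8_t)(v & 0xff);
	return out.u16;
}

/* Hypothetical result codes for the illustration. */
enum tci_match {
	TCI_MATCH_NONE,     /* mask == 0: VLAN ID and priority both ignored */
	TCI_MATCH_ID,       /* 0x0FFF: match VLAN ID, ignore priority */
	TCI_MATCH_PRIO,     /* 0xE000: match priority, ignore VLAN ID */
	TCI_MATCH_BOTH,     /* 0xEFFF: match both */
	TCI_INVALID
};

/*
 * Classify a VLAN TCI mask that is stored in big-endian order.
 * Each constant is converted to big endian before the comparison, so
 * the result is the same on little-endian and big-endian hosts.
 */
static enum tci_match classify_tci_mask(uint16_t mask_be)
{
	if (mask_be == cpu_to_be16(0x0FFF))
		return TCI_MATCH_ID;
	if (mask_be == cpu_to_be16(0xE000))
		return TCI_MATCH_PRIO;
	if (mask_be == 0)
		return TCI_MATCH_NONE;
	if (mask_be == cpu_to_be16(0xEFFF))
		return TCI_MATCH_BOTH;
	return TCI_INVALID;
}
```

On a little-endian host, comparing a big-endian mask directly against the literal `0x0FFF` would fail, which is the inconsistency the patch addresses by converting on both the application side and the driver side.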
[dpdk-dev] [PATCH v2] fix checkpatch errors
v2 changes: add missed commit message in v1 fix the error reported by checkpatch: "ERROR: return is not a function, parentheses are not required" also removed other extra parentheses like: "return val == 0" "return (rte_mempool_lookup(...))" Signed-off-by: Huawei Xie --- app/test-pmd/cmdline.c | 12 ++-- app/test-pmd/config.c | 2 +- app/test-pmd/flowgen.c | 2 +- app/test-pmd/mempool_anon.c| 12 ++-- app/test-pmd/testpmd.h | 2 +- app/test-pmd/txonly.c | 2 +- app/test/test_mbuf.c | 12 ++-- app/test/test_memcpy_perf.c| 4 +- app/test/test_mempool.c| 4 +- app/test/test_memzone.c| 24 +++ app/test/test_red.c| 42 ++-- app/test/test_ring.c | 4 +- drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c | 2 +- drivers/crypto/qat/qat_qp.c| 22 +++--- drivers/net/bnx2x/bnx2x.c | 34 - drivers/net/bnx2x/bnx2x.h | 4 +- drivers/net/bnx2x/bnx2x_rxtx.c | 16 ++--- drivers/net/bnx2x/debug.c | 6 +- drivers/net/bonding/rte_eth_bond_pmd.c | 2 +- drivers/net/e1000/em_ethdev.c | 40 +-- drivers/net/e1000/em_rxtx.c| 46 ++--- drivers/net/e1000/igb_ethdev.c | 18 ++--- drivers/net/e1000/igb_rxtx.c | 30 drivers/net/fm10k/fm10k_ethdev.c | 40 +-- drivers/net/i40e/i40e_ethdev.c | 2 +- drivers/net/i40e/i40e_ethdev.h | 2 +- drivers/net/i40e/i40e_ethdev_vf.c | 2 +- drivers/net/i40e/i40e_rxtx.c | 14 ++-- drivers/net/ixgbe/ixgbe_82599_bypass.c | 4 +- drivers/net/ixgbe/ixgbe_bypass.c | 2 +- drivers/net/ixgbe/ixgbe_ethdev.c | 34 - drivers/net/ixgbe/ixgbe_rxtx.c | 36 +- drivers/net/mlx5/mlx5_utils.h | 2 +- drivers/net/mpipe/mpipe_tilegx.c | 4 +- drivers/net/nfp/nfp_net.c | 16 ++--- drivers/net/virtio/virtio_ethdev.c | 6 +- examples/ip_pipeline/cpu_core_map.c| 2 +- .../pipeline/pipeline_flow_actions_be.c| 2 +- examples/ip_reassembly/main.c | 22 +++--- examples/ipv4_multicast/main.c | 14 ++-- examples/l3fwd/main.c | 4 +- examples/multi_process/symmetric_mp/main.c | 2 +- examples/netmap_compat/bridge/bridge.c | 8 +-- examples/netmap_compat/lib/compat_netmap.c | 80 +++--- examples/qos_sched/args.c | 2 +- 
examples/quota_watermark/qw/main.h | 2 +- examples/vhost/main.c | 4 +- examples/vhost_xen/main.c | 2 +- examples/vhost_xen/vhost_monitor.c | 6 +- lib/librte_acl/acl_run_neon.h | 2 +- lib/librte_cryptodev/rte_cryptodev.c | 22 +++--- lib/librte_eal/common/eal_common_memzone.c | 2 +- .../common/include/arch/ppc_64/rte_byteorder.h | 2 +- lib/librte_eal/common/malloc_heap.c| 2 +- lib/librte_eal/linuxapp/eal/eal_xen_memory.c | 2 +- lib/librte_eal/linuxapp/kni/kni_vhost.c| 2 +- lib/librte_ether/rte_ether.h | 10 +-- lib/librte_hash/rte_cuckoo_hash.c | 18 ++--- lib/librte_ip_frag/ip_frag_internal.c | 4 +- lib/librte_lpm/rte_lpm.c | 2 +- lib/librte_mempool/rte_mempool.h | 2 +- lib/librte_ring/rte_ring.h | 6 +- lib/librte_sched/rte_bitmap.h | 6 +- lib/librte_sched/rte_red.h | 2 +- lib/librte_sched/rte_sched.c | 4 +- 65 files changed, 372 insertions(+), 372 deletions(-) diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index 73298c9..a82682d 100644 --- a/app/test-pmd/cmdline.c +++ b/app/test-pmd/cmdline.c @@ -2418,11 +2418,11 @@ parse_item_list(char* str, const char* item_name, unsigned int max_items, } if (c != ',') { printf("character %c is not a decimal digit\n", c); - return (0); + return 0; } if (! value_ok) { printf("No valid value before comma\n"); - return (0); + return 0;
[dpdk-dev] [PATCH] log: add missing symbol
2015-12-16 16:38, Stephen Hemminger:
> rte_get_log_type and rte_get_log_level functions have been available
> for many versions. But they are missing from the shared library map
> and therefore do not get exported correctly.
>
> Signed-off-by: Stephen Hemminger
> ---
>  lib/librte_eal/linuxapp/eal/rte_eal_version.map | 2 ++
>  1 file changed, 2 insertions(+)

Why only in linuxapp?

> diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> index cbe175f..51a241c 100644
> --- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> +++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> @@ -93,7 +93,9 @@ DPDK_2.0 {
>  	rte_realloc;
>  	rte_set_application_usage_hook;
>  	rte_set_log_level;
> +	rte_get_log_level;
>  	rte_set_log_type;
> +	rte_get_log_type;

We try to keep an alphabetical order :)
[dpdk-dev] [RFC PATCH 5/5] virtio: Extend virtio-net PMD to support container environment
On 1/26/2016 10:58 AM, Tetsuya Mukawa wrote: > On 2016/01/25 19:15, Xie, Huawei wrote: >> On 1/22/2016 6:38 PM, Tetsuya Mukawa wrote: >>> On 2016/01/22 17:14, Xie, Huawei wrote: On 1/21/2016 7:09 PM, Tetsuya Mukawa wrote: > virtio: Extend virtio-net PMD to support container environment > > The patch adds a new virtio-net PMD configuration that allows the PMD to > work on host as if the PMD is in VM. > Here is new configuration for virtio-net PMD. > - CONFIG_RTE_LIBRTE_VIRTIO_HOST_MODE > To use this mode, EAL needs physically contiguous memory. To allocate > such memory, add "--shm" option to application command line. > > To prepare virtio-net device on host, the users need to invoke QEMU > process in special qtest mode. This mode is mainly used for testing QEMU > devices from outer process. In this mode, no guest runs. > Here is QEMU command line. > > $ qemu-system-x86_64 \ > -machine pc-i440fx-1.4,accel=qtest \ > -display none -qtest-log /dev/null \ > -qtest unix:/tmp/socket,server \ > -netdev type=tap,script=/etc/qemu-ifup,id=net0,queues=1\ > -device virtio-net-pci,netdev=net0,mq=on \ > -chardev socket,id=chr1,path=/tmp/ivshmem,server \ > -device ivshmem,size=1G,chardev=chr1,vectors=1 > > * QEMU process is needed per port. Does qtest supports hot plug virtio-net pci device, so that we could run one QEMU process in host, which provisions the virtio-net virtual devices for the container? >>> Theoretically, we can use hot plug in some cases. >>> But I guess we have 3 concerns here. >>> >>> 1. Security. >>> If we share QEMU process between multiple DPDK applications, this QEMU >>> process will have all fds of the applications on different containers. >>> In some cases, it will be security concern. >>> So, I guess we need to support current 1:1 configuration at least. >>> >>> 2. shared memory. >>> Currently, QEMU and DPDK application will map shared memory using same >>> virtual address. 
>>> So if multiple DPDK application connects to one QEMU process, each DPDK >>> application should have different address for shared memory. I guess >>> this will be a big limitation. >>> >>> 3. PCI bridge. >>> So far, QEMU has one PCI bridge, so we can connect almost 10 PCI devices >>> to QEMU. >>> (I forget correct number, but it's almost 10, because some slots are >>> reserved by QEMU) >>> A DPDK application needs both virtio-net and ivshmem device, so I guess >>> almost 5 DPDK applications can connect to one QEMU process, so far. >>> To add more PCI bridges solves this. >>> But we need to add a lot of implementation to support cascaded PCI >>> bridges and PCI devices. >>> (Also we need to solve above "2nd" concern.) >>> >>> Anyway, if we use virtio-net PMD and vhost-user PMD, QEMU process will >>> not do anything after initialization. >>> (QEMU will try to read a qtest socket, then be stopped because there is >>> no message after initialization) >>> So I guess we can ignore overhead of these QEMU processes. >>> If someone cannot ignore it, I guess this is the one of cases that it's >>> nice to use your light weight container implementation. >> Thanks for the explanation, and also in your opinion where is the best >> place to run the QEMU instance? If we run QEMU instances in host, for >> vhost-kernel support, we could get rid of the root privilege issue. > Do you mean below? > If we deploy QEMU instance on host, we can start a container without the > root privilege. > (But on host, still QEMU instance needs the privilege to access to > vhost-kernel) There is no issue running QEMU instance with root privilege on host, but i think it is not acceptable granting the container root privilege. > > If so, I agree to deploy QEMU instance on host or other privileged > container will be nice. > In the case of vhost-user, to deploy on host or non-privileged container > will be good. > >> Another issue is do you plan to support multiple virtio devices in >> container? 
Currently I find that the code assumes only one virtio-net device >> in QEMU, right? > Yes, so far, 1 port needs 1 QEMU instance. > So if you need multiple virtio devices, you need to invoke multiple QEMU > instances. > > Do you want to deploy 1 QEMU instance for each DPDK application, even if > the application has multiple virtio-net ports? > > So far, I am not sure whether we need it, because this type of DPDK > application will need only one port in most cases. > But if you need this, yes, I can implement it using the QEMU PCI hotplug feature. > (But probably we can only attach about 10 ports. This will be a limitation.) I am OK with supporting one virtio device for the first version. > >> Btw, I have read most of your qtest code. No obvious issues found so far >> but quite a few nits. You must have spent a lot of time on this. >> It is great work! > I appreciat
[dpdk-dev] [PATCH v2 1/2] ethdev: remove useless null checks
On Tue, Jan 26, 2016 at 4:50 PM, Jan Viktorin wrote: > What about the RTE_VERIFY? I think, it's more appropriate here. Well, here, I am removing useless checks in static functions. But for the rest of ethdev api, I agree we could add some RTE_VERIFY. > Otherwise, feel free to add: > > Reviewed-by: Jan Viktorin Thanks. -- David Marchand
[dpdk-dev] [RFC PATCH 5/5] virtio: Extend virtio-net PMD to support container environment
On 1/21/2016 7:09 PM, Tetsuya Mukawa wrote: > + /* Set BAR region */ > + for (i = 0; i < NB_BAR; i++) { > + switch (dev->bar[i].type) { > + case QTEST_PCI_BAR_IO: > + case QTEST_PCI_BAR_MEMORY_UNDER_1MB: > + case QTEST_PCI_BAR_MEMORY_32: > + qtest_pci_outl(s, bus, device, 0, dev->bar[i].addr, > + dev->bar[i].region_start); > + PMD_DRV_LOG(INFO, "Set BAR of %s device: 0x%lx - > 0x%lx\n", > + dev->name, dev->bar[i].region_start, > + dev->bar[i].region_start + > dev->bar[i].region_size); > + break; > + case QTEST_PCI_BAR_MEMORY_64: > + qtest_pci_outq(s, bus, device, 0, dev->bar[i].addr, > + dev->bar[i].region_start); > + PMD_DRV_LOG(INFO, "Set BAR of %s device: 0x%lx - > 0x%lx\n", > + dev->name, dev->bar[i].region_start, > + dev->bar[i].region_start + > dev->bar[i].region_size); > + break; Hasn't the bar resource already been allocated? Is it the app's responsibility to allocate the bar resource in qtest mode? The app couldn't have that knowledge. > + case QTEST_PCI_BAR_DISABLE: > + break; > + } > + } > +
[dpdk-dev] [PATCH v2] vfio: Support for no-IOMMU mode
Hi Thomas, > The comments "function pointer typedef" or "structure to hold" don't > bring new information. Please keep it short. I'll fix that and submit a v3, thanks. > Why a new file for these functions? Well, my thought was to make future extensions easier by way of avoiding mixing irrelevant and/or general code with driver-specific code. I can change it back if that's not OK. Thanks, Anatoly
[dpdk-dev] [PATCH] ip_pipeline: add load balancing function to pass-through pipeline
The passthrough pipeline implementation is extended with load balancing function. This function allows uniform distribution of the packets among its output ports. For packets distribution, any application level logic can be applied. For instance, in this implementation, hash value computed over specific header fields of the incoming packets has been used to spread traffic uniformly among the output ports. The following passthrough configuration can be used for implementing load balancing function over ipv4 traffic; [PIPELINE0] type = PASS-THROUGH core = 0 pktq_in = RXQ0.0 RXQ1.0 RXQ2.0 RXQ3.0 pktq_out = TXQ0.0 TXQ1.0 TXQ2.0 TXQ3.0 dma_src_offset = 278; mbuf (128) + headroom (128) + 1st ethertype offset (14) + ttl offset within ip header = 278 (ipv4) dma_dst_offset = 128; mbuf (128) dma_size = 16 dma_src_mask = 00FF dma_hash_offset = 144; (dma_dst_offset+dma_size) lb = hash Signed-off-by: Jasvinder Singh Acked-by: Cristian Dumitrescu --- .../ip_pipeline/pipeline/pipeline_actions_common.h | 22 ++ .../ip_pipeline/pipeline/pipeline_passthrough_be.c | 281 - .../ip_pipeline/pipeline/pipeline_passthrough_be.h | 2 + 3 files changed, 245 insertions(+), 60 deletions(-) diff --git a/examples/ip_pipeline/pipeline/pipeline_actions_common.h b/examples/ip_pipeline/pipeline/pipeline_actions_common.h index 9958758..2c08db2 100644 --- a/examples/ip_pipeline/pipeline/pipeline_actions_common.h +++ b/examples/ip_pipeline/pipeline/pipeline_actions_common.h @@ -59,6 +59,28 @@ f_ah( \ return 0; \ } +#define PIPELINE_PORT_IN_AH_LB(f_ah, f_pkt_work, f_pkt4_work) \ +static int \ +f_ah( \ + struct rte_pipeline *p, \ + struct rte_mbuf **pkts, \ + uint32_t n_pkts,\ + void *arg) \ +{ \ + uint32_t i; \ + \ + uint64_t pkt_mask = RTE_LEN2MASK(n_pkts, uint64_t); \ + \ + rte_pipeline_ah_packet_hijack(p, pkt_mask); \ + for (i = 0; i < (n_pkts & (~0x3LLU)); i += 4) \ + f_pkt4_work(&pkts[i], arg); \ + \ + for ( ; i < n_pkts; i++)\ + f_pkt_work(pkts[i], arg); \ + \ + return 0; \ +} + #define 
PIPELINE_TABLE_AH_HIT(f_ah, f_pkt_work, f_pkt4_work) \ static int \ f_ah( \ diff --git a/examples/ip_pipeline/pipeline/pipeline_passthrough_be.c b/examples/ip_pipeline/pipeline/pipeline_passthrough_be.c index 7642462..75b6fd8 100644 --- a/examples/ip_pipeline/pipeline/pipeline_passthrough_be.c +++ b/examples/ip_pipeline/pipeline/pipeline_passthrough_be.c @@ -72,7 +72,9 @@ pkt_work( struct rte_mbuf *pkt, void *arg, uint32_t dma_size, - uint32_t hash_enabled) + uint32_t hash_enabled, + uint32_t lb_hash, + uint32_t port_out_pw2) { struct pipeline_passthrough *p = arg; @@ -90,8 +92,24 @@ pkt_work( dma_dst[i] = dma_src[i] & dma_mask[i]; /* Read (dma_dst), compute (hash), write (hash) */ - if (hash_enabled) - *dma_hash = p->f_hash(dma_dst, dma_size, 0); + if (hash_enabled) { + uint32_t hash = p->f_hash(dma_dst, dma_size, 0); + *dma_hash = hash; + + if (lb_hash) { + uint32_t port_out; + + if (port_out_pw2) + port_out + = hash & (p->p.n_ports_out - 1); + else + port_out + = hash % p->p.n_ports_out; + + rte_pipeline_port_out_packet_insert(p->p.p, + port_out, pkt); + } + } } static inline __attribute__((always_inline)) void @@ -99,7 +117,9 @@ pkt4_work( struct rte_mbuf **pkts, void *arg, uint32_t dma_size, - uint32_t hash_enabled) + uint32_t hash_enabled, + uint32_t lb_hash, + ui
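The output-port selection in the hunk above can be sketched in isolation. This is a minimal illustration, not the patch's code: `select_port_out()` and `is_power_of_2()` are hypothetical helper names, and the sketch assumes that `port_out_pw2` in the patch flags a power-of-two number of output ports. It also spells out the `dma_src_offset` arithmetic from the example configuration, using the sizes quoted there.

```c
#include <stdint.h>

/* Sizes quoted in the example configuration (taken from the mail, not read
 * from DPDK headers). */
enum {
	MBUF_META_SIZE = 128,  /* "mbuf (128)" */
	MBUF_HEADROOM = 128,   /* "headroom (128)" */
	ETHER_HDR_SIZE = 14,   /* listed as "1st ethertype offset (14)" */
	IPV4_TTL_OFFSET = 8    /* TTL byte offset inside the IPv4 header */
};

/* dma_src_offset for IPv4: 128 + 128 + 14 + 8 = 278. */
static inline uint32_t
ipv4_dma_src_offset(void)
{
	return MBUF_META_SIZE + MBUF_HEADROOM + ETHER_HDR_SIZE + IPV4_TTL_OFFSET;
}

/* Returns 1 when n is a non-zero power of two. */
static inline int
is_power_of_2(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Hypothetical helper: map a hash value to a port index in
 * [0, n_ports_out). n_ports_out must be non-zero.
 */
static inline uint32_t
select_port_out(uint32_t hash, uint32_t n_ports_out)
{
	/* A mask is cheaper than a division, but only valid for powers of two. */
	if (is_power_of_2(n_ports_out))
		return hash & (n_ports_out - 1);

	return hash % n_ports_out;
}
```

With the four TXQs of the example configuration the mask branch is taken; a three-port setup would fall back to the modulo. In the patch the power-of-two decision is passed in as a parameter rather than recomputed per packet, which keeps the test out of the fast path.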
[dpdk-dev] [PATCH v2] vfio: Support for no-IOMMU mode
2016-01-27 10:08, Burakov, Anatoly: > > Why a new file for these functions? > > Well, my thought was to make future extensions easier by way of avoiding > mixing irrelevant and/or general code with driver-specific code. I can change > it back if that's not OK. No strong opinion here. David?
[dpdk-dev] [PATCH v2] vfio: Support for no-IOMMU mode
On Wed, Jan 27, 2016 at 11:12 AM, Thomas Monjalon wrote: > 2016-01-27 10:08, Burakov, Anatoly: >> > Why a new file for these functions? >> >> Well, my thought was to make future extensions easier by way of avoiding >> mixing irrelevant and/or general code with driver-specific code. I can >> change it back if that's not OK. > > No strong opinion here. > David? Hum, no strong opinion either, but I don't think we really need to split this file for this much code. Besides, if we keep all code in eal_pci_vfio.c, there is no need to expose those structures through eal_pci_init.h. -- David Marchand
[dpdk-dev] [PATCH v2] vfio: Support for no-IOMMU mode
> >> > Why a new file for these functions? > >> > >> Well, my thought was to make future extensions easier by way of > avoiding mixing irrelevant and/or general code with driver-specific code. I > can > change it back if that's not OK. > > > > No strong opinion here. > > David? > > Hum, no strong opinion either, but I don't think we really need to split this > file for this much code. > Besides, if we keep all code in eal_pci_vfio.c, there is no need to expose > those structures through eal_pci_init.h. OK then, I'll merge it back into the eal_pci_vfio.c Thanks, Anatoly
[dpdk-dev] [PATCH v2] eal: add function to check if primary proc alive
This patch adds a new function to the EAL API: int rte_eal_primary_proc_alive(const char *path); The function indicates whether a primary process is alive right now. It is implemented by testing for a write-lock on the config file. The use case is that a secondary process can wait for a primary process to start by polling the function. Once the primary is running, the secondary can continue to poll to detect whether the primary process has quit unexpectedly. The RTE_MAGIC number is written to the shared config by the primary process; this is the signal to the secondary process that the EAL is set up and ready to be used. The function rte_eal_mcfg_complete() writes RTE_MAGIC. This has been delayed in the EAL init procedure, as the PCI probing in the primary process can interfere with the secondary running. Signed-off-by: Harry van Haaren --- v2: - Passing NULL as const char* uses default /var/run/.rte_config - Moved code into /common/ instead of /linuxapp/, should work on BSD now doc/guides/rel_notes/release_2_3.rst| 7 +++ lib/librte_eal/bsdapp/eal/Makefile | 1 + lib/librte_eal/bsdapp/eal/rte_eal_version.map | 8 lib/librte_eal/common/eal_common_proc.c | 61 + lib/librte_eal/common/include/rte_eal.h | 18 lib/librte_eal/linuxapp/eal/Makefile| 1 + lib/librte_eal/linuxapp/eal/eal.c | 4 +- lib/librte_eal/linuxapp/eal/rte_eal_version.map | 7 +++ 8 files changed, 105 insertions(+), 2 deletions(-) create mode 100644 lib/librte_eal/common/eal_common_proc.c diff --git a/doc/guides/rel_notes/release_2_3.rst b/doc/guides/rel_notes/release_2_3.rst index 99de186..14b5b06 100644 --- a/doc/guides/rel_notes/release_2_3.rst +++ b/doc/guides/rel_notes/release_2_3.rst @@ -11,6 +11,13 @@ Resolved Issues EAL ~~~ +* **Added rte_eal_primary_proc_alive() function** + + A new function ``rte_eal_primary_proc_alive()`` has been added + to allow the user to detect if a primary
process is running. + Use cases for this feature include fault detection, and monitoring + using secondary processes. + Drivers ~~~ diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile index 65b293f..2d6e3b1 100644 --- a/lib/librte_eal/bsdapp/eal/Makefile +++ b/lib/librte_eal/bsdapp/eal/Makefile @@ -61,6 +61,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_alarm.c # from common dir SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_lcore.c +SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_proc.c SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_timer.c SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_memzone.c SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_log.c diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map index 9d7adf1..0e28017 100644 --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map @@ -135,3 +135,11 @@ DPDK_2.2 { rte_xen_dom0_supported; } DPDK_2.1; + + +DPDK_2.3 { + global: + + rte_eal_primary_proc_alive; + +} DPDK_2.2; diff --git a/lib/librte_eal/common/eal_common_proc.c b/lib/librte_eal/common/eal_common_proc.c new file mode 100644 index 000..c598891 --- /dev/null +++ b/lib/librte_eal/common/eal_common_proc.c @@ -0,0 +1,61 @@ +/*- + * BSD LICENSE + * + * Copyright 2016 Intel Shannon Ltd. All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. 
+ * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIA
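The lock test described in the commit message can be sketched with plain POSIX calls. This is an illustration under assumptions, not the body of eal_common_proc.c (which is cut off above): `config_file_is_locked()` is a hypothetical name, and the sketch assumes the primary holds an fcntl()-style record lock on the config file for as long as it runs.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if another process holds a lock that
 * would conflict with a write lock on the file, 0 if no such lock is
 * held, and -1 if the file cannot be opened or queried.
 */
static int
config_file_is_locked(const char *path)
{
	struct flock fl = {
		.l_type = F_WRLCK,    /* ask: "could I take a write lock?" */
		.l_whence = SEEK_SET,
		.l_start = 0,
		.l_len = 0,           /* 0 means "to the end of the file" */
	};
	int fd, ret;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	/* F_GETLK only queries; it never acquires the lock. */
	ret = fcntl(fd, F_GETLK, &fl);
	close(fd);
	if (ret < 0)
		return -1;

	/* The kernel sets l_type to F_UNLCK when nothing conflicts. */
	return fl.l_type != F_UNLCK;
}
```

A secondary process could poll such a helper until it returns 1 before initialising the EAL, and keep polling afterwards to notice a primary that has exited. Note that F_GETLK does not report locks held by the calling process itself, so the check is only meaningful from a different process than the one holding the lock.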
[dpdk-dev] [PATCH] eal: add function to check if primary proc alive
> From: Richardson, Bruce > > Agreed, however hiding it totally removes the flexibility of waiting for a > > primary > > that is starting with --file-prefix (aka: in a non-default location). > > Imposing > > a limit on only monitoring primary procs in the default location seems > > wrong. > > But the secondary also needs the same prefix. Is that prefix not accessible by > this function to be used? The issue is that the EAL parsing code is performed during rte_init(), which is exactly what this function tries to avoid - initializing EAL before a primary process starts. I looked at changing the EAL parsing to come before rte_init(), and considered adding a minimal parser for --file-prefix. Both routes seem a bad solution, either for complexity or code-duplication. v2 of this patch posted to list: http://dpdk.org/dev/patchwork/patch/10126/ -Harry
[dpdk-dev] [PATCH v5 10/11] virtio: pci: add dummy func definition for in/outb for non-x86 arch
Ping? On Tue, Jan 19, 2016 at 5:16 PM, Santosh Shukla wrote: > For non-x86 arch, Compiler will throw build error for in/out apis. Including > dummy api function so to pass build. > > Note that: For virtio to work for non-x86 arch - RTE_EAL_VFIO is the only > supported method. RTE_EAL_IGB_UIO is not supported for non-x86 arch. > > So, Virtio support for arch and supported interface by that arch: > > ARCH IGB_UIO VFIO > x86 Y Y > ARM64 N/A Y > PPC_64 N/A Y (Not tested but likely should work, as vfio is > arch independent) > > Note: Applicable for virtio spec 0.95 > > Signed-off-by: Santosh Shukla > --- > drivers/net/virtio/virtio_pci.h | 46 > +++ > 1 file changed, 46 insertions(+) > > diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h > index f550d22..b88f9ec 100644 > --- a/drivers/net/virtio/virtio_pci.h > +++ b/drivers/net/virtio/virtio_pci.h > @@ -46,6 +46,7 @@ > #endif > > #include > +#include "virtio_logs.h" > > struct virtqueue; > > @@ -320,6 +321,51 @@ outl_p(unsigned int data, unsigned int port) > } > #endif > > +#if !defined(RTE_ARCH_X86_64) && !defined(RTE_ARCH_I686) && \ > + defined(RTE_EXEC_ENV_LINUXAPP) > +static inline uint8_t > +inb(unsigned long addr __rte_unused) > +{ > + PMD_INIT_LOG(ERR, "inb() not supported for this RTE_ARCH\n"); > + return 0; > +} > + > +static inline uint16_t > +inw(unsigned long addr __rte_unused) > +{ > + PMD_INIT_LOG(ERR, "inw() not supported for this RTE_ARCH\n"); > + return 0; > +} > + > +static inline uint32_t > +inl(unsigned long addr __rte_unused) > +{ > + PMD_INIT_LOG(ERR, "in() not supported for this RTE_ARCH\n"); > + return 0; > +} > + > +static inline void > +outb_p(unsigned char data __rte_unused, unsigned int port __rte_unused) > +{ > + PMD_INIT_LOG(ERR, "outb_p() not supported for this RTE_ARCH\n"); > + return; > +} > + > +static inline void > +outw_p(unsigned short data __rte_unused, unsigned int port __rte_unused) > +{ > + PMD_INIT_LOG(ERR, "outw_p() not supported for this 
RTE_ARCH\n"); > + return; > +} > + > +static inline void > +outl_p(unsigned int data __rte_unused, unsigned int port __rte_unused) > +{ > + PMD_INIT_LOG(ERR, "outl_p() not supported for this RTE_ARCH\n"); > + return; > +} > +#endif > + > static inline int > vtpci_with_feature(struct virtio_hw *hw, uint64_t bit) > { > -- > 1.7.9.5 >
[dpdk-dev] [PATCH v6 08/11] eal: pci: introduce RTE_KDRV_VFIO_NOIOMMU driver mode
On Tue, Jan 26, 2016 at 9:51 PM, Santosh Shukla wrote: > On Tue, Jan 26, 2016 at 7:58 PM, Thomas Monjalon > wrote: >> 2016-01-26 19:35, Santosh Shukla: >>> On Tue, Jan 26, 2016 at 6:30 PM, Thomas Monjalon >>> wrote: >>> > 2016-01-26 15:56, Santosh Shukla: >>> >> In my observation, currently virtio work for vfio-noiommu, that's why >>> >> said drv->kdrv need to know vfio mode. >>> > >>> > It is your observation. It may change in near future. >>> >>> so that mean till then, virtio support for non-x86 arch has to wait? >> >> No, absolutely not. virtio for non-x86 is welcome. >> >>> We have working model with vfio-noiommu, don't you think it make sense >>> to let vfio_noiommu implementation exist and later in-case >>> virtio+iommu gets mainline then switch to vfio __mode__ agnostic >>> approach. And for that All it takes to replace __noiommu suffix with >>> default. >> >> I'm just saying you should not touch the enum rte_kernel_driver. >> RTE_KDRV_VFIO is a driver. >> RTE_KDRV_VFIO_NOIOMMU is a mode. >> As the VFIO API is the same in both modes, there is no reason to >> distinguish them at this level. >> Your patch adds the NOIOMMU case everywhere: >> case RTE_KDRV_VFIO: >> + case RTE_KDRV_VFIO_NOIOMMU: >> >> I'll stop commenting here to let others give their opinion. >> >> [...] >>> >> with vfio+iommu; binding virtio pci device to vfio-pci driver fail; >>> >> giving below error: >>> >> [ 53.053464] VFIO - User Level meta-driver version: 0.3 >>> >> [ 73.077805] vfio-pci: probe of :00:03.0 failed with error -22 >>> >> [ 73.077852] vfio-pci: probe of :00:03.0 failed with error -22 >>> >> >>> >> vfio_pci_probe() --> vfio_iommu_group_get() --> iommu_group_get() >>> >> fails: iommu doesn't have group for virtio pci device. >>> > >>> > Yes it fails when binding. >>> > So the later check in the virtio PMD is useless. >>> >>> Which check? >> >> The check for VFIO noiommu only: >> - if (dev->kdrv == RTE_KDRV_VFIO) >> + if (dev->kdrv == RTE_KDRV_VFIO_NOIOMMU) >> >> [...] 
>>> > Furthermore restricting virtio to no-iommu mode doesn't bring >>> > any improvement. >>> >>> We're not __restricting__, as soon as virtio+iommu gets working state, >>> we'll simply replace __noiommu with default. Then its upto user to try >>> out virtio with vfio default or vfio_noiommu. >> >> Yes it's up to user. >> So your code should be >> if (dev->kdrv == RTE_KDRV_VFIO) >> > > Right, > >>> > That's why I suggest to keep the initial semantic of kdrv and >>> > not pollute it with VFIO modes. >>> >>> I am okay to live with default and forget suffix __noiommu but there >>> are implementation problem which was discussed in other thread >>> - Virtio pmd driver should avoid interface parsing i.e. >>> virtio_resource_init_uio/vfio() etc.. For vfio case - We could easily >>> get rid of by moving /sys parsing to pci_eal layer, Right? If so then >>> virtio currently works with vfio-noiommu, it make sense to me that >>> pci_eal layer does parsing for pmd driver before that pmd driver get >>> initialized. >> >> Please reword. What is the problem? >> >>> - Another case could be: iommu-less-pmd-driver. eal layer to do >>> parsing before updating drv->kdrv. >> >> [...] >>> >> >> > If a check is needed, I would prefer using your function >>> >> >> > pci_vfio_is_noiommu() and remove driver modes from struct >>> >> >> > rte_kernel_driver. >>> >> >> >>> >> >> I don't think calling pci_vfio_no_iommu() inside >>> >> >> virtio_reg_rd/wr_1/2/3() would be a good idea. >>> >> > >>> >> > Why? The value may be cached in the priv properties. >>> >> > >>> >> pci_vfio_is_noiommu() parses /sys for >>> >> - enable_noiommu param >>> >> - attached driver name is vfio-noiommu or not. >>> >> >>> >> It does file operation for that, I meant to say that calling this api >>> >> within register_rd/wr function is not correct. It would be better if >>> >> those low level register_rd/wr api only checks driver_types. 
>>> > >>> > Yes, that's why I said the return of pci_vfio_is_noiommu() may be cached >>> > to keep efficiency. >>> >>> I am not convinced though, Still find pmd driver checking driver_types >>> using drv->kdrv is better approach than introducing a new global >>> variable which may look something like; >> >> Not a global variable. A function in EAL layer. A variable in PMD priv. >> > > If we agreed to use condition (drv->kdrv == RTE_KDRV_VFIO); > then resource parsing for vfio {including vfio and vfio_noiommu both > case} is enforced in virtio pmd driver layer and that is contradicting > to what we agreed earlier in this[1] thread. Also we don't need a > function in EAL layer or a variable in PMD priv. Perhaps a private > function in virtio pmd which does parsing for vfio interface. > > Thoughts? > > [1] http://dpdk.org/dev/patchwork/patch/9862/ > Any comment/feedback on above approach? >>> At pci_eal layer >>> bool vfio_mode; >>> vfio_mode = pci_vfio_is_noiommu(); >>> >>> At virtio pmd driver layer >>> Checking v
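The caching idea raised in this thread ("A function in EAL layer. A variable in PMD priv.") can be sketched as follows. Everything here is hypothetical and only illustrates the pattern: `fake_vfio_is_noiommu()` stands in for an EAL query such as pci_vfio_is_noiommu(), and `struct pmd_priv` stands in for the driver's private data. None of these names come from the patches under discussion.

```c
#include <stdbool.h>

/* Counts how often the (simulated) expensive /sys lookup runs. */
static unsigned int probe_calls;

/* Hypothetical stand-in for an EAL-level query that parses /sys. */
static bool
fake_vfio_is_noiommu(void)
{
	probe_calls++;
	return true;
}

/* Hypothetical PMD private data holding the cached answer. */
struct pmd_priv {
	bool vfio_noiommu;
};

/* Called once at device init: the only place the lookup runs. */
static void
pmd_priv_init(struct pmd_priv *priv)
{
	priv->vfio_noiommu = fake_vfio_is_noiommu();
}

/* Called from hot paths such as register read/write: no file access. */
static inline bool
pmd_is_noiommu(const struct pmd_priv *priv)
{
	return priv->vfio_noiommu;
}
```

The point of the pattern is that the register accessors only read a cached flag, so the cost of the file parsing is paid once per device regardless of how many register accesses follow.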
[dpdk-dev] [PATCH v2] ip_pipeline: fix cpu socket-id error
This patch fixes the socket-id error in ip_pipeline sample application running over uni-processor systems. Signed-off-by: Jasvinder Singh Acked-by: Cristian Dumitrescu --- v2: - used SOCKET_ID_ANY instead of -1 examples/ip_pipeline/init.c | 14 +++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/examples/ip_pipeline/init.c b/examples/ip_pipeline/init.c index 186ca03..c4601c9 100644 --- a/examples/ip_pipeline/init.c +++ b/examples/ip_pipeline/init.c @@ -835,6 +835,14 @@ app_init_link_frag_ras(struct app_params *app) } } +static inline int +app_get_cpu_socket_id(uint32_t pmd_id) +{ + int status = rte_eth_dev_socket_id(pmd_id); + + return (status != SOCKET_ID_ANY) ? status : 0; +} + static void app_init_link(struct app_params *app) { @@ -890,7 +898,7 @@ app_init_link(struct app_params *app) p_link->pmd_id, rxq_queue_id, p_rxq->size, - rte_eth_dev_socket_id(p_link->pmd_id), + app_get_cpu_socket_id(p_link->pmd_id), &p_rxq->conf, app->mempool[p_rxq->mempool_id]); if (status < 0) @@ -917,7 +925,7 @@ app_init_link(struct app_params *app) p_link->pmd_id, txq_queue_id, p_txq->size, - rte_eth_dev_socket_id(p_link->pmd_id), + app_get_cpu_socket_id(p_link->pmd_id), &p_txq->conf); if (status < 0) rte_panic("%s (%" PRIu32 "): " @@ -989,7 +997,7 @@ app_init_tm(struct app_params *app) /* TM */ p_tm->sched_port_params.name = p_tm->name; p_tm->sched_port_params.socket = - rte_eth_dev_socket_id(p_link->pmd_id); + app_get_cpu_socket_id(p_link->pmd_id); p_tm->sched_port_params.rate = (uint64_t) link_eth_params.link_speed * 1000 * 1000 / 8; -- 2.5.0
[dpdk-dev] DPDK OVS on Ubuntu 14.04# Issue's Resolved# Inter-VM communication & IP allocation through DHCP issue
Hi Abhijeet, It seems you are almost there! When booting the VMs, do you request hugepage memory for them (by setting hw:mem_page_size=large in flavor extra_spec)? If not, then please do; if yes, then please look into the libvirt log files for the VMs (in /var/log/libvirt/qemu/instance-xxx). I think there could be a clue. Regards Przemek From: Abhijeet Karve [mailto:abhijeet.ka...@tcs.com] Sent: Monday, January 25, 2016 6:13 PM To: Czesnowicz, Przemyslaw Cc: dev at dpdk.org; discuss at openvswitch.org; Gray, Mark D Subject: RE: [dpdk-dev] DPDK OVS on Ubuntu 14.04# Issue's Resolved# Inter-VM communication & IP allocation through DHCP issue Hi Przemek, Thank you for your response, It really provided us breakthrough. After setting up DPDK on compute node for stable/kilo, We are trying to set up Openstack stable/liberty all-in-one setup, At present we are not able to get the IP allocation for the vhost type instances through DHCP. Also we tried assigning IP's manually to them but the inter-VM communication also not happening, #neutron agent-list root at nfv-dpdk-devstack:/etc/neutron# neutron agent-list
+--------------------------------------+--------------------+-------------------+-------+----------------+---------------------------+
| id                                   | agent_type         | host              | alive | admin_state_up | binary                    |
+--------------------------------------+--------------------+-------------------+-------+----------------+---------------------------+
| 3b29e93c-3a25-4f7d-bf6c-6bb309db5ec0 | DPDK OVS Agent     | nfv-dpdk-devstack | :-)   | True           | neutron-openvswitch-agent |
| 62593b2c-c10f-4d93-8551-c46ce24895a6 | L3 agent           | nfv-dpdk-devstack | :-)   | True           | neutron-l3-agent          |
| 7cb97af9-cc20-41f8-90fb-aba97d39dfbd | DHCP agent         | nfv-dpdk-devstack | :-)   | True           | neutron-dhcp-agent        |
| b613c654-99b7-437e-9317-20fa651a1310 | Linux bridge agent | nfv-dpdk-devstack | :-)   | True           | neutron-linuxbridge-agent |
| c2dd0384-6517-4b44-9c25-0d2825d23f57 | Metadata agent     | nfv-dpdk-devstack | :-)   | True           | neutron-metadata-agent    |
| f23dde40-7dc0-4f20-8b3e-eb90ddb15e49 | Open vSwitch agent | nfv-dpdk-devstack | xxx   | True           | neutron-openvswitch-agent |
+--------------------------------------+--------------------+-------------------+-------+----------------+---------------------------+
ovs-vsctl show output# Bridge br-dpdk Port br-dpdk Interface br-dpdk type:
internal Port phy-br-dpdk Interface phy-br-dpdk type: patch options: {peer=int-br-dpdk} Bridge br-int fail_mode: secure Port "vhufa41e799-f2" tag: 5 Interface "vhufa41e799-f2" type: dpdkvhostuser Port int-br-dpdk Interface int-br-dpdk type: patch options: {peer=phy-br-dpdk} Port "tap4e19f8e1-59" tag: 5 Interface "tap4e19f8e1-59" type: internal Port "vhu05734c49-3b" tag: 5 Interface "vhu05734c49-3b" type: dpdkvhostuser Port "vhu10c06b4d-84" tag: 5 Interface "vhu10c06b4d-84" type: dpdkvhostuser Port patch-tun Interface patch-tun type: patch options: {peer=patch-int} Port "vhue169c581-ef" tag: 5 Interface "vhue169c581-ef" type: dpdkvhostuser Port br-int Interface br-int type: internal Bridge br-tun fail_mode: secure Port br-tun Interface br-tun type: internal error: "could not open network device br-tun (Invalid argument)" Port patch-int Interface patch-int type: patch options: {peer=patch-tun} ovs_version: "2.4.0" ovs-ofctl dump-flows br-int# root at nfv-dpdk-devstack:/etc/neutron# ovs-ofctl dump-flows br-int NXST_FLOW reply (xid=0x4): cookie=0xaaa002bb2bcf827b, duration=2410.012s, table=0, n_packets=0, n_bytes=0, idle_age=2410, priority=10,icmp6,in_port=43,icmp_type=136 actions=resubmit(,24) cookie=0xaaa002bb2bcf827b, duration=2409.480s, table=0, n_packets=0, n_bytes=0, idle_age=2409, priority=10,icmp6,in_port=44,icmp_type=136 actions=resubmit(,24) cookie=0xaaa002bb2bcf827b, duration=2408.704s, table=0, n_packets=0, n_bytes=0, idle_age=2408, priority=10,icmp6,in_port=45,icmp_type=136 actions=resubmit(,24) cookie=0xaaa002bb2bcf827b, duration=2408.155s, table=0, n_packets=0, n_bytes=0, idle_age=2408, priority=10,icmp6,in_port=42,
[dpdk-dev] [RFC] eal: add cgroup-aware resource self discovery
Hi Neil,

On 1/26/2016 10:19 PM, Neil Horman wrote:
> On Tue, Jan 26, 2016 at 10:22:18AM +0800, Tan, Jianfeng wrote:
>> Hi Neil,
>>
>> On 1/25/2016 9:46 PM, Neil Horman wrote:
>>> On Mon, Jan 25, 2016 at 02:49:53AM +0800, Jianfeng Tan wrote:
>> ... -- 2.1.4
>>> This doesn't make a whole lot of sense, for several reasons:
>>>
>>> 1) Applications, as a general rule shouldn't be interrogating the cgroups
>>> interface at all.
>> The main reason to do this in DPDK is that DPDK obtains resource information
>> from sysfs and proc, which are not well containerized so far. And DPDK
>> pre-allocates resource instead of on-demand gradual allocating.
>>
> Not disagreeing with this, just suggesting that:
>
> 1) Interrogating cgroups really isn't the best way to collect that information
> 2) Pre-allocating those resources isn't particularly wise without some mechanism
> to reallocate it, as resource constraints can change (consider your cpuset
> getting rewritten)

Regarding reallocation: for cpuset, DPDK panics during initialization if
set_affinity fails, but after that, I believe a rewritten cpuset will not
cause any problem. For memory, if a running application uses 2G of hugepages
and the admin then decreases the hugetlb cgroup limit to 1G, the application
will not get killed unless it tries to access more hugepages (I'll double
check this).

So another way to address this problem is to add an option so that DPDK tries
its best to allocate those resources and, if that fails, just posts a warning
and uses whatever was allocated instead of panicking. What do you think?

>
>>> 2) Cgroups aren't the only way in which a cpuset or memoryset can be restricted
>>> (the isolcpus command line argument, or a taskset on a parent process for
>>> instance, but there are several others).
>> Yes, I agree. To enable that, I'd like design the new API for resource self
>> discovery in a flexible way. A parameter "type" is used to specify the
>> solution to discovery way. In addition, I'm considering to add a callback
>> function pointer so that users can write their own resource discovery
>> functions.
>>
> Why? You don't need an API for this, or if you really want one, it can be very
> generic if you use POSIX apis to gather the information. What you have here is
> going to be very linux specific, and will need reimplementing for BSD or other
> operating systems. To use the cpuset example, instead of reading and parsing
> the mask files in the cgroup filesystem module to find your task and
> corresponding mask, just call sched_setaffinity with an all f's mask, then call
> sched_getaffinity. The returned mask will be all the cpus your process is
> allowed to execute on, taking into account every limiting filter the system you
> are running on offers.

Yes, it makes sense on the CPU side.

>
> There are similar OS level POSIX apis for most resources out there. You really
> don't need to dig through cgroups just to learn what some of those resources are.
>
>>> Instead of trying to figure out what cpuset is valid for your process by
>>> interrogating the cgroups hierarchy, instead you should follow the prescribed
>>> method of calling sched_getaffinity after calling sched_setaffinity. That will
>>> give you the canonical cpuset that you are executing on, taking all cpuset
>>> filters into account (including cgroups and any other restrictions). It's far
>>> simpler as well, as it doesn't require a ton of file/string processing.
>> Yes, this way is much better for cpuset discovery. But is there such a
>> syscall for hugepages?
>>
> In what capacity? Interrogating how many hugepages you have, or to what node
> they are affined to? Capacity would require reading the requisite proc file, as
> there's no posix api for this resource. Node affinity can be implied by setting
> the numa policy of the dpdk and then writing to /proc/nr_hugepages, as the
> kernel will attempt to distribute hugepages evenly among the tasks' numa policy
> configuration.

For memory affinity, I believe the existing way of reading /proc/self/pagemap
already handles the problem. What I was asking is how much memory (or
hugepages, in Linux's case) can be used. By the way, what is
/proc/nr_hugepages?

>
> That said, I would advise that you strongly consider not exporting hugepages as
> a resource, as:
>
> a) Applications generally don't need to know that they are using hugepages, and
> so they don't need to know where said hugepages live, they just allocate memory
> via your allocation api and you give them something appropriate

But the provider of the allocation API, the DPDK library, needs to know
whether it's using hugepages or not.

> b) Hugepages are a resource that are very specific to Linux, and to X86 Linux at
> that. Some OS implement similar resources, but they may have very different
> semantics. And other Arches may or may not implement various forms of compound
> paging at all. As the DPDK expands to sup
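
For reference, the affinity-based CPU discovery discussed above can be sketched roughly as follows. This is only an illustration, not code from the RFC: the helper name count_allowed_cpus() is made up here, and it assumes Linux with glibc, where sched_getaffinity() and the CPU_* macros are exposed by <sched.h> under _GNU_SOURCE. The mask the kernel returns already reflects cpuset cgroups, taskset on a parent process and similar restrictions, so no cgroup files need to be parsed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Hypothetical helper (not a DPDK API): count the CPUs the calling
 * thread is currently allowed to run on. Returns -1 if the affinity
 * mask cannot be read. */
static int count_allowed_cpus(void)
{
	cpu_set_t set;
	int n;

	CPU_ZERO(&set);
	/* pid 0 means "the calling thread" */
	if (sched_getaffinity(0, sizeof(set), &set) != 0)
		return -1;

	n = CPU_COUNT(&set);
	/* a running thread always has at least one usable CPU */
	assert(n > 0);
	return n;
}
```

An application could walk the same mask with CPU_ISSET() to decide which lcores to use; that part is left out to keep the sketch short.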
[dpdk-dev] [PATCH v2 4/4] virtio: check if any kernel driver is manipulating the virtio device
2016-01-07 16:17, Panu Matilainen:
> On 01/03/2016 07:56 PM, Huawei Xie wrote:
> > v2 changes:
> > change LOG level from ERR to INFO
> >
> > virtio PMD could use IO port to configure the virtio device without
> > using uio driver.
> >
> > There are two issues with previous implementation:
> > 1) virtio PMD will take over each virtio device blindly even if some
> > are not intended for DPDK.
> > 2) driver conflict between virtio PMD and virtio-net kernel driver.
> >
> > This patch checks if there is any kernel driver manipulating the virtio
> > device before virtio PMD uses IO port to configure the device.
> >
> > Fixes: da978dfdc43b ("virtio: use port IO to get PCI resource")
> >
> > Signed-off-by: Huawei Xie
> > ---
> > drivers/net/virtio/virtio_ethdev.c | 7 +++
> > 1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
> > index e815acd..7a50dac 100644
> > --- a/drivers/net/virtio/virtio_ethdev.c
> > +++ b/drivers/net/virtio/virtio_ethdev.c
> > @@ -1138,6 +1138,13 @@ static int virtio_resource_init_by_ioports(struct rte_pci_device *pci_dev)
> >  	int found = 0;
> >  	size_t linesz;
> >
> > +	if (pci_dev->kdrv != RTE_KDRV_NONE) {
> > +		PMD_INIT_LOG(INFO,
> > +			"kernel driver is manipulating this device." \
> > +			" Please unbind the kernel driver.");
>
> At the very least this message needs to be changed.
>
> Like said earlier, I think the message could just as well be dropped
> entirely, but at least it should be something to the tune of "ignoring
> kernel owned device" instead of asking the user to break their
> configuration.

Huawei, a v3 is required.
Thanks
[dpdk-dev] [PATCH 0/9] pci cleanup and blacklist rework
On Fri, Jan 22, 2016 at 4:27 PM, David Marchand wrote:
> The 4th patch introduces a change in linux eal.
> Before, if a pci device was bound to no kernel driver, eal would set kdrv
> to "unknown". With this change, kdrv is set to "none".
> This might make it possible to avoid the old issue of virtio devices being
> used by dpdk while still bound to kernel driver reported by Franck B..
> I'll let virtio guys look at this.
> At the very least, it makes more sense to me.

Ok, actually, I had forgotten that Huawei had already sent a similar change [1].
So I suppose the commit log of this patch is wrong, but the patch itself is
still worthwhile as a cleanup.
Thomas, I suppose you will integrate Huawei's patches first.
Then I will rebase and fix the commit log.

[1] http://dpdk.org/dev/patchwork/patch/9718/

--
David Marchand
[dpdk-dev] [PATCH] vfio/noiommu: Don't use iommu_present() to track fake groups
Hi Alex,

> On 01/23/2016 04:23 AM, Alex Williamson wrote:
> > Using iommu_present() to determine whether an IOMMU group is real or
> > fake has some problems. First, apparently Power systems don't
> > register an IOMMU on the device bus, so the groups and containers get
> > marked as noiommu and then won't bind to their actual IOMMU driver.
> > Second, I expect we'll run into the same issue as we try to support
> > vGPUs through vfio, since they're likely to emulate this behavior of
> > creating an IOMMU group on a virtual device and then providing a vfio
> > IOMMU backend tailored to the sort of isolation they provide, which
> > won't necessarily be fully compatible with the IOMMU API.
> >
> > The solution here is to use the existing iommudata interface to IOMMU
> > groups, which allows us to easily identify the fake groups we've
> > created for noiommu purposes. The iommudata we set is purely
> > arbitrary since we're only comparing the address, so we use the
> > address of the noiommu switch itself.
> >
> > Reported-by: Alexey Kardashevskiy
> > Fixes: 03a76b60f8ba ("vfio: Include No-IOMMU mode")
> > Signed-off-by: Alex Williamson
> >
>
> Reviewed-by: Alexey Kardashevskiy
> Tested-by: Alexey Kardashevskiy

Tested bringing the NICs up, encountered no issues. Curious if it also works
for Santosh (CC'd) as he's one of the intended users of the No-IOMMU
functionality, but otherwise seems to work.

Thanks,
Anatoly
[dpdk-dev] [PATCH V1 1/1] jobstats: added function abort for job
On 01/26/2016 06:15 PM, Marcin Kerlin wrote:
> This patch adds a new function, rte_jobstats_abort. It marks *job* as finished,
> and the time of this work will be added to management time instead of execution
> time.
> This function should be used instead of rte_jobstats_finish when an
> application-defined condition occurs, for example when receiving n>0 packets.
>
> Signed-off-by: Marcin Kerlin
> ---
> lib/librte_jobstats/rte_jobstats.c | 22 ++
> lib/librte_jobstats/rte_jobstats.h | 17 +
> lib/librte_jobstats/rte_jobstats_version.map | 7 +++
> 3 files changed, 46 insertions(+)
>
[...]
> diff --git a/lib/librte_jobstats/rte_jobstats.h b/lib/librte_jobstats/rte_jobstats.h
> index de6a89a..9995319 100644
> --- a/lib/librte_jobstats/rte_jobstats.h
> +++ b/lib/librte_jobstats/rte_jobstats.h
> @@ -90,6 +90,9 @@ struct rte_jobstats {
>  	uint64_t exec_cnt;
>  	/**< Execute count. */
>
> +	uint64_t last_job_time;
> +	/**< Last job time */
> +
>  	char name[RTE_JOBSTATS_NAMESIZE];
>  	/**< Name of this job */
>

AFAICS this is an ABI break and as such, needs to be preannounced, see
http://dpdk.org/doc/guides/contributing/versioning.html
For 2.3 it'd need to be a CONFIG_RTE_NEXT_ABI feature.

- Panu -
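
To illustrate why inserting a member in the middle of a public struct is an ABI break, here is a small stand-alone sketch. It does not use the real struct rte_jobstats: the two structs below are cut-down stand-ins with only the members visible in the hunk above, and DEMO_NAMESIZE is an assumed placeholder for RTE_JOBSTATS_NAMESIZE. The point is only that every member after the insertion moves, so an application compiled against the old layout would read "name" from the wrong offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NAMESIZE 32 /* placeholder, not taken from the DPDK headers */

/* Layout an already-built application was compiled against. */
struct demo_jobstats_old {
	uint64_t exec_cnt;
	char name[DEMO_NAMESIZE];
};

/* Layout after a member is inserted before "name". */
struct demo_jobstats_new {
	uint64_t exec_cnt;
	uint64_t last_job_time;
	char name[DEMO_NAMESIZE];
};

/* Compile-time check that the two layouts really disagree. */
static_assert(offsetof(struct demo_jobstats_old, name) !=
	      offsetof(struct demo_jobstats_new, name),
	      "inserting a member moves every member after it");

static size_t demo_name_offset_old(void)
{
	return offsetof(struct demo_jobstats_old, name);
}

static size_t demo_name_offset_new(void)
{
	return offsetof(struct demo_jobstats_new, name);
}
```

Appending the new member at the end would keep existing offsets stable but would still change sizeof(), which matters whenever the application allocates or embeds the struct itself.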
[dpdk-dev] [PATCH] rte.extvars.mk: allow overriding RTE_SDK_BIN from the environment
2016-01-20 21:15, Matthew Hall:
> On 1/20/16 7:27 AM, Thomas Monjalon wrote:
> > Hi Matthew,
> >
> > RTE_SDK_BIN is an internal variable and should not be overridden.
> >
> > Have you installed DPDK somewhere? Example:
> > 	make install O=mybuild DESTDIR=mylocalinstall
> >
> > Then you should build your app like this:
> > 	make RTE_SDK=$(readlink -e ../dpdk/mylocalinstall/usr/local/share/dpdk)
>
> Hello Thomas,
>
> Is the way the make install target really works documented somewhere?

It is poorly described here:
	http://dpdk.org/doc/guides/prog_guide/dev_kit_root_make_help.html#install-targets

> This target did not exist when I first used DPDK in 2011, and since then
> I saw various documentation on building DPDK in various places, but not
> that much explanation what make install actually does. I recall various
> list threads about changing its behavior as well.

Historically, "make install" was a convenient default build (with T= option).
The DESTDIR option was added to make a real install after building.
The standard form (without T=) is now implemented to do a real install.

> For example, if I look at this apparently most official document:
>
> http://dpdk.org/doc/guides/linux_gsg/build_dpdk.html
>
> It has build examples such as:
>
> make install T=x86_64-native-linuxapp-gcc

This command finishes with this message:
	Installation cannot run with T defined and DESTDIR undefined
Yes you are right, some docs are neither complete nor up-to-date.
Volunteers are welcome.

> But it does not discuss "O=" or "DESTDIR=" or any other additional
> options. From some experiments on my machine, it looks like maybe I
> could do this:
>
> make install "T=${RTE_TARGET}" "O=build" "DESTDIR=build"
>
> Is that a valid possibility, to keep it all in one easy directory?

Yes you can install where you want.
Note that this command (with T= and O=) will build in the directory $O/$T,
i.e. build/${RTE_TARGET}, and install in build/

Please confirm that this patch is not needed.
Thanks
[dpdk-dev] [PATCH] vfio/noiommu: Don't use iommu_present() to track fake groups
On Wed, Jan 27, 2016 at 6:51 PM, Burakov, Anatoly wrote:
> Hi Alex,
>
>> On 01/23/2016 04:23 AM, Alex Williamson wrote:
>> > Using iommu_present() to determine whether an IOMMU group is real or
>> > fake has some problems. First, apparently Power systems don't
>> > register an IOMMU on the device bus, so the groups and containers get
>> > marked as noiommu and then won't bind to their actual IOMMU driver.
>> > Second, I expect we'll run into the same issue as we try to support
>> > vGPUs through vfio, since they're likely to emulate this behavior of
>> > creating an IOMMU group on a virtual device and then providing a vfio
>> > IOMMU backend tailored to the sort of isolation they provide, which
>> > won't necessarily be fully compatible with the IOMMU API.
>> >
>> > The solution here is to use the existing iommudata interface to IOMMU
>> > groups, which allows us to easily identify the fake groups we've
>> > created for noiommu purposes. The iommudata we set is purely
>> > arbitrary since we're only comparing the address, so we use the
>> > address of the noiommu switch itself.
>> >
>> > Reported-by: Alexey Kardashevskiy
>> > Fixes: 03a76b60f8ba ("vfio: Include No-IOMMU mode")
>> > Signed-off-by: Alex Williamson
>> >
>>
>> Reviewed-by: Alexey Kardashevskiy
>> Tested-by: Alexey Kardashevskiy
>
> Tested bringing the NIC's up, encountered no issues. Curious if it also works
> for Santosh (CC'd) as he's one of the intended users of the No-IOMMU
> functionality, but otherwise seems to work.
>

Yes, it works for the virtio DPDK case too.
Tested-by:

Thanks.

> Thanks,
> Anatoly
[dpdk-dev] [PATCH v6 1/2] mbuf: provide rte_pktmbuf_alloc_bulk API
On 01/26/2016 07:03 PM, Huawei Xie wrote:
> v6 changes:
> reflect the changes in release notes and library version map file
> revise our duff's code style a bit to make it more readable
>
> v5 changes:
> add comment about duff's device and our variant implementation
>
> v3 changes:
> move while after case 0
> add context about duff's device and why we use while loop in the commit
> message
>
> v2 changes:
> unroll the loop a bit to help the performance
>
> rte_pktmbuf_alloc_bulk allocates a bulk of packet mbufs.
>
> There is related thread about this bulk API.
> http://dpdk.org/dev/patchwork/patch/4718/
> Thanks to Konstantin's loop unrolling.
>
> Attached the wiki page about duff's device. It explains the performance
> optimization through loop unwinding, and also the most dramatic use of
> case label fall-through.
> https://en.wikipedia.org/wiki/Duff%27s_device
>
> In our implementation, we use while() loop rather than do{} while() loop
> because we could not assume count is strictly positive. Using while()
> loop saves one line of check if count is zero.
>
> Signed-off-by: Gerald Rogers
> Signed-off-by: Huawei Xie
> Acked-by: Konstantin Ananyev
> ---
> doc/guides/rel_notes/release_2_3.rst | 3 ++
> lib/librte_mbuf/rte_mbuf.h | 55
> lib/librte_mbuf/rte_mbuf_version.map | 7 +
> 3 files changed, 65 insertions(+)
>
> diff --git a/doc/guides/rel_notes/release_2_3.rst b/doc/guides/rel_notes/release_2_3.rst
> index 99de186..a52cba3 100644
> --- a/doc/guides/rel_notes/release_2_3.rst
> +++ b/doc/guides/rel_notes/release_2_3.rst
> @@ -4,6 +4,9 @@ DPDK Release 2.3
>  New Features
>
>
> +* **Enable bulk allocation of mbufs. **
> +  A new function ``rte_pktmbuf_alloc_bulk()`` has been added to allow the user
> +  to allocate a bulk of mbufs.
>
>  Resolved Issues
>  ---
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index f234ac9..b2ed479 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -1336,6 +1336,61 @@ static inline struct rte_mbuf *rte_pktmbuf_alloc(struct rte_mempool *mp)
>  }
>
>  /**
> + * Allocate a bulk of mbufs, initialize refcnt and reset the fields to default
> + * values.
> + *
> + * @param pool
> + *    The mempool from which mbufs are allocated.
> + * @param mbufs
> + *    Array of pointers to mbufs
> + * @param count
> + *    Array size
> + * @return
> + *   - 0: Success
> + */
> +static inline int rte_pktmbuf_alloc_bulk(struct rte_mempool *pool,
> +	struct rte_mbuf **mbufs, unsigned count)
> +{
> +	unsigned idx = 0;
> +	int rc;
> +
> +	rc = rte_mempool_get_bulk(pool, (void **)mbufs, count);
> +	if (unlikely(rc))
> +		return rc;
> +
> +	/* To understand duff's device on loop unwinding optimization, see
> +	 * https://en.wikipedia.org/wiki/Duff's_device.
> +	 * Here while() loop is used rather than do() while{} to avoid extra
> +	 * check if count is zero.
> +	 */
> +	switch (count % 4) {
> +	case 0:
> +		while (idx != count) {
> +			RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
> +			rte_mbuf_refcnt_set(mbufs[idx], 1);
> +			rte_pktmbuf_reset(mbufs[idx]);
> +			idx++;
> +	case 3:
> +			RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
> +			rte_mbuf_refcnt_set(mbufs[idx], 1);
> +			rte_pktmbuf_reset(mbufs[idx]);
> +			idx++;
> +	case 2:
> +			RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
> +			rte_mbuf_refcnt_set(mbufs[idx], 1);
> +			rte_pktmbuf_reset(mbufs[idx]);
> +			idx++;
> +	case 1:
> +			RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
> +			rte_mbuf_refcnt_set(mbufs[idx], 1);
> +			rte_pktmbuf_reset(mbufs[idx]);
> +			idx++;
> +		}
> +	}
> +	return 0;
> +}
> +
> +/**
>   * Attach packet mbuf to another packet mbuf.
>   *
>   * After attachment we refer the mbuf we attached as 'indirect',
> diff --git a/lib/librte_mbuf/rte_mbuf_version.map b/lib/librte_mbuf/rte_mbuf_version.map
> index e10f6bd..257c65a 100644
> --- a/lib/librte_mbuf/rte_mbuf_version.map
> +++ b/lib/librte_mbuf/rte_mbuf_version.map
> @@ -18,3 +18,10 @@ DPDK_2.1 {
>  	rte_pktmbuf_pool_create;
>
>  } DPDK_2.0;
> +
> +DPDK_2.3 {
> +	global:
> +
> +	rte_pktmbuf_alloc_bulk;
> +
> +} DPDK_2.1;
>

Since rte_pktmbuf_alloc_bulk() is an inline function, it is not part of the
library ABI and should not be listed in the version map.

I assume it's inline for performance reasons, but then you lose the benefits
of dynamic linking, such as the ability to fix bugs and/or improve it by just
updating the library. Since the point of
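
The while()-based variant of Duff's device used in the quoted patch can be tried in isolation with the sketch below. It is not DPDK code: demo_fill_bulk() and its int array are stand-ins for the per-mbuf refcnt/reset work, chosen so the control flow can be checked on its own. The switch jumps into the middle of the loop body to handle the count % 4 leftover first, after which every full pass handles four items; because the loop condition is tested before the first full pass, count == 0 does no work at all.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the per-mbuf initialisation: set items[0..count-1] to value,
 * four stores per loop pass where possible. */
static void demo_fill_bulk(int *items, unsigned count, int value)
{
	unsigned idx = 0;

	assert(items != NULL || count == 0);

	switch (count % 4) {
	case 0:
		while (idx != count) {
			items[idx++] = value;
			/* fall through */
	case 3:
			items[idx++] = value;
			/* fall through */
	case 2:
			items[idx++] = value;
			/* fall through */
	case 1:
			items[idx++] = value;
		}
	}
}
```

For example, with count == 5 the switch enters at "case 1", stores one item, and the loop then runs one full pass of four stores before idx reaches count.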
[dpdk-dev] [PATCH v3] vfio: Support for no-IOMMU mode
This commit is adding a generic mechanism to support multiple IOMMU types.
For now, it's only type 1 (x86 IOMMU) and no-IOMMU (a special VFIO mode that
doesn't use IOMMU at all), but it's easily extended by adding necessary
definitions into eal_pci_init.h and a DMA mapping function to
eal_pci_vfio_dma.c.

Since type 1 IOMMU module is no longer necessary to have VFIO, we fix the
module check to check for vfio-pci instead. It's not ideal and triggers VFIO
checks more often (and thus produces more error output, which was the reason
behind the module check in the first place), so we compensate for that by
providing more verbose logging, indicating whether VFIO initialization has
succeeded or failed.

Signed-off-by: Anatoly Burakov
Tested-by: Santosh Shukla
---
v3 changes:
  Merging DMA mapping functions back into eal_pci_vfio.c
  Fixing and adding comments

v2 changes:
  Compile fix (hat-tip to Santosh Shukla)
  Tested-by is provisional, since only superficial testing was done

 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 205 +
 lib/librte_eal/linuxapp/eal/eal_vfio.h | 5 +
 2 files changed, 157 insertions(+), 53 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index 74f91ba..fdf334b 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -72,11 +72,74 @@ EAL_REGISTER_TAILQ(rte_vfio_tailq)
 #define VFIO_DIR "/dev/vfio"
 #define VFIO_CONTAINER_PATH "/dev/vfio/vfio"
 #define VFIO_GROUP_FMT "/dev/vfio/%u"
+#define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
 #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)

 /* per-process VFIO config */
 static struct vfio_config vfio_cfg;

+/* DMA mapping function prototype.
+ * Takes VFIO container fd as a parameter.
+ * Returns 0 on success, -1 on error.
+ * */
+typedef int (*vfio_dma_func_t)(int);
+
+struct vfio_iommu_type {
+	int type_id;
+	const char *name;
+	vfio_dma_func_t dma_map_func;
+};
+
+int vfio_iommu_type1_dma_map(int);
+int vfio_iommu_noiommu_dma_map(int);
+
+/* IOMMU types we support */
+static const struct vfio_iommu_type iommu_types[] = {
+	/* x86 IOMMU, otherwise known as type 1 */
+	{ VFIO_TYPE1_IOMMU, "Type 1", &vfio_iommu_type1_dma_map},
+	/* IOMMU-less mode */
+	{ VFIO_NOIOMMU_IOMMU, "No-IOMMU", &vfio_iommu_noiommu_dma_map},
+};
+
+int
+vfio_iommu_type1_dma_map(int vfio_container_fd)
+{
+	const struct rte_memseg *ms = rte_eal_get_physmem_layout();
+	int i, ret;
+
+	/* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
+	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
+		struct vfio_iommu_type1_dma_map dma_map;
+
+		if (ms[i].addr == NULL)
+			break;
+
+		memset(&dma_map, 0, sizeof(dma_map));
+		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
+		dma_map.vaddr = ms[i].addr_64;
+		dma_map.size = ms[i].len;
+		dma_map.iova = ms[i].phys_addr;
+		dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
+
+		ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
+
+		if (ret) {
+			RTE_LOG(ERR, EAL, " cannot set up DMA remapping, "
+					"error %i (%s)\n", errno, strerror(errno));
+			return -1;
+		}
+	}
+
+	return 0;
+}
+
+int
+vfio_iommu_noiommu_dma_map(int __rte_unused vfio_container_fd)
+{
+	/* No-IOMMU mode does not need DMA mapping */
+	return 0;
+}
+
 int
 pci_vfio_read_config(const struct rte_intr_handle *intr_handle,
 		void *buf, size_t len, off_t offs)
@@ -208,42 +271,58 @@ pci_vfio_set_bus_master(int dev_fd)
 	return 0;
 }

-/* set up DMA mappings */
-static int
-pci_vfio_setup_dma_maps(int vfio_container_fd)
-{
-	const struct rte_memseg *ms = rte_eal_get_physmem_layout();
-	int i, ret;
-
-	ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU,
-			VFIO_TYPE1_IOMMU);
-	if (ret) {
-		RTE_LOG(ERR, EAL, " cannot set IOMMU type, "
-				"error %i (%s)\n", errno, strerror(errno));
-		return -1;
+/* pick IOMMU type. returns a pointer to vfio_iommu_type or NULL for error */
+static const struct vfio_iommu_type *
+pci_vfio_set_iommu_type(int vfio_container_fd) {
+	unsigned idx;
+	for (idx = 0; idx < RTE_DIM(iommu_types); idx++) {
+		const struct vfio_iommu_type *t = &iommu_types[idx];
+
+		int ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU,
+				t->type_id);
+		if (!ret) {
+			RTE_LOG(NOTICE, EAL, " using IOMMU type %d (%s)\n",
+					t->type_id, t->name);
+
[dpdk-dev] [PATCH v3] vfio: Support for no-IOMMU mode
Apologies, lost the signoff from Santosh Shukla and also the commit message
still mentions the file that is now non-existent, so I'll submit a v4.

Thanks,
Anatoly

> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Anatoly Burakov
> Sent: Wednesday, January 27, 2016 2:05 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v3] vfio: Support for no-IOMMU mode
>
> This commit is adding a generic mechanism to support multiple IOMMU
> types. For now, it's only type 1 (x86 IOMMU) and no-IOMMU (a special VFIO
> mode that doesn't use IOMMU at all), but it's easily extended by adding
> necessary definitions into eal_pci_init.h and a DMA mapping function to
> eal_pci_vfio_dma.c.
>
> Since type 1 IOMMU module is no longer necessary to have VFIO, we fix the
> module check to check for vfio-pci instead. It's not ideal and triggers VFIO
> checks more often (and thus produces more error output, which was the
> reason behind the module check in the first place), so we compensate for
> that by providing more verbose logging, indicating whether VFIO initialization
> has succeeded or failed.
> > Signed-off-by: Anatoly Burakov > Tested-by: Santosh Shukla > --- > v3 changes: > Merging DMA mapping functions back into eal_pci_vfio.c > Fixing and adding comments > > v2 changes: > Compile fix (hat-tip to Santosh Shukla) > Tested-by is provisional, since only superficial testing was done > > lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 205 +-- > -- > lib/librte_eal/linuxapp/eal/eal_vfio.h | 5 + > 2 files changed, 157 insertions(+), 53 deletions(-) > > diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c > b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c > index 74f91ba..fdf334b 100644 > --- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c > +++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c > @@ -72,11 +72,74 @@ EAL_REGISTER_TAILQ(rte_vfio_tailq) > #define VFIO_DIR "/dev/vfio" > #define VFIO_CONTAINER_PATH "/dev/vfio/vfio" > #define VFIO_GROUP_FMT "/dev/vfio/%u" > +#define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u" > #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL) > > /* per-process VFIO config */ > static struct vfio_config vfio_cfg; > > +/* DMA mapping function prototype. > + * Takes VFIO container fd as a parameter. > + * Returns 0 on success, -1 on error. > + * */ > +typedef int (*vfio_dma_func_t)(int); > + > +struct vfio_iommu_type { > + int type_id; > + const char *name; > + vfio_dma_func_t dma_map_func; > +}; > + > +int vfio_iommu_type1_dma_map(int); > +int vfio_iommu_noiommu_dma_map(int); > + > +/* IOMMU types we support */ > +static const struct vfio_iommu_type iommu_types[] = { > + /* x86 IOMMU, otherwise known as type 1 */ > + { VFIO_TYPE1_IOMMU, "Type 1", > &vfio_iommu_type1_dma_map}, > + /* IOMMU-less mode */ > + { VFIO_NOIOMMU_IOMMU, "No-IOMMU", > &vfio_iommu_noiommu_dma_map}, }; > + > +int > +vfio_iommu_type1_dma_map(int vfio_container_fd) { > + const struct rte_memseg *ms = rte_eal_get_physmem_layout(); > + int i, ret; > + > + /* map all DPDK segments for DMA. 
use 1:1 PA to IOVA mapping */ > + for (i = 0; i < RTE_MAX_MEMSEG; i++) { > + struct vfio_iommu_type1_dma_map dma_map; > + > + if (ms[i].addr == NULL) > + break; > + > + memset(&dma_map, 0, sizeof(dma_map)); > + dma_map.argsz = sizeof(struct > vfio_iommu_type1_dma_map); > + dma_map.vaddr = ms[i].addr_64; > + dma_map.size = ms[i].len; > + dma_map.iova = ms[i].phys_addr; > + dma_map.flags = VFIO_DMA_MAP_FLAG_READ | > VFIO_DMA_MAP_FLAG_WRITE; > + > + ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, > &dma_map); > + > + if (ret) { > + RTE_LOG(ERR, EAL, " cannot set up DMA remapping, > " > + "error %i (%s)\n", errno, > strerror(errno)); > + return -1; > + } > + } > + > + return 0; > +} > + > +int > +vfio_iommu_noiommu_dma_map(int __rte_unused vfio_container_fd) { > + /* No-IOMMU mode does not need DMA mapping */ > + return 0; > +} > + > int > pci_vfio_read_config(const struct rte_intr_handle *intr_handle, > void *buf, size_t len, off_t offs) @@ -208,42 +271,58 @@ > pci_vfio_set_bus_master(int dev_fd) > return 0; > } > > -/* set up DMA mappings */ > -static int > -pci_vfio_setup_dma_maps(int vfio_container_fd) -{ > - const struct rte_memseg *ms = rte_eal_get_physmem_layout(); > - int i, ret; > - > - ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU, > - VFIO_TYPE1_IOMMU); > - if (ret) { > - RTE_LOG(ERR, EAL, " cannot set IOMMU type, " > - "error %i (%s)\n", errno, strerror(errno)); > - return -1; > +/* pick IOMMU type. returns a pointer t
[dpdk-dev] [PATCH v4] vfio: Support for no-IOMMU mode
This commit is adding a generic mechanism to support multiple IOMMU types.
For now, it's only type 1 (x86 IOMMU) and no-IOMMU (a special VFIO mode that
doesn't use IOMMU at all), but it's easily extended by adding necessary
definitions into eal_pci_init.h and a DMA mapping function to eal_pci_vfio.c.

Since type 1 IOMMU module is no longer necessary to have VFIO, we fix the
module check to check for vfio-pci instead. It's not ideal and triggers VFIO
checks more often (and thus produces more error output, which was the reason
behind the module check in the first place), so we compensate for that by
providing more verbose logging, indicating whether VFIO initialization has
succeeded or failed.

Signed-off-by: Anatoly Burakov
Signed-off-by: Santosh Shukla
Tested-by: Santosh Shukla
---
v4 changes:
  Fixed the commit message and added a missing sign-off

v3 changes:
  Merging DMA mapping functions back into eal_pci_vfio.c
  Fixing and adding comments

v2 changes:
  Compile fix (hat-tip to Santosh Shukla)
  Tested-by is provisional, since only superficial testing was done

 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 205 +
 lib/librte_eal/linuxapp/eal/eal_vfio.h | 5 +
 2 files changed, 157 insertions(+), 53 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index 74f91ba..fdf334b 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -72,11 +72,74 @@ EAL_REGISTER_TAILQ(rte_vfio_tailq)
 #define VFIO_DIR "/dev/vfio"
 #define VFIO_CONTAINER_PATH "/dev/vfio/vfio"
 #define VFIO_GROUP_FMT "/dev/vfio/%u"
+#define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
 #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)

 /* per-process VFIO config */
 static struct vfio_config vfio_cfg;

+/* DMA mapping function prototype.
+ * Takes VFIO container fd as a parameter.
+ * Returns 0 on success, -1 on error.
+ * */
+typedef int (*vfio_dma_func_t)(int);
+
+struct vfio_iommu_type {
+	int type_id;
+	const char *name;
+	vfio_dma_func_t dma_map_func;
+};
+
+int vfio_iommu_type1_dma_map(int);
+int vfio_iommu_noiommu_dma_map(int);
+
+/* IOMMU types we support */
+static const struct vfio_iommu_type iommu_types[] = {
+	/* x86 IOMMU, otherwise known as type 1 */
+	{ VFIO_TYPE1_IOMMU, "Type 1", &vfio_iommu_type1_dma_map},
+	/* IOMMU-less mode */
+	{ VFIO_NOIOMMU_IOMMU, "No-IOMMU", &vfio_iommu_noiommu_dma_map},
+};
+
+int
+vfio_iommu_type1_dma_map(int vfio_container_fd)
+{
+	const struct rte_memseg *ms = rte_eal_get_physmem_layout();
+	int i, ret;
+
+	/* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
+	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
+		struct vfio_iommu_type1_dma_map dma_map;
+
+		if (ms[i].addr == NULL)
+			break;
+
+		memset(&dma_map, 0, sizeof(dma_map));
+		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
+		dma_map.vaddr = ms[i].addr_64;
+		dma_map.size = ms[i].len;
+		dma_map.iova = ms[i].phys_addr;
+		dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
+
+		ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
+
+		if (ret) {
+			RTE_LOG(ERR, EAL, " cannot set up DMA remapping, "
+					"error %i (%s)\n", errno, strerror(errno));
+			return -1;
+		}
+	}
+
+	return 0;
+}
+
+int
+vfio_iommu_noiommu_dma_map(int __rte_unused vfio_container_fd)
+{
+	/* No-IOMMU mode does not need DMA mapping */
+	return 0;
+}
+
 int
 pci_vfio_read_config(const struct rte_intr_handle *intr_handle,
 		void *buf, size_t len, off_t offs)
@@ -208,42 +271,58 @@ pci_vfio_set_bus_master(int dev_fd)
 	return 0;
 }

-/* set up DMA mappings */
-static int
-pci_vfio_setup_dma_maps(int vfio_container_fd)
-{
-	const struct rte_memseg *ms = rte_eal_get_physmem_layout();
-	int i, ret;
-
-	ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU,
-			VFIO_TYPE1_IOMMU);
-	if (ret) {
-		RTE_LOG(ERR, EAL, " cannot set IOMMU type, "
-				"error %i (%s)\n", errno, strerror(errno));
-		return -1;
+/* pick IOMMU type. returns a pointer to vfio_iommu_type or NULL for error */
+static const struct vfio_iommu_type *
+pci_vfio_set_iommu_type(int vfio_container_fd) {
+	unsigned idx;
+	for (idx = 0; idx < RTE_DIM(iommu_types); idx++) {
+		const struct vfio_iommu_type *t = &iommu_types[idx];
+
+		int ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU,
+				t->type_id);
+		if (!ret) {
+			RTE_LOG(NOTICE, EAL, " using IOMMU typ
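
The table-driven selection that the patch introduces (try each supported IOMMU type in order, keep the first one the container accepts) can be sketched without VFIO as follows. Everything named demo_* is an assumption made for illustration: the type ids are arbitrary placeholders rather than the VFIO_* constants, and the demo_try_fn callback stands in for the ioctl(VFIO_SET_IOMMU) call so that the lookup logic can run anywhere.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder ids; the real code uses the VFIO_*_IOMMU constants. */
enum { DEMO_TYPE1 = 1, DEMO_NOIOMMU = 2 };

/* Stand-in for ioctl(fd, VFIO_SET_IOMMU, type_id): returns 0 on success. */
typedef int (*demo_try_fn)(int type_id);

struct demo_iommu_type {
	int type_id;
	const char *name;
};

/* Supported types, in order of preference. */
static const struct demo_iommu_type demo_types[] = {
	{ DEMO_TYPE1, "Type 1" },
	{ DEMO_NOIOMMU, "No-IOMMU" },
};

/* Return the first type accepted by try_set, or NULL if none is. */
static const struct demo_iommu_type *
demo_pick_iommu_type(demo_try_fn try_set)
{
	size_t i;

	assert(try_set != NULL);
	for (i = 0; i < sizeof(demo_types) / sizeof(demo_types[0]); i++) {
		if (try_set(demo_types[i].type_id) == 0)
			return &demo_types[i];
	}
	return NULL;
}

/* Two fake "kernels" used to exercise the lookup. */
static int demo_accept_noiommu_only(int type_id)
{
	return type_id == DEMO_NOIOMMU ? 0 : -1;
}

static int demo_reject_all(int type_id)
{
	(void)type_id;
	return -1;
}
```

Supporting another IOMMU type then only means adding a row to the table, which matches the extensibility argument in the commit message.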
[dpdk-dev] [PATCH] no need to test for NULL when freeing
2016-01-21 12:23, David Marchand:
> free() already handles NULL pointer.
>
> Signed-off-by: David Marchand

Applied, thanks
[dpdk-dev] [PATCH v2 0/2] minor cleanup in ethdev hotplug
2016-01-22 15:06, David Marchand:
> It was first a preparation step for future patchsets, but I am not sure what
> will become of them, so sending this anyway since it does not hurt to clean
> this now.
>
> Changes since v1:
> - rebased on HEAD (previous patchset was based on another patch I sent
> separately)
> - restored EINVAL error code for rte_eth_dev_(at|de)tach (thanks Jan)

Applied, thanks
[dpdk-dev] [PATCH v2 0/5] Optimize memcpy for AVX512 platforms
2016-01-17 22:05, Zhihong Wang:
> This patch set optimizes DPDK memcpy for AVX512 platforms, to make full
> utilization of hardware resources and deliver high performance.

On a related note, your expertise would be very valuable in reviewing
these patches, please:
(memcpy) http://dpdk.org/dev/patchwork/patch/4396/
(memcmp) http://dpdk.org/dev/patchwork/patch/4788/

Thanks
[dpdk-dev] [PATCH v2 0/5] Optimize memcpy for AVX512 platforms
> Zhihong Wang (5):
>   lib/librte_eal: Identify AVX512 CPU flag
>   mk: Predefine AVX512 macro for compiler
>   lib/librte_eal: Optimize memcpy for AVX512 platforms
>   app/test: Adjust alignment unit for memcpy perf test
>   lib/librte_eal: Tune memcpy for prior platforms
>
>  app/test/test_memcpy_perf.c | 6 +
>  .../common/include/arch/x86/rte_cpuflags.h | 2 +
>  .../common/include/arch/x86/rte_memcpy.h | 269 -
>  mk/rte.cpuflags.mk | 4 +
>  4 files changed, 268 insertions(+), 13 deletions(-)

The maintainers of arch/x86 are Bruce and Konstantin.
I guess there is no comment and we can apply this cool series?
[dpdk-dev] [PATCH v6 08/11] eal: pci: introduce RTE_KDRV_VFIO_NOIOMMU driver mode
On Wed, Jan 27, 2016 at 4:11 PM, Santosh Shukla wrote: > On Tue, Jan 26, 2016 at 9:51 PM, Santosh Shukla wrote: >> On Tue, Jan 26, 2016 at 7:58 PM, Thomas Monjalon >> wrote: >>> 2016-01-26 19:35, Santosh Shukla: On Tue, Jan 26, 2016 at 6:30 PM, Thomas Monjalon wrote: > 2016-01-26 15:56, Santosh Shukla: >> In my observation, currently virtio work for vfio-noiommu, that's why >> said drv->kdrv need to know vfio mode. > > It is your observation. It may change in near future. so that mean till then, virtio support for non-x86 arch has to wait? >>> >>> No, absolutely not. virtio for non-x86 is welcome. >>> We have working model with vfio-noiommu, don't you think it make sense to let vfio_noiommu implementation exist and later in-case virtio+iommu gets mainline then switch to vfio __mode__ agnostic approach. And for that All it takes to replace __noiommu suffix with default. >>> >>> I'm just saying you should not touch the enum rte_kernel_driver. >>> RTE_KDRV_VFIO is a driver. >>> RTE_KDRV_VFIO_NOIOMMU is a mode. >>> As the VFIO API is the same in both modes, there is no reason to >>> distinguish them at this level. >>> Your patch adds the NOIOMMU case everywhere: >>> case RTE_KDRV_VFIO: >>> + case RTE_KDRV_VFIO_NOIOMMU: >>> >>> I'll stop commenting here to let others give their opinion. >>> >>> [...] >> with vfio+iommu; binding virtio pci device to vfio-pci driver fail; >> giving below error: >> [ 53.053464] VFIO - User Level meta-driver version: 0.3 >> [ 73.077805] vfio-pci: probe of :00:03.0 failed with error -22 >> [ 73.077852] vfio-pci: probe of :00:03.0 failed with error -22 >> >> vfio_pci_probe() --> vfio_iommu_group_get() --> iommu_group_get() >> fails: iommu doesn't have group for virtio pci device. > > Yes it fails when binding. > So the later check in the virtio PMD is useless. Which check? >>> >>> The check for VFIO noiommu only: >>> - if (dev->kdrv == RTE_KDRV_VFIO) >>> + if (dev->kdrv == RTE_KDRV_VFIO_NOIOMMU) >>> >>> [...] 
> Furthermore restricting virtio to no-iommu mode doesn't bring > any improvement. We're not __restricting__, as soon as virtio+iommu gets working state, we'll simply replace __noiommu with default. Then its upto user to try out virtio with vfio default or vfio_noiommu. >>> >>> Yes it's up to user. >>> So your code should be >>> if (dev->kdrv == RTE_KDRV_VFIO) >>> >> >> Right, >> > That's why I suggest to keep the initial semantic of kdrv and > not pollute it with VFIO modes. I am okay to live with default and forget suffix __noiommu but there are implementation problem which was discussed in other thread - Virtio pmd driver should avoid interface parsing i.e. virtio_resource_init_uio/vfio() etc.. For vfio case - We could easily get rid of by moving /sys parsing to pci_eal layer, Right? If so then virtio currently works with vfio-noiommu, it make sense to me that pci_eal layer does parsing for pmd driver before that pmd driver get initialized. >>> >>> Please reword. What is the problem? >>> - Another case could be: iommu-less-pmd-driver. eal layer to do parsing before updating drv->kdrv. >>> >>> [...] >> >> > If a check is needed, I would prefer using your function >> >> > pci_vfio_is_noiommu() and remove driver modes from struct >> >> > rte_kernel_driver. >> >> >> >> I don't think calling pci_vfio_no_iommu() inside >> >> virtio_reg_rd/wr_1/2/3() would be a good idea. >> > >> > Why? The value may be cached in the priv properties. >> > >> pci_vfio_is_noiommu() parses /sys for >> - enable_noiommu param >> - attached driver name is vfio-noiommu or not. >> >> It does file operation for that, I meant to say that calling this api >> within register_rd/wr function is not correct. It would be better if >> those low level register_rd/wr api only checks driver_types. > > Yes, that's why I said the return of pci_vfio_is_noiommu() may be cached > to keep efficiency. 
I am not convinced though, Still find pmd driver checking driver_types using drv->kdrv is better approach than introducing a new global variable which may look something like; >>> >>> Not a global variable. A function in EAL layer. A variable in PMD priv. >>> >> >> If we agreed to use condition (drv->kdrv == RTE_KDRV_VFIO); >> then resource parsing for vfio {including vfio and vfio_noiommu both >> case} is enforced in virtio pmd driver layer and that is contradicting >> to what we agreed earlier in this[1] thread. Also we don't need a >> function in EAL layer or a variable in PMD priv. Perhaps a private >> function in virtio pmd which does parsing for vfio interface. >> >> Thoughts? >> >> [1] http://dpdk.org/dev/patchwork/patch/9862/ >> >
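The caching idea raised in this exchange (call the /sys-parsing helper once, keep the answer in the PMD private data, and test only that flag in the register accessors) might look roughly like the sketch below. It is a standalone illustration, not DPDK code: `struct ex_pmd_priv`, `ex_probe_noiommu()`, `ex_pmd_init()` and `ex_use_noiommu_path()` are hypothetical names, and the stub merely counts calls instead of parsing /sys as pci_vfio_is_noiommu() is described to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the expensive EAL helper discussed in the
 * thread. It does no file I/O here; it only counts how often it runs. */
static int ex_probe_calls;

static bool ex_probe_noiommu(void)
{
	ex_probe_calls++;
	return true; /* pretend the device is bound in no-IOMMU mode */
}

/* Hypothetical PMD private data holding the cached answer. */
struct ex_pmd_priv {
	bool vfio_noiommu;
};

/* Slow path, run once at device init: the only place the probe is called. */
static void ex_pmd_init(struct ex_pmd_priv *priv)
{
	assert(priv != NULL);
	priv->vfio_noiommu = ex_probe_noiommu();
}

/* Fast path (register read/write helpers): tests the cached flag only. */
static inline bool ex_use_noiommu_path(const struct ex_pmd_priv *priv)
{
	return priv->vfio_noiommu;
}
```

With this shape the register accessors never touch the filesystem; whether the flag belongs in the PMD or the mode belongs in drv->kdrv is exactly the open question of the thread.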
[dpdk-dev] [PATCH v6 08/11] eal: pci: introduce RTE_KDRV_VFIO_NOIOMMUi driver mode
2016-01-27 21:02, Santosh Shukla: > 1. virtio currently works for vfio+noiommu and likely will work for > vfio+iommu in near future. > 2. So remove __noiommu suffix and always use default. > 3. Introduce vfio resource parsing global function, That function > suppose to do parsing for default vfio case and for vfio-noiommu case. > This function will be used by pmd drivers for resource parsing purpose > example virtio. > > Yuan won't be happy with 3) I guess, because he wanted to get rid of > interface parsing from pmd driver. > > Thomas, if 1/2/3/ addresses your concern then I'll spin the series, I agree with 1/ and 2/. Please, could you explain why 3/ is needed?
[dpdk-dev] [PATCH v4] vfio: Support for no-IOMMU mode
2016-01-27 14:32, Anatoly Burakov: > +/* DMA mapping function prototype. > + * Takes VFIO container fd as a parameter. > + * Returns 0 on success, -1 on error. > + * */ > +typedef int (*vfio_dma_func_t)(int); > + > +struct vfio_iommu_type { > + int type_id; > + const char *name; > + vfio_dma_func_t dma_map_func; > +}; > + > +int vfio_iommu_type1_dma_map(int); > +int vfio_iommu_noiommu_dma_map(int); Is it possible (is it better) to declare these functions with vfio_dma_func_t? vfio_iommu_noiommu_dma_map is a weird name. Why not vfio_noiommu_dma_map or vfio_iommu_none_dma_map?
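For context, a table of `struct vfio_iommu_type` entries like the one quoted above is typically used for lookup and dispatch. The following is a minimal standalone sketch under stated assumptions: the typedef and struct are copied from the quoted patch, while the `ex_` names, the placeholder type ids and the stub mapping functions are hypothetical and are not taken from the patch or from the VFIO headers.

```c
#include <assert.h>
#include <stddef.h>

/* Copied from the quoted patch. */
typedef int (*vfio_dma_func_t)(int);

struct vfio_iommu_type {
	int type_id;
	const char *name;
	vfio_dma_func_t dma_map_func;
};

/* Placeholder ids for this sketch only; real ids come from the VFIO headers. */
enum { EX_TYPE1 = 1, EX_NOIOMMU = 2 };

/* Stub mapping functions: take the container fd, return 0 or -1. */
static int ex_type1_dma_map(int container_fd)
{
	return container_fd >= 0 ? 0 : -1;
}

static int ex_noiommu_dma_map(int container_fd)
{
	(void)container_fd;
	return 0; /* stub: nothing to do in this sketch */
}

static const struct vfio_iommu_type ex_iommu_types[] = {
	{ EX_TYPE1, "Type 1", ex_type1_dma_map },
	{ EX_NOIOMMU, "No-IOMMU", ex_noiommu_dma_map },
};

/* Find the table entry for a given type id, or NULL if unknown. */
static const struct vfio_iommu_type *ex_find_iommu_type(int type_id)
{
	size_t i;

	for (i = 0; i < sizeof(ex_iommu_types) / sizeof(ex_iommu_types[0]); i++)
		if (ex_iommu_types[i].type_id == type_id)
			return &ex_iommu_types[i];
	return NULL;
}

/* Dispatch the DMA mapping through the table. */
static int ex_dma_map(int type_id, int container_fd)
{
	const struct vfio_iommu_type *t = ex_find_iommu_type(type_id);

	if (t == NULL)
		return -1;
	assert(t->dma_map_func != NULL);
	return t->dma_map_func(container_fd);
}
```

Callers then need no switch on the IOMMU type; supporting another type means adding one table row.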
[dpdk-dev] [PATCH v6 08/11] eal: pci: introduce RTE_KDRV_VFIO_NOIOMMUi driver mode
On Wed, Jan 27, 2016 at 9:09 PM, Thomas Monjalon wrote: > 2016-01-27 21:02, Santosh Shukla: >> 1. virtio currently works for vfio+noiommu and likely will work for >> vfio+iommu in near future. >> 2. So remove __noiommu suffix and always use default. >> 3. Introduce vfio resource parsing global function, That function >> suppose to do parsing for default vfio case and for vfio-noiommu case. >> This function will be used by pmd drivers for resource parsing purpose >> example virtio. >> >> Yuan won't be happy with 3) I guess, because he wanted to get rid of >> interface parsing from pmd driver. >> >> Thomas, if 1/2/3/ addresses your concern then I'll spin the series, > > I agree with 1/ and 2/. > Please, could you explain why 3/ is needed? Because someone should do resource parsing / validation before driver does resource mapping/initialization. That someone could be either EAL layer or driver itself. In my case; - driver is virtio - resource is vfio interface
[dpdk-dev] [PATCH V1 1/1] jobstats: added function abort for job
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Panu Matilainen > Sent: Wednesday, January 27, 2016 2:38 PM > To: Kerlin, MarcinX ; dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH V1 1/1] jobstats: added function abort for job > > On 01/26/2016 06:15 PM, Marcin Kerlin wrote: > > This patch adds new function rte_jobstats_abort. It marks *job* as finished > > and time of this work will be add to management time instead of execution > time. > > This function should be used instead of rte_jobstats_finish if condition > occure, > > condition is defined by the application for example when receiving n>0 > packets. > > > > Signed-off-by: Marcin Kerlin > > --- > > lib/librte_jobstats/rte_jobstats.c | 22 ++ > > lib/librte_jobstats/rte_jobstats.h | 17 + > > lib/librte_jobstats/rte_jobstats_version.map | 7 +++ > > 3 files changed, 46 insertions(+) > > > [...] > > diff --git a/lib/librte_jobstats/rte_jobstats.h > b/lib/librte_jobstats/rte_jobstats.h > > index de6a89a..9995319 100644 > > --- a/lib/librte_jobstats/rte_jobstats.h > > +++ b/lib/librte_jobstats/rte_jobstats.h > > @@ -90,6 +90,9 @@ struct rte_jobstats { > > uint64_t exec_cnt; > > /**< Execute count. */ > > > > + uint64_t last_job_time; > > + /**< Last job time */ > > + > > char name[RTE_JOBSTATS_NAMESIZE]; > > /**< Name of this job */ > > > > AFAICS this is an ABI break and as such, needs to be preannounced, see > http://dpdk.org/doc/guides/contributing/versioning.html > For 2.3 it'd need to be a CONFIG_RTE_NEXT_ABI feature. > > - Panu - Hi Panu, Thanks for Your notice. This last_job_time field is actually not necessary here and will be removed from this structure. Best regards Michal
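To illustrate why inserting a field in the middle of a public struct is an ABI break, here is a small standalone sketch. The two structs are simplified, hypothetical stand-ins for the layout before and after the quoted hunk (only `exec_cnt`, `last_job_time` and `name` are kept, and `EX_NAMESIZE` is an assumed value), so the concrete offsets are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_NAMESIZE 32 /* assumed size, for illustration only */

/* Simplified layout before the patch. */
struct ex_job_v1 {
	uint64_t exec_cnt;
	char name[EX_NAMESIZE];
};

/* Simplified layout after the patch: a field inserted before name. */
struct ex_job_v2 {
	uint64_t exec_cnt;
	uint64_t last_job_time;
	char name[EX_NAMESIZE];
};

/* An application built against v1 reads name at the v1 offset; with a v2
 * library that offset now holds last_job_time instead. */
static_assert(offsetof(struct ex_job_v1, name) !=
	      offsetof(struct ex_job_v2, name),
	      "name moved, so old binaries see the wrong field");

/* Returns nonzero when either a member offset or the struct size changed. */
static int ex_layout_changed(void)
{
	return offsetof(struct ex_job_v1, name) !=
	       offsetof(struct ex_job_v2, name) ||
	       sizeof(struct ex_job_v1) != sizeof(struct ex_job_v2);
}
```

Both the member offset and the struct size change, which is why such a change has to be announced or guarded as described in the versioning guide linked above.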
[dpdk-dev] [RFC PATCH 5/5] virtio: Extend virtio-net PMD to support container environment
On 1/21/2016 7:09 PM, Tetsuya Mukawa wrote: [snip] > + > +static int > +qtest_raw_recv(int fd, char *buf, size_t count) > +{ > + size_t len = count; > + size_t total_len = 0; > + int ret = 0; > + > + while (len > 0) { > + ret = read(fd, buf, len); > + if (ret == (int)len) > + break; > + if (*(buf + ret - 1) == '\n') > + break; The above two lines should be put after the below if block. > + if (ret == -1) { > + if (errno == EINTR) > + continue; > + return ret; > + } > + total_len += ret; > + buf += ret; > + len -= ret; > + } > + return total_len + ret; > +} > + [snip] > + > +static void > +qtest_handle_one_message(struct qtest_session *s, char *buf) > +{ > + int ret; > + > + if (strncmp(buf, interrupt_message, strlen(interrupt_message)) == 0) { > + if (rte_atomic16_read(&s->enable_intr) == 0) > + return; > + > + /* relay interrupt to pipe */ > + ret = write(s->irqfds.writefd, "1", 1); > + if (ret < 0) > + rte_panic("cannot relay interrupt\n"); > + } else { > + /* relay normal message to pipe */ > + ret = qtest_raw_send(s->msgfds.writefd, buf, strlen(buf)); > + if (ret < 0) > + rte_panic("cannot relay normal message\n"); > + } > +} > + > +static char * > +qtest_get_next_message(char *p) > +{ > + p = strchr(p, '\n'); > + if ((p == NULL) || (*(p + 1) == '\0')) > + return NULL; > + return p + 1; > +} > + > +static void > +qtest_close_one_socket(int *fd) > +{ > + if (*fd > 0) { > + close(*fd); > + *fd = -1; > + } > +} > + > +static void > +qtest_close_sockets(struct qtest_session *s) > +{ > + qtest_close_one_socket(&s->qtest_socket); > + qtest_close_one_socket(&s->msgfds.readfd); > + qtest_close_one_socket(&s->msgfds.writefd); > + qtest_close_one_socket(&s->irqfds.readfd); > + qtest_close_one_socket(&s->irqfds.writefd); > + qtest_close_one_socket(&s->ivshmem_socket); > +} > + > +/* > + * This thread relays QTest response using pipe. > + * The function is needed because we need to separate IRQ message from > others. 
> + */ > +static void * > +qtest_event_handler(void *data) { > + struct qtest_session *s = (struct qtest_session *)data; > + char buf[1024]; > + char *p; > + int ret; > + > + for (;;) { > + memset(buf, 0, sizeof(buf)); > + ret = qtest_raw_recv(s->qtest_socket, buf, sizeof(buf)); > + if (ret < 0) { > + qtest_close_sockets(s); > + return NULL; > + } > + > + /* may receive multiple messages at the same time */ From the qtest_raw_recv implementation, if at some point one message is split across two qtest_raw_recv calls, is that message discarded? We could save the last incomplete message in a buffer and combine it with the data received next time. > + p = buf; > + do { > + qtest_handle_one_message(s, p); > + } while ((p = qtest_get_next_message(p)) != NULL); > + } > + return NULL; > +} > +
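The review suggestion here (keep the trailing incomplete message and combine it with the next read) could be sketched as below. This is a standalone illustration and not the patch's code: `struct ex_linebuf`, `ex_linebuf_feed()` and `ex_sum_len()` are hypothetical names, and messages are assumed to be terminated by '\n', as qtest_get_next_message() implies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EX_BUF_SIZE 1024

/* Holds bytes of a message whose terminating '\n' has not arrived yet. */
struct ex_linebuf {
	char data[EX_BUF_SIZE];
	size_t len;
};

/* Called once per complete message; msg is not NUL-terminated and len
 * excludes the trailing newline. */
typedef void (*ex_msg_handler_t)(const char *msg, size_t len, void *arg);

/* Example handler: adds up the lengths of all messages seen. */
static void ex_sum_len(const char *msg, size_t len, void *arg)
{
	(void)msg;
	*(size_t *)arg += len;
}

/* Append one received chunk, hand every complete message to the handler,
 * and keep the incomplete tail for the next call. Returns the number of
 * complete messages, or -1 if the chunk does not fit in the buffer. */
static int ex_linebuf_feed(struct ex_linebuf *lb, const char *chunk, size_t n,
			   ex_msg_handler_t handler, void *arg)
{
	size_t start = 0;
	size_t i;
	int count = 0;

	assert(lb != NULL && handler != NULL);
	if (n > sizeof(lb->data) - lb->len)
		return -1;
	memcpy(lb->data + lb->len, chunk, n);
	lb->len += n;

	for (i = 0; i < lb->len; i++) {
		if (lb->data[i] == '\n') {
			handler(lb->data + start, i - start, arg);
			start = i + 1;
			count++;
		}
	}

	/* Move the incomplete tail (possibly empty) to the front. */
	memmove(lb->data, lb->data + start, lb->len - start);
	lb->len -= start;
	return count;
}
```

In the event handler, each chunk returned by a read would be passed to `ex_linebuf_feed()`, so a message split across two reads is delivered once, after its newline arrives.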
[dpdk-dev] bnx2x driver and 57800 versus 57810
On Wed, 2016-01-27 at 07:32 +, Harish Patil wrote: > >I have two practically identical systems, same hypervisor on each (CentOS 7.x). In one, I have a 57800 card which works fine with DPDK with SRIOV. In the other, I have a 57810 card which doesn't work with SRIOV. > > > >For the 57810 I have tracked this down to the status block in the VF failing to be updated. The Linux driver works fine but it appears to use a slightly different scheme -- writing some sort of fastpath status block generation per interrupt. > > > >Does anyone have any suggestions or a programming guide for this device? > > > What is not working with 57810? Is it link related or traffic? Please provide the details. > Attached is the SW programming guide for 577xx/578xx. I'm not sure if it has details pertaining to the specific issue that you have. The DPDK PMD driver seems to be able to transmit packets on the 57810. But since the status block isn't getting updated, you can't reclaim the sent buffers. I modified the driver to use the marker based receive detection (similar to the method used in the Linux driver) and I can see packets getting received (certainly broadcast is received -- possibly not unicast packets though, which seems to indicate that part of the RX path is possibly still broken). I have tried a couple of things. The status page in the DPDK PMD driver isn't getting page aligned (as well as a bunch of other structures that should probably be page aligned). The Linux driver happens to do this as a side effect of the DMA allocator. Fixing this didn't seem to improve matters though. The status block doesn't seem to get updated. I verified that the correct DMA address is getting passed to the PF. And since it works on the 57800, I thought perhaps something changed.
Also, the DPDK driver probably gets the RX/TX queue indices wrong during initial setup. The final values coming out of the allocation loop are probably bigger than they should be. Should they point to the end of the queue or just past the end? Also, the tail of the queue needs to be corrected for the double entry at the end of the pages. Again, fixing this didn't seem to help either. The VF-PF interaction seems to be ok as well. Other than not supporting SGE, the DPDK PMD driver seems to send reasonably correct messages to the PF. I don't see the DPDK PMD driver doing anything to 'reset' the PCI aspect of the VF. If there is any leftover configuration for interrupts, like leaving the IGU enabled, that may not be cleared, I am not sure what the interaction might be. I do know the Linux driver does seem to use MSI-X interrupts. > Thanks, > Harish Thanks for looking at this and thanks for the programming guide. It will take me a bit to digest it.
[dpdk-dev] [PATCH v4] vfio: Support for no-IOMMU mode
Hi Thomas, > > +/* DMA mapping function prototype. > > + * Takes VFIO container fd as a parameter. > > + * Returns 0 on success, -1 on error. > > + * */ > > +typedef int (*vfio_dma_func_t)(int); > > + > > +struct vfio_iommu_type { > > + int type_id; > > + const char *name; > > + vfio_dma_func_t dma_map_func; > > +}; > > + > > +int vfio_iommu_type1_dma_map(int); > > +int vfio_iommu_noiommu_dma_map(int); > > Is it possible (is it better) to declare these functions with vfio_dma_func_t? Yeah, sure. Or maybe the other way around - maybe we could do away with the typedef. I'll go for the former though. > vfio_iommu_noiommu_dma_map is a weird name. > Why not vfio_noiommu_dma_map or vfio_iommu_none_dma_map? Well, the NOIOMMU type is named VFIO_IOMMU_NOIOMMU in the VFIO headers. So it's consistent with the IOMMU type name. Although vfio_noiommu_dma_map seems reasonable. Thanks, Anatoly
[dpdk-dev] DPDK OVS on Ubuntu 14.04# Issue's Resolved# Inter-VM communication & IP allocation through DHCP issue
Hi Przemek, Thanks for the quick response. We are now able to get DHCP IPs for 2 vhostuser instances, and they are able to ping each other. The issue was a bug in the cirros 0.3.0 images which we were using in OpenStack; after switching to the 0.3.1 image as given in the URL (https://www.redhat.com/archives/rhos-list/2013-August/msg00032.html), we are able to get the IPs in the vhostuser VM instances. As per our understanding, the packet flow across the DPDK datapath is as follows: the vhostuser ports are connected to the br-int bridge, which is patched to the br-dpdk bridge, where our physical network (NIC) is connected through the dpdk0 port. So to test the flow we have to connect that physical network (NIC) to an external packet generator (e.g. ixia, iperf) and run the testpmd application in the vhostuser VM, right? Is it required to add any flows or other configuration to the bridges (either br-int or br-dpdk)? Thanks & Regards Abhijeet Karve From: "Czesnowicz, Przemyslaw" To: Abhijeet Karve Cc: "dev at dpdk.org" , "discuss at openvswitch.org" , "Gray, Mark D" Date: 01/27/2016 05:11 PM Subject:RE: [dpdk-dev] DPDK OVS on Ubuntu 14.04# Issue's Resolved# Inter-VM communication & IP allocation through DHCP issue Hi Abhijeet, It seems you are almost there! When booting the VMs do you request hugepage memory for them (by setting hw:mem_page_size=large in flavor extra_spec)? If not then please do, if yes then please look into libvirt logfiles for the VMs (in /var/log/libvirt/qemu/instance-xxx), I think there could be a clue. Regards Przemek From: Abhijeet Karve [mailto:abhijeet.ka...@tcs.com] Sent: Monday, January 25, 2016 6:13 PM To: Czesnowicz, Przemyslaw Cc: dev at dpdk.org; discuss at openvswitch.org; Gray, Mark D Subject: RE: [dpdk-dev] DPDK OVS on Ubuntu 14.04# Issue's Resolved# Inter-VM communication & IP allocation through DHCP issue Hi Przemek, Thank you for your response, It really provided us breakthrough. 
After setting up DPDK on compute node for stable/kilo, We are trying to set up Openstack stable/liberty all-in-one setup, At present we are not able to get the IP allocation for the vhost type instances through DHCP. Also we tried assigning IP's manually to them but the inter-VM communication also not happening, #neutron agent-list root at nfv-dpdk-devstack:/etc/neutron# neutron agent-list +--++---+---++---+ | id | agent_type | host | alive | admin_state_up | binary| +--++---+---++---+ | 3b29e93c-3a25-4f7d-bf6c-6bb309db5ec0 | DPDK OVS Agent | nfv-dpdk-devstack | :-) | True | neutron-openvswitch-agent | | 62593b2c-c10f-4d93-8551-c46ce24895a6 | L3 agent | nfv-dpdk-devstack | :-) | True | neutron-l3-agent | | 7cb97af9-cc20-41f8-90fb-aba97d39dfbd | DHCP agent | nfv-dpdk-devstack | :-) | True | neutron-dhcp-agent| | b613c654-99b7-437e-9317-20fa651a1310 | Linux bridge agent | nfv-dpdk-devstack | :-) | True | neutron-linuxbridge-agent | | c2dd0384-6517-4b44-9c25-0d2825d23f57 | Metadata agent | nfv-dpdk-devstack | :-) | True | neutron-metadata-agent| | f23dde40-7dc0-4f20-8b3e-eb90ddb15e49 | Open vSwitch agent | nfv-dpdk-devstack | xxx | True | neutron-openvswitch-agent | +--++---+---++---+ ovs-vsctl show output# Bridge br-dpdk Port br-dpdk Interface br-dpdk type: internal Port phy-br-dpdk Interface phy-br-dpdk type: patch options: {peer=int-br-dpdk} Bridge br-int fail_mode: secure Port "vhufa41e799-f2" tag: 5 Interface "vhufa41e799-f2" type: dpdkvhostuser Port int-br-dpdk Interface int-br-dpdk type: patch options: {peer=phy-br-dpdk} Port "tap4e19f8e1-59" tag: 5 Interface "tap4e19f8e1-59" type: internal Port "vhu05734c49-3b" tag: 5 Interface "vhu05734c49-3b" type: dpdkvhostuser Port "vhu10c06b4d-84" tag: 5 Interface "vhu10c06b4d-84" type: dpdkvhostuser Port patch-tun Interface patch-tun type: patch options: {peer=patch-int} Port "vhue169c581-ef" tag: 5 Interface "vhue169c581-ef" type: dpdkvhostuser P
[dpdk-dev] [PATCH 0/3] Use common Linux tools to control DPDK ports
This work is to make DPDK ports more visible and to enable using common Linux tools to configure DPDK ports. The patch is based on KNI but contains only its control functionality; it does not include any Linux kernel network driver. With the help of a kernel module (KCP), virtual Linux network interfaces named "dpdk$" are created per DPDK port; control messages sent to these virtual interfaces are forwarded to DPDK, and the response is sent back to the Linux application. The virtual interfaces are created when the DPDK application starts and destroyed automatically when it terminates. Communication between kernel space and DPDK is done using a netlink socket. The implementation is currently not complete: sample support was added for the RFC, and more functionality can be added based on community response. With this RFC patch, the following is supported: getting/setting the MAC address and MTU of DPDK devices, getting stats from DPDK devices, and a subset of ethtool commands. In the long term this patch intends to replace KNI, and KNI will be deprecated. 
Samples: $ ifconfig dpdk0: flags=4099 mtu 1500 ether 90:e2:ba:0e:49:b8 txqueuelen 1000 (Ethernet) RX packets 33 bytes 2058 (2.0 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 33 bytes 2058 (2.0 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 dpdk1: flags=4099 mtu 1500 ether 00:1b:21:76:fa:21 txqueuelen 1000 (Ethernet) RX packets 0 bytes 0 (0.0 B) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 0 bytes 0 (0.0 B) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 After some traffic on port 0: $ ifconfig dpdk0: flags=4099 mtu 1500 ether 90:e2:ba:0e:49:77 txqueuelen 1000 (Ethernet) RX packets 962 bytes 57798 (56.4 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 962 bytes 57798 (56.4 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 $ ethtool -i dpdk0 driver: rte_ixgbe_pmd version: RTE 2.3.0-rc0 firmware-version: expansion-rom-version: bus-info: :08:00.0 supports-statistics: yes supports-test: no supports-eeprom-access: yes supports-register-dump: yes supports-priv-flags: no $ ip l show dpdk0 25: dpdk0: mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 90:e2:ba:0e:49:b8 brd ff:ff:ff:ff:ff:ff $ ip l set dpdk0 addr 90:e2:ba:0e:49:77 $ ip l show dpdk0 25: dpdk0: mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 90:e2:ba:0e:49:77 brd ff:ff:ff:ff:ff:ff Ferruh Yigit (3): kcp: add kernel control path kernel module rte_ctrl_if: add control interface library examples/ethtool: add control interface support to the application config/common_linuxapp | 9 +- doc/api/doxy-api-index.md | 3 +- doc/api/doxy-api.conf | 1 + doc/guides/rel_notes/release_2_3.rst | 9 + doc/guides/sample_app_ug/ethtool.rst | 41 +++ examples/ethtool/ethtool-app/main.c| 10 +- lib/Makefile | 3 +- lib/librte_ctrl_if/Makefile| 58 lib/librte_ctrl_if/rte_ctrl_if.c | 162 ++ lib/librte_ctrl_if/rte_ctrl_if.h | 115 +++ lib/librte_ctrl_if/rte_ctrl_if_version.map | 9 + lib/librte_ctrl_if/rte_ethtool.c | 354 + 
lib/librte_ctrl_if/rte_ethtool.h | 54 lib/librte_ctrl_if/rte_nl.c| 259 +++ lib/librte_ctrl_if/rte_nl.h| 60 lib/librte_eal/common/include/rte_log.h| 3 +- lib/librte_eal/linuxapp/Makefile | 5 +- lib/librte_eal/linuxapp/eal/Makefile | 3 +- .../linuxapp/eal/include/exec-env/rte_kcp_common.h | 86 + lib/librte_eal/linuxapp/kcp/Makefile | 58 lib/librte_eal/linuxapp/kcp/kcp_dev.h | 65 lib/librte_eal/linuxapp/kcp/kcp_ethtool.c | 261 +++ lib/librte_eal/linuxapp/kcp/kcp_misc.c | 282 lib/librte_eal/linuxapp/kcp/kcp_net.c | 209 lib/librte_eal/linuxapp/kcp/kcp_nl.c | 194 +++ mk/rte.app.mk | 3 +- 26 files changed, 2307 insertions(+), 9 deletions(-) create mode 100644 lib/librte_ctrl_if/Makefile create mode 100644 lib/librte_ctrl_if/rte_ctrl_if.c create mode 100644 lib/librte_ctrl_if/rte_ctrl_if.h create mode 100644 lib/librte_ctrl_if/rte_ctrl_if_version.map create mode 100644 lib/librte_ctrl_if/rte_ethtool.c create mode 100644 lib/librte_ctrl_if/rte_ethtool.h create mode 100644 lib/librte_ctrl_if/rte_nl.c create mode 100644 lib/librte_ctrl_if/rte_nl.h create mode 100644 lib/librte_eal/linuxapp/eal
[dpdk-dev] [PATCH 1/3] kcp: add kernel control path kernel module
This kernel module is based on the KNI module, but it is a stripped-down version that handles only control messages; no data transfer functionality is provided. This Linux kernel module helps a userspace application create virtual interfaces; when a control command is issued on such a virtual interface, the module pushes the command to userspace and returns the response to the calling application. Linux tools like ethtool/ifconfig/ip can be used on the virtual interfaces, but not data-related ones like tcpdump. In the long term this patch intends to replace KNI, and KNI will be deprecated. Signed-off-by: Ferruh Yigit --- config/common_linuxapp | 6 + lib/librte_eal/linuxapp/Makefile | 5 +- lib/librte_eal/linuxapp/eal/Makefile | 3 +- .../linuxapp/eal/include/exec-env/rte_kcp_common.h | 86 +++ lib/librte_eal/linuxapp/kcp/Makefile | 58 + lib/librte_eal/linuxapp/kcp/kcp_dev.h | 65 + lib/librte_eal/linuxapp/kcp/kcp_ethtool.c | 261 +++ lib/librte_eal/linuxapp/kcp/kcp_misc.c | 282 + lib/librte_eal/linuxapp/kcp/kcp_net.c | 209 +++ lib/librte_eal/linuxapp/kcp/kcp_nl.c | 194 ++ 10 files changed, 1167 insertions(+), 2 deletions(-) create mode 100644 lib/librte_eal/linuxapp/eal/include/exec-env/rte_kcp_common.h create mode 100644 lib/librte_eal/linuxapp/kcp/Makefile create mode 100644 lib/librte_eal/linuxapp/kcp/kcp_dev.h create mode 100644 lib/librte_eal/linuxapp/kcp/kcp_ethtool.c create mode 100644 lib/librte_eal/linuxapp/kcp/kcp_misc.c create mode 100644 lib/librte_eal/linuxapp/kcp/kcp_net.c create mode 100644 lib/librte_eal/linuxapp/kcp/kcp_nl.c diff --git a/config/common_linuxapp b/config/common_linuxapp index 74bc515..5d5e3e4 100644 --- a/config/common_linuxapp +++ b/config/common_linuxapp @@ -503,6 +503,12 @@ CONFIG_RTE_KNI_VHOST_DEBUG_RX=n CONFIG_RTE_KNI_VHOST_DEBUG_TX=n # +# Compile librte_ctrl_if +# +CONFIG_RTE_KCP_KMOD=y +CONFIG_RTE_KCP_KO_DEBUG=n + +# # Compile vhost library # fuse-devel is needed to run vhost-cuse. 
# fuse-devel enables user space char driver development diff --git a/lib/librte_eal/linuxapp/Makefile b/lib/librte_eal/linuxapp/Makefile index d9c5233..d1fa3a3 100644 --- a/lib/librte_eal/linuxapp/Makefile +++ b/lib/librte_eal/linuxapp/Makefile @@ -1,6 +1,6 @@ # BSD LICENSE # -# Copyright(c) 2010-2014 Intel Corporation. All rights reserved. +# Copyright(c) 2010-2016 Intel Corporation. All rights reserved. # All rights reserved. # # Redistribution and use in source and binary forms, with or without @@ -38,6 +38,9 @@ DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal ifeq ($(CONFIG_RTE_KNI_KMOD),y) DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += kni endif +ifeq ($(CONFIG_RTE_KCP_KMOD),y) +DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += kcp +endif ifeq ($(CONFIG_RTE_LIBRTE_XEN_DOM0),y) DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += xen_dom0 endif diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile index 26eced5..dded8cb 100644 --- a/lib/librte_eal/linuxapp/eal/Makefile +++ b/lib/librte_eal/linuxapp/eal/Makefile @@ -1,6 +1,6 @@ # BSD LICENSE # -# Copyright(c) 2010-2015 Intel Corporation. All rights reserved. +# Copyright(c) 2010-2016 Intel Corporation. All rights reserved. # All rights reserved. # # Redistribution and use in source and binary forms, with or without @@ -116,6 +116,7 @@ CFLAGS_eal_thread.o += -Wno-return-type endif INC := rte_interrupts.h rte_kni_common.h rte_dom0_common.h +INC += rte_kcp_common.h SYMLINK-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP)-include/exec-env := \ $(addprefix include/exec-env/,$(INC)) diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kcp_common.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kcp_common.h new file mode 100644 index 000..b3a6ee3 --- /dev/null +++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kcp_common.h @@ -0,0 +1,86 @@ +/*- + * This file is provided under a dual BSD/LGPLv2 license. When using or + * redistributing this file, you may do so under either license. 
+ * + * GNU LESSER GENERAL PUBLIC LICENSE + * + * Copyright(c) 2016 Intel Corporation. All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of version 2.1 of the GNU Lesser General Public License + * as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public License + * along with this program; + * + * Contact Information: + * Intel Corporation + * + * + *
[dpdk-dev] [PATCH 2/3] rte_ctrl_if: add control interface library
This library gets control messages from kernelspace, forwards them to librte_ether, and returns the response back to kernelspace. The library: 1) Triggers Linux virtual interface creation 2) Initializes the netlink socket communication 3) Provides a process() API to the application that processes the received messages. This library requires the corresponding kernel module to be inserted. Signed-off-by: Ferruh Yigit --- config/common_linuxapp | 3 +- doc/api/doxy-api-index.md | 3 +- doc/api/doxy-api.conf | 1 + doc/guides/rel_notes/release_2_3.rst | 9 + lib/Makefile | 3 +- lib/librte_ctrl_if/Makefile| 58 + lib/librte_ctrl_if/rte_ctrl_if.c | 162 + lib/librte_ctrl_if/rte_ctrl_if.h | 115 ++ lib/librte_ctrl_if/rte_ctrl_if_version.map | 9 + lib/librte_ctrl_if/rte_ethtool.c | 354 + lib/librte_ctrl_if/rte_ethtool.h | 54 + lib/librte_ctrl_if/rte_nl.c| 259 + lib/librte_ctrl_if/rte_nl.h| 60 + lib/librte_eal/common/include/rte_log.h| 3 +- mk/rte.app.mk | 3 +- 15 files changed, 1091 insertions(+), 5 deletions(-) create mode 100644 lib/librte_ctrl_if/Makefile create mode 100644 lib/librte_ctrl_if/rte_ctrl_if.c create mode 100644 lib/librte_ctrl_if/rte_ctrl_if.h create mode 100644 lib/librte_ctrl_if/rte_ctrl_if_version.map create mode 100644 lib/librte_ctrl_if/rte_ethtool.c create mode 100644 lib/librte_ctrl_if/rte_ethtool.h create mode 100644 lib/librte_ctrl_if/rte_nl.c create mode 100644 lib/librte_ctrl_if/rte_nl.h diff --git a/config/common_linuxapp b/config/common_linuxapp index 5d5e3e4..f72ba0e 100644 --- a/config/common_linuxapp +++ b/config/common_linuxapp @@ -1,6 +1,6 @@ # BSD LICENSE # -# Copyright(c) 2010-2015 Intel Corporation. All rights reserved. +# Copyright(c) 2010-2016 Intel Corporation. All rights reserved. # All rights reserved. 
# # Redistribution and use in source and binary forms, with or without @@ -507,6 +507,7 @@ CONFIG_RTE_KNI_VHOST_DEBUG_TX=n # CONFIG_RTE_KCP_KMOD=y CONFIG_RTE_KCP_KO_DEBUG=n +CONFIG_RTE_LIBRTE_CTRL_IF=y # # Compile vhost library diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md index 7a91001..214d16e 100644 --- a/doc/api/doxy-api-index.md +++ b/doc/api/doxy-api-index.md @@ -149,4 +149,5 @@ There are many libraries, so their headers may be grouped by topics: [common] (@ref rte_common.h), [ABI compat] (@ref rte_compat.h), [keepalive] (@ref rte_keepalive.h), - [version](@ref rte_version.h) + [version](@ref rte_version.h), + [control interface] (@ref rte_ctrl_if.h) diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf index 57e8b5d..fd69bf1 100644 --- a/doc/api/doxy-api.conf +++ b/doc/api/doxy-api.conf @@ -39,6 +39,7 @@ INPUT = doc/api/doxy-api-index.md \ lib/librte_cmdline \ lib/librte_compat \ lib/librte_cryptodev \ + lib/librte_ctrl_if \ lib/librte_distributor \ lib/librte_ether \ lib/librte_hash \ diff --git a/doc/guides/rel_notes/release_2_3.rst b/doc/guides/rel_notes/release_2_3.rst index 99de186..39725e4 100644 --- a/doc/guides/rel_notes/release_2_3.rst +++ b/doc/guides/rel_notes/release_2_3.rst @@ -4,6 +4,14 @@ DPDK Release 2.3 New Features +* **Control interface support added.** + + To enable controlling DPDK ports by common Linux tools. + Following modules added to DPDK: + + * librte_ctrl_if library + * librte_eal/linuxapp/kcp kernel module + Resolved Issues --- @@ -51,6 +59,7 @@ The libraries prepended with a plus sign were incremented in this version. librte_acl.so.2 librte_cfgfile.so.2 librte_cmdline.so.1 + + librte_ctrl_if.so.1 librte_distributor.so.1 librte_eal.so.2 librte_hash.so.2 diff --git a/lib/Makefile b/lib/Makefile index ef172ea..a50bc1e 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -1,6 +1,6 @@ # BSD LICENSE # -# Copyright(c) 2010-2015 Intel Corporation. All rights reserved. +# Copyright(c) 2010-2016 Intel Corporation. 
All rights reserved. # All rights reserved. # # Redistribution and use in source and binary forms, with or without @@ -58,6 +58,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_PORT) += librte_port DIRS-$(CONFIG_RTE_LIBRTE_TABLE) += librte_table DIRS-$(CONFIG_RTE_LIBRTE_PIPELINE) += librte_pipeline DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder +DIRS-$(CONFIG_RTE_LIBRTE_CTRL_IF) += librte_ctrl_if ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y) DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni diff --git a/lib/librte_ctrl_if/Makefile
[dpdk-dev] [PATCH 3/3] examples/ethtool: add control interface support to the application
Control interface APIs added into the sample application.

To have this support, the corresponding kernel module (KCP) needs to be
inserted. If the kernel module is not there, the application will run as
it is, without kernel control path support.

When the KCP module is inserted, the running application creates a
virtual Linux network interface (dpdk$) per DPDK port. This interface can
be used by traditional Linux tools.

Signed-off-by: Ferruh Yigit
---
 doc/guides/sample_app_ug/ethtool.rst | 41
 examples/ethtool/ethtool-app/main.c  | 10 +++--
 2 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/doc/guides/sample_app_ug/ethtool.rst b/doc/guides/sample_app_ug/ethtool.rst
index 4d1697e..2174288 100644
--- a/doc/guides/sample_app_ug/ethtool.rst
+++ b/doc/guides/sample_app_ug/ethtool.rst
@@ -131,6 +131,47 @@
 application`_. Individual call-back functions handle the detail
 associated with each command, which make use of the functions defined in the
 `Ethtool interface`_ to the DPDK functions.
+Control Interface
+~~~~~~~~~~~~~~~~~
+
+If Kernel Control Path (KCP) kernel module (rte_kcp.ko) inserted,
+virtual interfaces created for each DPDK port for control purposes.
+
+Created interfaces are named as dpdk#, like:
+
+.. code-block:: console
+
+    # ifconfig dpdk0; ifconfig dpdk1
+    dpdk0: flags=4099 mtu 1500
+    ether 90:e2:ba:0e:49:b9 txqueuelen 1000 (Ethernet)
+    RX packets 0 bytes 0 (0.0 B)
+    RX errors 0 dropped 0 overruns 0 frame 0
+    TX packets 0 bytes 0 (0.0 B)
+    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
+
+    dpdk1: flags=4099 mtu 1500
+    ether 00:1b:21:76:fa:21 txqueuelen 1000 (Ethernet)
+    RX packets 0 bytes 0 (0.0 B)
+    RX errors 0 dropped 0 overruns 0 frame 0
+    TX packets 0 bytes 0 (0.0 B)
+    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
+
+Regular Linux commands can be issued on interfaces:
+
+.. code-block:: console
+
+    # ethtool -i dpdk0
+    driver: rte_ixgbe_pmd
+    version: RTE 2.3.0-rc0
+    firmware-version:
+    expansion-rom-version:
+    bus-info: :08:00.1
+    supports-statistics: yes
+    supports-test: no
+    supports-eeprom-access: yes
+    supports-register-dump: yes
+    supports-priv-flags: no
+
 Ethtool interface
 -----------------
diff --git a/examples/ethtool/ethtool-app/main.c b/examples/ethtool/ethtool-app/main.c
index e21abcd..68b13ad 100644
--- a/examples/ethtool/ethtool-app/main.c
+++ b/examples/ethtool/ethtool-app/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -44,6 +44,7 @@
 #include
 #include
 #include
+#include
 
 #include "ethapp.h"
 
@@ -54,7 +55,6 @@
 #define PKTPOOL_EXTRA_SIZE 512
 #define PKTPOOL_CACHE 32
 
-
 struct txq_port {
 	uint16_t cnt_unsent;
 	struct rte_mbuf *buf_frames[MAX_BURST_LENGTH];
@@ -254,6 +254,8 @@ static int slave_main(__attribute__((unused)) void *ptr_data)
 				}
 				rte_spinlock_unlock(&ptr_port->lock);
 			} /* end for( idx_port ) */
+			rte_eth_control_interface_process_msg(
+					RTE_ETHTOOL_CTRL_IF_PROCESS_MSG, 0);
 		} /* end for(;;) */
 
 	return 0;
@@ -293,6 +295,8 @@ int main(int argc, char **argv)
 	id_core = rte_get_next_lcore(id_core, 1, 1);
 	rte_eal_remote_launch(slave_main, NULL, id_core);
 
+	rte_eth_control_interface_create();
+
 	ethapp_main();
 
 	app_cfg.exit_now = 1;
@@ -301,5 +305,7 @@ int main(int argc, char **argv)
 			return -1;
 	}
 
+	rte_eth_control_interface_destroy();
+
 	return 0;
 }
-- 
2.5.0
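The three hooks the patch adds (create once, process messages inside the forwarding loop, destroy on exit) can be sketched in isolation as below. This is only an illustration of the call pattern under stated assumptions: the stub_* functions, struct ctrl_if_stats and run_app are hypothetical stand-ins that count invocations, not the librte_ctrl_if API, and the sketch runs in a single thread whereas the sample application polls from a slave lcore.

```c
#include <assert.h>

/* Hypothetical counters, used only to observe the call pattern. */
struct ctrl_if_stats {
	int created;
	int polled;
	int destroyed;
};

static struct ctrl_if_stats stats;

/* Hypothetical stand-ins for the create / process_msg / destroy calls
 * that the patch adds to main.c. They do no real work. */
static int stub_ctrl_if_create(void) { stats.created++; return 0; }
static void stub_ctrl_if_process_msg(void) { stats.polled++; }
static void stub_ctrl_if_destroy(void) { stats.destroyed++; }

/* Placeholder for one pass over the ports; empty in this sketch. */
static void forward_packets(void) { }

/* Same shape as the patched sample: create the control interface once,
 * service control messages once per forwarding-loop iteration, and
 * destroy the interface before returning. */
static struct ctrl_if_stats run_app(int iterations)
{
	assert(iterations >= 0);
	stats = (struct ctrl_if_stats){0};

	if (stub_ctrl_if_create() != 0)
		return stats;

	for (int i = 0; i < iterations; i++) {
		forward_packets();
		stub_ctrl_if_process_msg();
	}

	stub_ctrl_if_destroy();
	return stats;
}
```

Polling from inside the existing loop means no extra thread is needed to service control requests; the cost is one additional call per loop iteration.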
[dpdk-dev] [PATCH v4] vfio: Support for no-IOMMU mode
Hi Thomas,

> > Is it possible (is it better) to declare these functions with
> > vfio_dma_func_t?
>
> Yeah, sure. Or maybe the other way around - maybe we could do away with
> the typedef. I'll go for the former though.

No, we can't declare the functions with a function pointer. At least I don't see any obvious way to do that without incurring multiple declarations compile error. So I'll leave it as forward declarations.

Of course, the other alternative is to put the array below the functions and make them static, to avoid forward declarations, but I think it's much clearer the way it is now.

Thanks,
Anatoly
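A minimal sketch of the layout discussed above, assuming a pointer-to-function typedef: the prototypes are forward-declared in full, the table of handlers follows, and the definitions come last. All names here (dma_func_t, map_type1, map_noiommu, iommu_types, run_dma_map) and the int parameter are hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Pointer-to-function typedef; the signature is an assumption. */
typedef int (*dma_func_t)(int fd);

/* With a pointer typedef, "static dma_func_t map_type1;" would declare a
 * pointer variable, and a later function definition with the same name
 * would conflict with it. So the prototypes are written out in full. */
static int map_type1(int fd);
static int map_noiommu(int fd);

struct iommu_type {
	const char *name;
	dma_func_t dma_map;
};

/* The table sits above the definitions, which is why the forward
 * declarations are needed. */
static const struct iommu_type iommu_types[] = {
	{ "Type 1", map_type1 },
	{ "No-IOMMU", map_noiommu },
};

/* Dummy handlers: the first rejects a negative fd, the second is a no-op. */
static int map_type1(int fd) { return fd >= 0 ? 1 : -1; }
static int map_noiommu(int fd) { (void)fd; return 0; }

/* Dispatch through the table by index. */
static int run_dma_map(size_t idx, int fd)
{
	assert(idx < sizeof(iommu_types) / sizeof(iommu_types[0]));
	return iommu_types[idx].dma_map(fd);
}
```

Moving the table below the function definitions would remove the need for the two prototypes, which is the alternative mentioned in the reply.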
[dpdk-dev] [PATCH 0/2] slow data path communication between DPDK port and Linux
This is a slow data path communication implementation based on the existing KNI.

The difference is: librte_kni is converted into a PMD, and the kdp kernel module is almost the same except that all control path functionality is removed and some simplification is done.

The motivation is to simplify slow path data communication. Now any application can use this new PMD to send/get data to/from the Linux kernel.

The PMD supports two communication methods:

1) KDP kernel module
PMD initialization functions handle creating the virtual interfaces (with the help of the kdp kernel module) and the FIFO. The FIFO is used to share data between userspace and kernelspace. This is the default method.

2) tun/tap module
When the KDP module is not inserted, the PMD creates a tap interface and transfers packets using the tap interface.

In the long term this patch intends to replace the KNI, and KNI will be deprecated.

Sample usage:

1) Transfer any packet received from a NIC that is bound to DPDK to the Linux kernel

a) insert kdp kernel module
   insmod build/kmod/rte_kdp.ko
b) bind NIC to the DPDK using dpdk_nic_bind.py
c) ./testpmd --vdev eth_kdp0
c1) testpmd shows two ports, one of them physical, the other virtual
   ...
   Configuring Port 0 (socket 0)
   Port 0: 00:00:00:00:00:00
   Configuring Port 1 (socket 0)
   ...
   Checking link statuses...
   Port 0 Link Up - speed 1 Mbps - full-duplex
   Port 1 Link Up - speed 1 Mbps - full-duplex
   Done
c2) This will create the "kdp0" Linux interface
   $ ip l show kdp0
   21: kdp0: mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
       link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
d) Linux port can be used for data
d1)
   $ ifconfig kdp0 1.0.0.2
   $ ping 1.0.0.1
   PING 1.0.0.1 (1.0.0.1) 56(84) bytes of data.
   64 bytes from 1.0.0.1: icmp_seq=1 ttl=64 time=0.789 ms
   64 bytes from 1.0.0.1: icmp_seq=2 ttl=64 time=0.881 ms
d2)
   $ tcpdump -nn -i kdp0
   tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
   listening on kdp0, link-type EN10MB (Ethernet), capture size 262144 bytes
   15:01:22.407506 IP 1.0.0.1 > 1.0.0.2: ICMP echo request, id 40016, seq 18, length 64
   15:01:22.408521 IP 1.0.0.2 > 1.0.0.1: ICMP echo reply, id 40016, seq 18, length 64

2) Data traveling between virtual Linux interfaces passes through the DPDK application, and the application can alter the data

a) insert kdp kernel module
   insmod build/kmod/rte_kdp.ko
b) No physical NIC involved
c) ./testpmd --vdev eth_kdp0 --vdev eth_kdp1
c1) testpmd shows two ports, both of them virtual
   ...
   Configuring Port 0 (socket 0)
   Port 0: 00:00:00:00:00:00
   Configuring Port 1 (socket 0)
   Port 1: 00:00:00:00:00:00
   Checking link statuses...
   Port 0 Link Up - speed 1 Mbps - full-duplex
   Port 1 Link Up - speed 1 Mbps - full-duplex
   Done
c2) This will create the "kdp0" and "kdp1" Linux interfaces
   $ ip l show kdp0; ip l show kdp1
   22: kdp0: mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
       link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
   23: kdp1: mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
       link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
d) Data traveling between the virtual ports passes through the DPDK application
   $ ifconfig kdp0 1.0.0.1
   $ ifconfig kdp1 1.0.0.2
d1)
   $ ping 1.0.0.1
   PING 1.0.0.1 (1.0.0.1) 56(84) bytes of data.
   64 bytes from 1.0.0.1: icmp_seq=1 ttl=64 time=3.57 ms
   64 bytes from 1.0.0.1: icmp_seq=2 ttl=64 time=1.85 ms
   64 bytes from 1.0.0.1: icmp_seq=3 ttl=64 time=1.89 ms
d2)
   $ tcpdump -nn -i kdp0
   tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
   listening on kdp0, link-type EN10MB (Ethernet), capture size 262144 bytes
   15:20:51.908543 IP 1.0.0.2 > 1.0.0.1: ICMP echo request, id 41234, seq 1, length 64
   15:20:51.909570 IP 1.0.0.1 > 1.0.0.2: ICMP echo reply, id 41234, seq 1, length 64
   15:20:52.909551 IP 1.0.0.2 > 1.0.0.1: ICMP echo request, id 41234, seq 2, length 64
   15:20:52.910577 IP 1.0.0.1 > 1.0.0.2: ICMP echo reply, id 41234, seq 2, length 64

3) tun/tap interface usage

a) No external module required, tun/tap support in kernel required
b) ./testpmd --vdev eth_kdp0 --vdev eth_kdp1
b1) This will create the "tap_kdp0" and "tap_kdp1" Linux interfaces
   $ ip l show tap_kdp0; ip l show tap_kdp1
   25: tap_kdp0: mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 500
       link/ether 56:47:97:9c:03:8e brd ff:ff:ff:ff:ff:ff
   26: tap_kdp1: mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 500
       link/ether 5e:15:22:b0:52:42 brd ff:ff:ff:ff:ff:ff

Ferruh Yigit (2):
  kdp: add kernel data path kernel module
  kdp: add virtual PMD for kernel slow data path communication

 config/common_linuxapp               |   9 +-
 doc/guides/nics/pcap_ring.rst        | 125 -
 doc/guides/rel_notes/release_2_3.rst |   6 +
 drivers/net/Makefile                 |   3 +-
 drivers/net/kdp/Makefile             |  61 +++
 drivers/net/kdp/rte_eth_kdp.c        | 481 +
 drivers/net/kdp/rte_kdp.c            | 365 +
 drivers/net/kd