[dpdk-dev] DPDK OVS on Ubuntu 14.04# Issue's Resolved# Inter-VM communication & IP allocation through DHCP issue

2016-01-27 Thread Abhijeet Karve
Hi Przemek,

Thank you for your response. It really gave us a breakthrough.

After setting up DPDK on the compute node for stable/kilo, we are trying to set
up an OpenStack stable/liberty all-in-one deployment. At present, the vhost-type
instances do not get an IP address through DHCP. We also tried assigning IPs to
them manually, but inter-VM communication still does not work.

neutron agent-list output:
root@nfv-dpdk-devstack:/etc/neutron# neutron agent-list
+--------------------------------------+--------------------+-------------------+-------+----------------+---------------------------+
| id                                   | agent_type         | host              | alive | admin_state_up | binary                    |
+--------------------------------------+--------------------+-------------------+-------+----------------+---------------------------+
| 3b29e93c-3a25-4f7d-bf6c-6bb309db5ec0 | DPDK OVS Agent     | nfv-dpdk-devstack | :-)   | True           | neutron-openvswitch-agent |
| 62593b2c-c10f-4d93-8551-c46ce24895a6 | L3 agent           | nfv-dpdk-devstack | :-)   | True           | neutron-l3-agent          |
| 7cb97af9-cc20-41f8-90fb-aba97d39dfbd | DHCP agent         | nfv-dpdk-devstack | :-)   | True           | neutron-dhcp-agent        |
| b613c654-99b7-437e-9317-20fa651a1310 | Linux bridge agent | nfv-dpdk-devstack | :-)   | True           | neutron-linuxbridge-agent |
| c2dd0384-6517-4b44-9c25-0d2825d23f57 | Metadata agent     | nfv-dpdk-devstack | :-)   | True           | neutron-metadata-agent    |
| f23dde40-7dc0-4f20-8b3e-eb90ddb15e49 | Open vSwitch agent | nfv-dpdk-devstack | xxx   | True           | neutron-openvswitch-agent |
+--------------------------------------+--------------------+-------------------+-------+----------------+---------------------------+


ovs-vsctl show output:

    Bridge br-dpdk
        Port br-dpdk
            Interface br-dpdk
                type: internal
        Port phy-br-dpdk
            Interface phy-br-dpdk
                type: patch
                options: {peer=int-br-dpdk}
    Bridge br-int
        fail_mode: secure
        Port "vhufa41e799-f2"
            tag: 5
            Interface "vhufa41e799-f2"
                type: dpdkvhostuser
        Port int-br-dpdk
            Interface int-br-dpdk
                type: patch
                options: {peer=phy-br-dpdk}
        Port "tap4e19f8e1-59"
            tag: 5
            Interface "tap4e19f8e1-59"
                type: internal
        Port "vhu05734c49-3b"
            tag: 5
            Interface "vhu05734c49-3b"
                type: dpdkvhostuser
        Port "vhu10c06b4d-84"
            tag: 5
            Interface "vhu10c06b4d-84"
                type: dpdkvhostuser
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port "vhue169c581-ef"
            tag: 5
            Interface "vhue169c581-ef"
                type: dpdkvhostuser
        Port br-int
            Interface br-int
                type: internal
    Bridge br-tun
        fail_mode: secure
        Port br-tun
            Interface br-tun
                type: internal
                error: "could not open network device br-tun (Invalid argument)"
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
    ovs_version: "2.4.0"



ovs-ofctl dump-flows br-int output:

root@nfv-dpdk-devstack:/etc/neutron# ovs-ofctl dump-flows br-int
NXST_FLOW reply (xid=0x4):
 cookie=0xaaa002bb2bcf827b, duration=2410.012s, table=0, n_packets=0, n_bytes=0, idle_age=2410, priority=10,icmp6,in_port=43,icmp_type=136 actions=resubmit(,24)
 cookie=0xaaa002bb2bcf827b, duration=2409.480s, table=0, n_packets=0, n_bytes=0, idle_age=2409, priority=10,icmp6,in_port=44,icmp_type=136 actions=resubmit(,24)
 cookie=0xaaa002bb2bcf827b, duration=2408.704s, table=0, n_packets=0, n_bytes=0, idle_age=2408, priority=10,icmp6,in_port=45,icmp_type=136 actions=resubmit(,24)
 cookie=0xaaa002bb2bcf827b, duration=2408.155s, table=0, n_packets=0, n_bytes=0, idle_age=2408, priority=10,icmp6,in_port=42,icmp_type=136 actions=resubmit(,24)
 cookie=0xaaa002bb2bcf827b, duration=2409.858s, table=0, n_packets=0, n_bytes=0, idle_age=2409, priority=10,arp,in_port=43 actions=resubmit(,24)
 cookie=0xaaa002bb2bcf827b, duration=2409.314s, table=0, n_packets=0, n_bytes=0, idle_age=2409, priority=10,arp,in_port=44 actions=resubmit(,24)
 cookie=0xaaa002bb2bcf827b, duration=2408.564s, table=0, n_packets=0, n_bytes=0, idle_age=2408, priority=10,arp,in_port=45 actions=resubmit(,24)
 cookie=0xaaa002bb2bcf827b, duration=2408.019s, table=0, n_packets=0, n_bytes=0, idle_age=2408, priority=10,arp,in_port=42 actions=resubmit(,24)
 cookie=0xaaa002bb2bcf827b, duration=2411.53

[dpdk-dev] [PATCH v2 2/2] i40evf: support interrupt based pf reset request

2016-01-27 Thread Jingjing Wu
PF-initiated reset requests are now handled through interrupts by
enabling adminq event processing in the VF driver.
Users can register a callback for this interrupt event to be
informed when a PF reset request is detected, for example:
  rte_eth_dev_callback_register(portid,
                                RTE_ETH_EVENT_INTR_RESET,
                                reset_event_callback,
                                arg);

Signed-off-by: Jingjing Wu 
---
 doc/guides/rel_notes/release_2_3.rst |   1 +
 drivers/net/i40e/i40e_ethdev_vf.c| 274 +++
 lib/librte_ether/rte_ethdev.h|   1 +
 3 files changed, 246 insertions(+), 30 deletions(-)

diff --git a/doc/guides/rel_notes/release_2_3.rst b/doc/guides/rel_notes/release_2_3.rst
index 99de186..73d5f76 100644
--- a/doc/guides/rel_notes/release_2_3.rst
+++ b/doc/guides/rel_notes/release_2_3.rst
@@ -4,6 +4,7 @@ DPDK Release 2.3
 New Features
 

+* **Added pf reset event reported in i40e vf PMD driver.

 Resolved Issues
 ---
diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c
index 64e6957..1ffe64e 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -74,8 +74,6 @@
 #define I40EVF_BUSY_WAIT_DELAY 10
 #define I40EVF_BUSY_WAIT_COUNT 50
 #define MAX_RESET_WAIT_CNT 20
-/*ITR index for NOITR*/
-#define I40E_QINT_RQCTL_MSIX_INDX_NOITR 3

 struct i40evf_arq_msg_info {
enum i40e_virtchnl_ops ops;
@@ -151,6 +149,9 @@ static int
 i40evf_dev_rx_queue_intr_enable(struct rte_eth_dev *dev, uint16_t queue_id);
 static int
 i40evf_dev_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t queue_id);
+static void i40evf_handle_pf_event(__rte_unused struct rte_eth_dev *dev,
+  uint8_t *msg,
+  uint16_t msglen);

 /* Default hash key buffer for RSS */
 static uint32_t rss_key_default[I40E_VFQF_HKEY_MAX_INDEX + 1];
@@ -357,20 +358,42 @@ i40evf_execute_vf_cmd(struct rte_eth_dev *dev, struct vf_cmd_info *args)
return err;
}

-   do {
-   /* Delay some time first */
-   rte_delay_ms(ASQ_DELAY_MS);
-   ret = i40evf_read_pfmsg(dev, &info);
-   if (ret == I40EVF_MSG_CMD) {
-   err = 0;
-   break;
-   } else if (ret == I40EVF_MSG_ERR) {
-   err = -1;
-   break;
-   }
-   /* If don't read msg or read sys event, continue */
-   } while (i++ < MAX_TRY_TIMES);
-   _clear_cmd(vf);
+   switch (args->ops) {
+   case I40E_VIRTCHNL_OP_RESET_VF:
+   /*no need to process in this function */
+   break;
+   case I40E_VIRTCHNL_OP_VERSION:
+   case I40E_VIRTCHNL_OP_GET_VF_RESOURCES:
+   /* for init adminq commands, need to poll the response */
+   do {
+   /* Delay some time first */
+   rte_delay_ms(ASQ_DELAY_MS);
+   ret = i40evf_read_pfmsg(dev, &info);
+   if (ret == I40EVF_MSG_CMD) {
+   err = 0;
+   break;
+   } else if (ret == I40EVF_MSG_ERR) {
+   err = -1;
+   break;
+   }
+   /* If don't read msg or read sys event, continue */
+   } while (i++ < MAX_TRY_TIMES);
+   _clear_cmd(vf);
+   break;
+
+   default:
+   /* for other adminq in running time, waiting the cmd done flag */
+   do {
+   /* Delay some time first */
+   rte_delay_ms(ASQ_DELAY_MS);
+   if (vf->pend_cmd == I40E_VIRTCHNL_OP_UNKNOWN) {
+   err = 0;
+   break;
+   }
+   /* If don't read msg or read sys event, continue */
+   } while (i++ < MAX_TRY_TIMES);
+   break;
+   }

return (err | vf->cmd_retval);
 }
@@ -719,7 +742,7 @@ i40evf_config_irq_map(struct rte_eth_dev *dev)

map_info = (struct i40e_virtchnl_irq_map_info *)cmd_buffer;
map_info->num_vectors = 1;
-   map_info->vecmap[0].rxitr_idx = I40E_QINT_RQCTL_MSIX_INDX_NOITR;
+   map_info->vecmap[0].rxitr_idx = I40E_ITR_INDEX_DEFAULT;
map_info->vecmap[0].vsi_id = vf->vsi_res->vsi_id;
/* Alway use default dynamic MSIX interrupt */
map_info->vecmap[0].vector_id = vector_id;
@@ -1093,6 +1116,38 @@ i40evf_dev_atomic_write_link_status(struct rte_eth_dev *dev,
return 0;
 }

+/* Disable IRQ0 */
+static inline void
+i40evf_disable_irq0(struct i40e_hw *hw)
+{
+   /* Disable all interrupt types */
+   I40E_WRITE_REG(hw, I40E_VFINT_ICR0_ENA1, 0);
+   I40E_WRITE_REG(hw, I40E_VFINT_DYN_CTL01,
+  I40E_VFINT_DYN_CTL01_ITR_I
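
The reworked i40evf_execute_vf_cmd() above waits for the PF's reply in two different ways: the init-time commands (I40E_VIRTCHNL_OP_VERSION, I40E_VIRTCHNL_OP_GET_VF_RESOURCES) still poll the admin queue themselves, while every other command waits until vf->pend_cmd returns to I40E_VIRTCHNL_OP_UNKNOWN, presumably cleared by the newly enabled adminq interrupt path. The following is only a rough model of that split, not driver code: the sim_* names are hypothetical, the PF latency is simulated by a counter, the interrupt handler is called synchronously, there is no rte_delay_ms() between polls, and a timeout is reported as -1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_OP_UNKNOWN 0u /* stands in for I40E_VIRTCHNL_OP_UNKNOWN */
#define SIM_MAX_TRY    10

enum sim_op { SIM_OP_RESET_VF = 1, SIM_OP_VERSION, SIM_OP_GET_RES, SIM_OP_CFG };
enum sim_msg { SIM_MSG_NON, SIM_MSG_ERR, SIM_MSG_SYS, SIM_MSG_CMD };

struct sim_vf {
	volatile uint32_t pend_cmd; /* pending command, cleared on reply */
	int polls_until_reply;      /* simulated PF latency */
};

/* Polling mode: read one admin-queue element (simulated). */
static enum sim_msg sim_read_pfmsg(struct sim_vf *vf)
{
	if (vf->polls_until_reply-- > 0)
		return SIM_MSG_NON; /* nothing in the queue yet */
	return SIM_MSG_CMD;         /* reply to our command arrived */
}

/* Interrupt mode: the handler consumes the reply and clears pend_cmd. */
static void sim_irq_tick(struct sim_vf *vf)
{
	if (vf->polls_until_reply-- <= 0)
		vf->pend_cmd = SIM_OP_UNKNOWN;
}

static int sim_execute_cmd(struct sim_vf *vf, enum sim_op op)
{
	int err = -1, i = 0;

	assert(vf != NULL);
	vf->pend_cmd = op;
	switch (op) {
	case SIM_OP_RESET_VF:
		return 0; /* reply is handled by the reset path, not here */
	case SIM_OP_VERSION:
	case SIM_OP_GET_RES:
		/* init-time commands: poll the admin queue directly */
		do {
			enum sim_msg ret = sim_read_pfmsg(vf);

			if (ret == SIM_MSG_CMD) { err = 0; break; }
			if (ret == SIM_MSG_ERR) { err = -1; break; }
		} while (i++ < SIM_MAX_TRY);
		vf->pend_cmd = SIM_OP_UNKNOWN;
		break;
	default:
		/* run-time commands: wait for the handler to clear pend_cmd */
		do {
			sim_irq_tick(vf); /* stands in for the async handler */
			if (vf->pend_cmd == SIM_OP_UNKNOWN) { err = 0; break; }
		} while (i++ < SIM_MAX_TRY);
		break;
	}
	return err;
}
```

In the model, a run-time command that never sees pend_cmd cleared times out with pend_cmd still set, which makes the dependency on the interrupt path explicit.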

[dpdk-dev] [PATCH v2 1/2] i40evf: allocate virtchnl cmd buffer for each vf

2016-01-27 Thread Jingjing Wu
Currently, the i40evf PMD uses a global static buffer to send virtchnl
commands to the host driver, and that buffer is shared by multiple VFs.
This patch allocates a virtchnl command buffer for each VF instead.

Signed-off-by: Jingjing Wu 
---
 drivers/net/i40e/i40e_ethdev.h|   2 +
 drivers/net/i40e/i40e_ethdev_vf.c | 181 +++---
 2 files changed, 74 insertions(+), 109 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index 1f9792b..93122ad 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -494,7 +494,9 @@ struct i40e_vf {
bool link_up;
bool vf_reset;
volatile uint32_t pend_cmd; /* pending command not finished yet */
+   uint32_t cmd_retval; /* return value of the cmd response from PF */
u16 pend_msg; /* flags indicates events from pf not handled yet */
+   uint8_t *aq_resp; /* buffer to store the adminq response from PF */

/* VSI info */
struct i40e_virtchnl_vf_resource *vf_res; /* All VSIs */
diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c
index 14d2a50..64e6957 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -103,9 +103,6 @@ enum i40evf_aq_result {
I40EVF_MSG_CMD,  /* Read async command result */
 };

-/* A share buffer to store the command result from PF driver */
-static uint8_t cmd_result_buffer[I40E_AQ_BUF_SZ];
-
 static int i40evf_dev_configure(struct rte_eth_dev *dev);
 static int i40evf_dev_start(struct rte_eth_dev *dev);
 static void i40evf_dev_stop(struct rte_eth_dev *dev);
@@ -237,31 +234,39 @@ i40evf_set_mac_type(struct i40e_hw *hw)
 }

 /*
- * Parse admin queue message.
- *
- * return value:
- *  < 0: meet error
- *  0: read sys msg
- *  > 0: read cmd result
+ * Read data in admin queue to get msg from pf driver
  */
 static enum i40evf_aq_result
-i40evf_parse_pfmsg(struct i40e_vf *vf,
-  struct i40e_arq_event_info *event,
-  struct i40evf_arq_msg_info *data)
+i40evf_read_pfmsg(struct rte_eth_dev *dev, struct i40evf_arq_msg_info *data)
 {
-   enum i40e_virtchnl_ops opcode = (enum i40e_virtchnl_ops)\
-   rte_le_to_cpu_32(event->desc.cookie_high);
-   enum i40e_status_code retval = (enum i40e_status_code)\
-   rte_le_to_cpu_32(event->desc.cookie_low);
-   enum i40evf_aq_result ret = I40EVF_MSG_CMD;
+   struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct i40e_vf *vf = I40EVF_DEV_PRIVATE_TO_VF(dev->data->dev_private);
+   struct i40e_arq_event_info event;
+   enum i40e_virtchnl_ops opcode;
+   enum i40e_status_code retval;
+   int ret;
+   enum i40evf_aq_result result = I40EVF_MSG_NON;

+   event.buf_len = data->buf_len;
+   event.msg_buf = data->msg;
+   ret = i40e_clean_arq_element(hw, &event, NULL);
+   /* Can't read any msg from adminQ */
+   if (ret) {
+   if (ret == I40E_ERR_ADMIN_QUEUE_NO_WORK)
+   result = I40EVF_MSG_NON;
+   else
+   result = I40EVF_MSG_ERR;
+   return result;
+   }
+
+   opcode = (enum i40e_virtchnl_ops)rte_le_to_cpu_32(event.desc.cookie_high);
+   retval = (enum i40e_status_code)rte_le_to_cpu_32(event.desc.cookie_low);
/* pf sys event */
if (opcode == I40E_VIRTCHNL_OP_EVENT) {
struct i40e_virtchnl_pf_event *vpe =
-   (struct i40e_virtchnl_pf_event *)event->msg_buf;
+   (struct i40e_virtchnl_pf_event *)event.msg_buf;

-   /* Initialize ret to sys event */
-   ret = I40EVF_MSG_SYS;
+   result = I40EVF_MSG_SYS;
switch (vpe->event) {
case I40E_VIRTCHNL_EVENT_LINK_CHANGE:
vf->link_up =
@@ -286,74 +291,17 @@ i40evf_parse_pfmsg(struct i40e_vf *vf,
}
} else {
/* async reply msg on command issued by vf previously */
-   ret = I40EVF_MSG_CMD;
+   result = I40EVF_MSG_CMD;
/* Actual data length read from PF */
-   data->msg_len = event->msg_len;
+   data->msg_len = event.msg_len;
}
-   /* fill the ops and result to notify VF */
+
data->result = retval;
data->ops = opcode;

-   return ret;
-}
-
-/*
- * Read data in admin queue to get msg from pf driver
- */
-static enum i40evf_aq_result
-i40evf_read_pfmsg(struct rte_eth_dev *dev, struct i40evf_arq_msg_info *data)
-{
-   struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-   struct i40e_vf *vf = I40EVF_DEV_PRIVATE_TO_VF(dev->data->dev_private);
-   struct i40e_arq_event_info event;
-   int ret;
-   enum i40evf_aq_result result = I40EVF_MSG_NON;
-
-   event.buf_len = data->buf_len;
-   event.msg_buf = data->msg;
-   ret =
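
Patch 1/2 replaces the file-scope cmd_result_buffer, which every VF port in the process shared, with a per-VF aq_resp buffer in struct i40e_vf. The sketch below contrasts the two layouts; demo_vf, the demo_* helpers and the 4096-byte size are hypothetical (the driver sizes its buffer with I40E_AQ_BUF_SZ).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_AQ_BUF_SZ 4096 /* hypothetical; the driver uses I40E_AQ_BUF_SZ */

/* Before: one buffer for every VF in the process. */
static uint8_t demo_shared_buf[DEMO_AQ_BUF_SZ];

static void demo_store_shared(const void *msg, size_t len)
{
	assert(len <= sizeof(demo_shared_buf));
	memcpy(demo_shared_buf, msg, len);
}

/* After: each VF owns its reply buffer and last return value. */
struct demo_vf {
	uint32_t cmd_retval; /* status returned by the PF for the last cmd */
	uint8_t *aq_resp;    /* per-VF buffer for admin-queue replies */
};

static int demo_vf_init(struct demo_vf *vf)
{
	vf->cmd_retval = 0;
	vf->aq_resp = calloc(1, DEMO_AQ_BUF_SZ);
	return vf->aq_resp != NULL ? 0 : -1;
}

static void demo_vf_uninit(struct demo_vf *vf)
{
	free(vf->aq_resp);
	vf->aq_resp = NULL;
}

/* Simulated PF reply: lands only in this VF's own buffer. */
static void demo_store_reply(struct demo_vf *vf, const void *msg, size_t len,
			     uint32_t retval)
{
	assert(len <= DEMO_AQ_BUF_SZ);
	memcpy(vf->aq_resp, msg, len);
	vf->cmd_retval = retval;
}
```

With the shared buffer, the second reply overwrites the first; with per-VF buffers each port keeps its own last reply, which likely matters once replies are consumed asynchronously per port as in patch 2/2.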

[dpdk-dev] [PATCH v2 0/2] i40evf: support interrupt based pf reset request

2016-01-27 Thread Jingjing Wu
v2 changes:
  remove the change on vf reset status checking
  add pf event report support in release note

If DPDK is used on a VF while the host uses the Linux kernel driver
as the PF driver on an FVL (Fortville) NIC, some settings changed on the
PF will trigger a VF reset, so the DPDK VF needs to be notified of this event.
This patch set adds support for interrupt-based PF reset requests by
enabling adminq event processing in the VF driver.
Users can register a callback for this interrupt event to be
informed when a PF reset request is detected, for example:
  rte_eth_dev_callback_register(portid,
                                RTE_ETH_EVENT_INTR_RESET,
                                reset_event_callback,
                                arg);


Jingjing Wu (2):
  i40evf: allocate virtchnl cmd buffer for each vf
  i40evf: support interrupt based pf reset request

 doc/guides/rel_notes/release_2_3.rst |   1 +
 drivers/net/i40e/i40e_ethdev.h   |   2 +
 drivers/net/i40e/i40e_ethdev_vf.c| 423 +--
 lib/librte_ether/rte_ethdev.h|   1 +
 4 files changed, 304 insertions(+), 123 deletions(-)

-- 
2.4.0
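
The cover letter above describes registering a callback for RTE_ETH_EVENT_INTR_RESET with rte_eth_dev_callback_register(). The following is only a self-contained model of that register-and-dispatch pattern, not DPDK code: the model_* names, the fixed-size table and the callback prototype are hypothetical, and the real rte_eth_dev_cb_fn prototype differs between DPDK releases, so rte_ethdev.h of the release in use is the authority.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; NOT the DPDK enum or callback typedef. */
enum model_event_type {
	MODEL_EVENT_INTR_LSC,   /* link status change */
	MODEL_EVENT_INTR_RESET, /* PF requested a VF reset */
};

typedef void (*model_cb_fn)(uint8_t port_id, enum model_event_type event,
			    void *cb_arg);

#define MODEL_MAX_CBS 8

struct model_cb {
	uint8_t port_id;
	enum model_event_type event;
	model_cb_fn fn;
	void *arg;
};

static struct model_cb model_cbs[MODEL_MAX_CBS];
static size_t model_ncbs;

/* Register fn for (port_id, event); returns 0 on success, -1 on error. */
static int model_callback_register(uint8_t port_id,
				   enum model_event_type event,
				   model_cb_fn fn, void *arg)
{
	if (fn == NULL || model_ncbs == MODEL_MAX_CBS)
		return -1;
	model_cbs[model_ncbs++] = (struct model_cb){ port_id, event, fn, arg };
	return 0;
}

/* What a driver interrupt handler would do after decoding a PF event. */
static void model_process_callbacks(uint8_t port_id,
				    enum model_event_type event)
{
	for (size_t i = 0; i < model_ncbs; i++)
		if (model_cbs[i].port_id == port_id &&
		    model_cbs[i].event == event)
			model_cbs[i].fn(port_id, event, model_cbs[i].arg);
}

/*
 * Application callback: only records the request; the reset itself is
 * left to the application's control path. A multi-threaded application
 * would likely use an atomic flag instead of a plain int.
 */
static void reset_event_callback(uint8_t port_id,
				 enum model_event_type event, void *arg)
{
	int *reset_pending = arg;

	(void)port_id;
	if (event == MODEL_EVENT_INTR_RESET)
		*reset_pending = 1;
}
```

Keeping the callback to flag-setting is a design choice of this sketch: such event callbacks are typically invoked from the interrupt-handling thread, so a full port reset inside them is probably best avoided.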



[dpdk-dev] [PATCH 15/16] fm10k: use default mailbox message handler for pf

2016-01-27 Thread Wang, Xiao W

> -----Original Message-----
> From: Richardson, Bruce
> Sent: Wednesday, January 27, 2016 4:17 AM
> To: Wang, Xiao W 
> Cc: Chen, Jing D ; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 15/16] fm10k: use default mailbox message
> handler for pf
> 
> On Mon, Jan 25, 2016 at 02:31:05AM +, Wang, Xiao W wrote:
> > Hi Bruce,
> >
> > > -Original Message-
> > > From: Richardson, Bruce
> > > Sent: Saturday, January 23, 2016 5:32 AM
> > > To: Wang, Xiao W 
> > > Cc: Chen, Jing D ; dev at dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH 15/16] fm10k: use default mailbox
> > > message handler for pf
> > >
> > > On Thu, Jan 21, 2016 at 06:36:00PM +0800, Wang Xiao W wrote:
> > > > The new share code makes fm10k_msg_update_pvid_pf function static,
> > > > so we can not refer to it now in fm10k_ethdev.c. The registered pf
> > > > handler is almost the same as the default pf handler, removing it
> > > > has no impact on mailbox.
> > > >
> > > > Signed-off-by: Wang Xiao W 
> > >
> > > What patch makes the function static, as we need to ensure that the
> > > build is not broken by having this patch in the wrong place in the 
> > > patchset?
> > >
> > > Also, it seems strange having this patch in the middle of a series
> > > of base code updates - perhaps it should go first, so that all base
> > > code update patches can go one after the other.
> > >
> > > /Bruce
> >
> > It's the first patch in the patch set that makes the function static.
> 
> So does this patch not need to go before patch 1, if we can't refer to the
> function once patch one is applied?
> 
> /Bruce

OK, got it. I will revise my patch. Thanks a lot for your comment.

Best Regards,
Wang, Xiao


[dpdk-dev] [PATCH v5 01/11] virtio: Introduce config RTE_VIRTIO_INC_VECTOR

2016-01-27 Thread Santosh Shukla
Ping?
On Jan 19, 2016 5:16 PM, "Santosh Shukla"  wrote:

> - virtio_recv_pkts_vec and the other virtio vector helper APIs are written
>   for SSE/AVX instructions. For arm64 in particular, a virtio vector
>   implementation does not exist yet (TODO).
>
> So the virtio PMD won't build for targets like i686 and arm64. By setting
> RTE_VIRTIO_INC_VECTOR=n, the driver can build for non-SSE/AVX targets and
> will work in non-vectored virtio mode.
>
> Disabling the RTE_VIRTIO_INC_VECTOR config for:
>
> - the i686 arch, since the i686 target config
>   (config/defconfig_i686-native-linuxapp-gcc) says "Vectorized PMD is not
>   supported on 32-bit".
>
> - the armv7/v8 archs.
>
> Signed-off-by: Santosh Shukla 
> ---
> v4--> v5:
> - squashed v4's RTE_VIRTIO_INC_VECTOR patches into one patch.
> - Added RTE_xx_xx_INC_VECTOR ifdefs around each occurrence of the
>   _simple_rx_tx flag in the code.
>
>
>  config/common_linuxapp |1 +
>  config/defconfig_arm-armv7a-linuxapp-gcc   |4 +++-
>  config/defconfig_arm64-armv8a-linuxapp-gcc |4 +++-
>  config/defconfig_i686-native-linuxapp-gcc  |1 +
>  config/defconfig_i686-native-linuxapp-icc  |1 +
>  drivers/net/virtio/Makefile|2 +-
>  drivers/net/virtio/virtio_rxtx.c   |   16 +++-
>  drivers/net/virtio/virtio_rxtx.h   |2 ++
>  8 files changed, 27 insertions(+), 4 deletions(-)
>
> diff --git a/config/common_linuxapp b/config/common_linuxapp
> index 74bc515..8677697 100644
> --- a/config/common_linuxapp
> +++ b/config/common_linuxapp
> @@ -274,6 +274,7 @@ CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_RX=n
>  CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_TX=n
>  CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_DRIVER=n
>  CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_DUMP=n
> +CONFIG_RTE_VIRTIO_INC_VECTOR=y
>
>  #
>  # Compile burst-oriented VMXNET3 PMD driver
> diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc
> index cbebd64..9f852ce 100644
> --- a/config/defconfig_arm-armv7a-linuxapp-gcc
> +++ b/config/defconfig_arm-armv7a-linuxapp-gcc
> @@ -43,6 +43,9 @@ CONFIG_RTE_FORCE_INTRINSICS=y
>  CONFIG_RTE_TOOLCHAIN="gcc"
>  CONFIG_RTE_TOOLCHAIN_GCC=y
>
> +# Disable VIRTIO VECTOR support
> +CONFIG_RTE_VIRTIO_INC_VECTOR=n
> +
>  # ARM doesn't have support for vmware TSC map
>  CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=n
>
> @@ -70,7 +73,6 @@ CONFIG_RTE_LIBRTE_I40E_PMD=n
>  CONFIG_RTE_LIBRTE_IXGBE_PMD=n
>  CONFIG_RTE_LIBRTE_MLX4_PMD=n
>  CONFIG_RTE_LIBRTE_MPIPE_PMD=n
> -CONFIG_RTE_LIBRTE_VIRTIO_PMD=n
>  CONFIG_RTE_LIBRTE_VMXNET3_PMD=n
>  CONFIG_RTE_LIBRTE_PMD_XENVIRT=n
>  CONFIG_RTE_LIBRTE_PMD_BNX2X=n
> diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc b/config/defconfig_arm64-armv8a-linuxapp-gcc
> index 504f3ed..1a638b3 100644
> --- a/config/defconfig_arm64-armv8a-linuxapp-gcc
> +++ b/config/defconfig_arm64-armv8a-linuxapp-gcc
> @@ -45,8 +45,10 @@ CONFIG_RTE_TOOLCHAIN_GCC=y
>
>  CONFIG_RTE_CACHE_LINE_SIZE=64
>
> +# Disable VIRTIO VECTOR support
> +CONFIG_RTE_VIRTIO_INC_VECTOR=n
> +
>  CONFIG_RTE_IXGBE_INC_VECTOR=n
> -CONFIG_RTE_LIBRTE_VIRTIO_PMD=n
>  CONFIG_RTE_LIBRTE_IVSHMEM=n
>  CONFIG_RTE_LIBRTE_FM10K_PMD=n
>  CONFIG_RTE_LIBRTE_I40E_PMD=n
> diff --git a/config/defconfig_i686-native-linuxapp-gcc b/config/defconfig_i686-native-linuxapp-gcc
> index a90de9b..a4b1c49 100644
> --- a/config/defconfig_i686-native-linuxapp-gcc
> +++ b/config/defconfig_i686-native-linuxapp-gcc
> @@ -49,3 +49,4 @@ CONFIG_RTE_LIBRTE_KNI=n
>  # Vectorized PMD is not supported on 32-bit
>  #
>  CONFIG_RTE_IXGBE_INC_VECTOR=n
> +CONFIG_RTE_VIRTIO_INC_VECTOR=n
> diff --git a/config/defconfig_i686-native-linuxapp-icc b/config/defconfig_i686-native-linuxapp-icc
> index c021321..f8eb6ad 100644
> --- a/config/defconfig_i686-native-linuxapp-icc
> +++ b/config/defconfig_i686-native-linuxapp-icc
> @@ -49,3 +49,4 @@ CONFIG_RTE_LIBRTE_KNI=n
>  # Vectorized PMD is not supported on 32-bit
>  #
>  CONFIG_RTE_IXGBE_INC_VECTOR=n
> +CONFIG_RTE_VIRTIO_INC_VECTOR=n
> diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
> index 43835ba..25a842d 100644
> --- a/drivers/net/virtio/Makefile
> +++ b/drivers/net/virtio/Makefile
> @@ -50,7 +50,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtqueue.c
>  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_pci.c
>  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx.c
>  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_ethdev.c
> -SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple.c
> +SRCS-$(CONFIG_RTE_VIRTIO_INC_VECTOR) += virtio_rxtx_simple.c
>
>  # this lib depends upon:
>  DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_eal lib/librte_ether
> diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
> index 41a1366..d8169d1 100644
> --- a/drivers/net/virtio/virtio_rxtx.c
> +++ b/drivers/net/virtio/virtio_rxtx.c
> @@ -67,7 +67,9 @@
>  #define VIRTIO_SIMPLE_FLAGS ((uint32_t)ETH_TXQ_FLAGS_NOMULTSEGS | \
> ETH_TXQ_FLAGS_NOOFFLOADS)
>
> +#ifdef RTE_VIRTIO_INC_VECTOR
>  static int use_simple_rx

[dpdk-dev] [PATCH v5 03/11] linuxapp/vfio: ignore mapping for ioport region

2016-01-27 Thread Santosh Shukla
Ping.
On Jan 19, 2016 5:16 PM, "Santosh Shukla"  wrote:

> vfio_pci_mmap() tries to map all PCI BARs. I/O port regions are not mapped
> by vfio in the kernel, so skip mmap for I/O port BARs.
>
> Signed-off-by: Santosh Shukla 
> ---
>  lib/librte_eal/linuxapp/eal/eal_pci_vfio.c |   20 
>  1 file changed, 20 insertions(+)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> index 74f91ba..abde779 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> @@ -573,6 +573,7 @@ pci_vfio_map_resource(struct rte_pci_device *dev)
> struct pci_map *maps;
> uint32_t msix_table_offset = 0;
> uint32_t msix_table_size = 0;
> +   uint32_t ioport_bar;
>
> dev->intr_handle.fd = -1;
> dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
> @@ -760,6 +761,25 @@ pci_vfio_map_resource(struct rte_pci_device *dev)
> return -1;
> }
>
> +   /* chk for io port region */
> +   ret = pread64(vfio_dev_fd, &ioport_bar, sizeof(ioport_bar),
> +   VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX)
> +   + PCI_BASE_ADDRESS_0 + i*4);
> +
> +   if (ret != sizeof(ioport_bar)) {
> +   RTE_LOG(ERR, EAL,
> +   "Cannot read command (%x) from config space!\n",
> +   PCI_BASE_ADDRESS_0 + i*4);
> +   return -1;
> +   }
> +
> +   if (ioport_bar & PCI_BASE_ADDRESS_SPACE_IO) {
> +   RTE_LOG(INFO, EAL,
> +   "Ignore mapping IO port bar(%d) addr: %x\n",
> +i, ioport_bar);
> +   continue;
> +   }
> +
> /* skip non-mmapable BARs */
> if ((reg.flags & VFIO_REGION_INFO_FLAG_MMAP) == 0)
> continue;
> --
> 1.7.9.5
>
>
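The check added above keys off bit 0 of each BAR read from config space. A minimal standalone sketch of that logic, assuming the standard PCI layout (BAR i at offset 0x10 + 4*i, bit 0 set for I/O space, which Linux's pci_regs.h calls PCI_BASE_ADDRESS_SPACE_IO). The SKETCH_* constants and the helper names are hypothetical local stand-ins, and the sketch ignores unused BARs and 64-bit BAR pairs:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Local stand-ins for PCI_BASE_ADDRESS_0 / PCI_BASE_ADDRESS_SPACE_IO. */
#define SKETCH_PCI_BASE_ADDRESS_0        0x10
#define SKETCH_PCI_BASE_ADDRESS_SPACE_IO 0x01

/* Config-space offset of BAR i; each BAR register is 4 bytes wide. */
static uint32_t bar_config_offset(int i)
{
	assert(i >= 0 && i < 6);
	return SKETCH_PCI_BASE_ADDRESS_0 + (uint32_t)i * 4;
}

/* Bit 0 set means the BAR decodes I/O port space, not memory. */
static int bar_is_ioport(uint32_t bar_value)
{
	return (bar_value & SKETCH_PCI_BASE_ADDRESS_SPACE_IO) != 0;
}

/* Count the BARs that would go on to the mmap step, skipping I/O BARs. */
static int count_mmap_candidates(const uint32_t bars[6])
{
	int n = 0;

	for (int i = 0; i < 6; i++) {
		if (bar_is_ioport(bars[i])) {
			printf("Ignore mapping IO port bar(%d) addr: %" PRIx32 "\n",
			       i, bars[i]);
			continue;
		}
		n++;
	}
	return n;
}
```

For example, a legacy virtio device typically exposes its registers through an I/O BAR0, which this check would skip.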


[dpdk-dev] [PATCH v5 04/11] virtio_pci.h: build fix for sys/io.h for non-x86 arch

2016-01-27 Thread Santosh Shukla
Ping
On Jan 19, 2016 5:16 PM, "Santosh Shukla"  wrote:

> Make sure sys/io.h is used only for x86 archs. This fixes the build error
> for arm64/arm.
>
> Signed-off-by: Santosh Shukla 
> ---
>  drivers/net/virtio/virtio_pci.h |2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
> index 99572a0..f550d22 100644
> --- a/drivers/net/virtio/virtio_pci.h
> +++ b/drivers/net/virtio/virtio_pci.h
> @@ -40,8 +40,10 @@
>  #include 
>  #include 
>  #else
> +#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_I686)
>  #include 
>  #endif
> +#endif
>
>  #include 
>
> --
> 1.7.9.5
>
>
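The same guard can be sketched outside a DPDK build. This assumes the GCC/Clang predefined macros __x86_64__ and __i386__ in place of DPDK's RTE_ARCH_X86_64/RTE_ARCH_I686, and a hypothetical HAVE_PORT_IO flag so callers can tell whether port I/O (inb()/outb() and friends from sys/io.h) is available:

```c
/* Only x86 provides sys/io.h port-I/O helpers; other arches (arm, arm64)
 * must not include it, or the build fails. */
#if defined(__x86_64__) || defined(__i386__)
#include <sys/io.h>
#define HAVE_PORT_IO 1
#else
#define HAVE_PORT_IO 0
#endif

/* Hypothetical helper: lets the rest of the code branch at runtime instead
 * of sprinkling arch #ifdefs around every I/O port access. */
static int port_io_available(void)
{
	return HAVE_PORT_IO;
}
```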


[dpdk-dev] [PATCH v5 01/11] virtio: Introduce config RTE_VIRTIO_INC_VECTOR

2016-01-27 Thread Yuanhan Liu
On Wed, Jan 27, 2016 at 07:53:21AM +0530, Santosh Shukla wrote:
> Ping?

I was on vacation late last week, and I have been quite busy since I got
back. So, sorry that I still don't have time to do a more detailed review
in the next 1 or 2 days. Hopefully I can make it by this Friday.

BTW, I had a very brief glimpse at this patchset. Overall, it looks much
better now, except the EAL changes (I'm not the maintainer) and the
virtio io port read/write stuff: Tetsuya suggested adding another
access wrapper, but I have a few concerns about that. Anyway, I don't
have time for deeper thoughts now, and I will re-think it later.

--yliu


[dpdk-dev] [PATCH 4/5] vhost: do not use rte_memcpy for virtio_hdr copy

2016-01-27 Thread Xie, Huawei
On 12/3/2015 2:03 PM, Yuanhan Liu wrote:
> + if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) {
> + *(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr;
> + } else {
> + *(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr;
> + }

Thanks!
We might simplify this further. Just reset the first two fields flags
and gso_type.


[dpdk-dev] [PATCH 4/4] virtio/vdev: add a new vdev named eth_cvio

2016-01-27 Thread Qiu, Michael
On 1/11/2016 2:43 AM, Tan, Jianfeng wrote:
> Add a new virtual device named eth_cvio, it can be used just like
> eth_ring, eth_null, etc.
>
> Configured parameters include:
> - rx (optional, 1 by default): number of rx, only allowed to be
>  1 for now.
> - tx (optional, 1 by default): number of tx, only allowed to be
>  1 for now.


From the app's side, virtio looks like hardware; in your implementation,
rx/tx are the maximum numbers of queues virtio supports. Does that make sense?

Why does the user need to tell the HW how many queues it supports? We'd
better make it unconfigurable, let users query it as they would with real
HW, and then decide how many queues to enable.


> - cq (optional, 0 by default): if ctrl queue is enabled, not
>  supported for now.
> - mac (optional): mac address, random value will be given if not
> specified.
> - queue_num (optional, 256 by default): size of virtqueue.

Better change it to queue_size.

Thanks,
Michael

> - path (mandatory): path of vhost, depends on the file type:
>  vhost-user is used if the given path points to
>  a unix socket; vhost-net is used if the given
>  path points to a char device.
>
> The major difference from the original virtio for VMs is that here we
> use virtual addresses instead of physical addresses for vhost to
> calculate relative addresses.
>
> When CONFIG_RTE_VIRTIO_VDEV is enabled (the default), the compiled
> library can be used in both VM and container environments.
>
> Examples:
> a. Use vhost-net as a backend
> sudo numactl -N 1 -m 1 ./examples/l2fwd/build/l2fwd -c 0x10 -n 4 \
> -m 1024 --no-pci --single-file --file-prefix=l2fwd \
> --vdev=eth_cvio0,mac=00:01:02:03:04:05,path=/dev/vhost-net \
> -- -p 0x1
>
> b. Use vhost-user as a backend
> numactl -N 1 -m 1 ./examples/l2fwd/build/l2fwd -c 0x10 -n 4 -m 1024 \
> --no-pci --single-file --file-prefix=l2fwd \
> --vdev=eth_cvio0,mac=00:01:02:03:04:05,path= \
> -- -p 0x1
>
> Signed-off-by: Huawei Xie 
> Signed-off-by: Jianfeng Tan 
> ---
>
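To make the documented defaults and restrictions concrete, here is a hedged sketch of parsing an eth_cvio argument string. The struct and function are hypothetical; a real PMD would more likely use DPDK's rte_kvargs library. It only encodes what the patch text states: rx/tx default to 1 and must be 1, cq defaults to 0 and must be 0, queue_num defaults to 256, mac is optional, and path is mandatory:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of the eth_cvio devargs. */
struct cvio_args {
	unsigned rx, tx, cq, queue_num;
	char path[256]; /* mandatory: vhost-user socket or vhost-net char dev */
	char mac[18];   /* optional; empty means "pick a random MAC" */
};

/* Parse "key=value,key=value,...". Returns 0 on success, -1 on error. */
static int cvio_parse_args(const char *s, struct cvio_args *a)
{
	char buf[512];
	char *kv;

	memset(a, 0, sizeof(*a));
	a->rx = 1;
	a->tx = 1;
	a->cq = 0;
	a->queue_num = 256;

	if (strlen(s) >= sizeof(buf))
		return -1;
	strcpy(buf, s);

	kv = buf;
	while (kv != NULL && *kv != '\0') {
		char *next = strchr(kv, ',');
		if (next != NULL)
			*next++ = '\0'; /* terminate this pair, step past ',' */

		char *eq = strchr(kv, '=');
		if (eq == NULL)
			return -1; /* every item must be key=value */
		*eq = '\0';
		const char *key = kv, *val = eq + 1;

		if (strcmp(key, "rx") == 0)
			a->rx = (unsigned)strtoul(val, NULL, 0);
		else if (strcmp(key, "tx") == 0)
			a->tx = (unsigned)strtoul(val, NULL, 0);
		else if (strcmp(key, "cq") == 0)
			a->cq = (unsigned)strtoul(val, NULL, 0);
		else if (strcmp(key, "queue_num") == 0)
			a->queue_num = (unsigned)strtoul(val, NULL, 0);
		else if (strcmp(key, "path") == 0)
			snprintf(a->path, sizeof(a->path), "%s", val);
		else if (strcmp(key, "mac") == 0)
			snprintf(a->mac, sizeof(a->mac), "%s", val);
		else
			return -1; /* unknown key */
		kv = next;
	}

	/* Enforce the current restrictions and the mandatory path. */
	if (a->path[0] == '\0' || a->rx != 1 || a->tx != 1 || a->cq != 0)
		return -1;
	return 0;
}
```

Renaming queue_num to queue_size, as suggested above, would only change the key string compared here.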



[dpdk-dev] [PATCH 4/5] vhost: do not use rte_memcpy for virtio_hdr copy

2016-01-27 Thread Yuanhan Liu
On Wed, Jan 27, 2016 at 02:46:39AM +, Xie, Huawei wrote:
> On 12/3/2015 2:03 PM, Yuanhan Liu wrote:
> > +   if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) {
> > +   *(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr;
> > +   } else {
> > +   *(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr;
> > +   }
> 
> Thanks!
> We might simplify this further. Just reset the first two fields flags
> and gso_type.

What's this "simplification" for? Not to mention that we will add TSO
support, which modifies a few more fields, such as csum_start: resetting
only the first two fields is wrong here.

--yliu
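A small sketch of the point above. It assumes the legacy virtio-net header layout from the virtio spec (redefined locally here) and copies the header by plain struct assignment chosen by vhost_hlen, as in the quoted code. It also shows that resetting only flags and gso_type leaves stale csum_start, gso_size and similar fields behind. The helper names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Legacy virtio-net header layout, redefined for a self-contained sketch. */
struct virtio_net_hdr {
	uint8_t  flags;
	uint8_t  gso_type;
	uint16_t hdr_len;
	uint16_t gso_size;
	uint16_t csum_start;
	uint16_t csum_offset;
};

struct virtio_net_hdr_mrg_rxbuf {
	struct virtio_net_hdr hdr;
	uint16_t num_buffers;
};

/* Copy the whole header with struct assignment; the layout depends on
 * whether mergeable rx buffers were negotiated (vhost_hlen). */
static void write_net_hdr(void *desc_addr, size_t vhost_hlen,
			  const struct virtio_net_hdr_mrg_rxbuf *hdr)
{
	if (vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf))
		*(struct virtio_net_hdr_mrg_rxbuf *)desc_addr = *hdr;
	else
		*(struct virtio_net_hdr *)desc_addr = hdr->hdr;
}

/* The proposed shortcut: only clear the first two fields. */
static void reset_flags_gso_only(void *desc_addr)
{
	struct virtio_net_hdr *h = desc_addr;

	h->flags = 0;
	h->gso_type = 0;
}
```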


[dpdk-dev] [PATCH 1/5] vhost: refactor rte_vhost_dequeue_burst

2016-01-27 Thread Yuanhan Liu
On Tue, Jan 26, 2016 at 10:30:12AM +, Xie, Huawei wrote:
> On 12/3/2015 2:03 PM, Yuanhan Liu wrote:
> > Signed-off-by: Yuanhan Liu 
> > ---
> >  lib/librte_vhost/vhost_rxtx.c | 287 +-
> >  1 file changed, 113 insertions(+), 174 deletions(-)
> 
> Prefer to unroll copy_mbuf_to_desc and your COPY macro. It prevents us

I'm okay to unroll COPY macro. But for copy_mbuf_to_desc, I prefer not
to do that, unless it has a good reason.

> processing descriptors in a burst way in future.

So, do you have a plan?

--yliu


[dpdk-dev] [PATCH 2/5] vhost: refactor virtio_dev_rx

2016-01-27 Thread Yuanhan Liu
On Thu, Jan 21, 2016 at 02:50:01PM +0100, Jérôme Jutteau wrote:
> Hi Yuanhan,
> 
> 2015-12-14 2:47 GMT+01:00 Yuanhan Liu :
> > Right, I should move it in the beginning of this function.
> 
> Any news about this refactoring ?

Hi Jérôme,

Thanks for showing interest in this patch set; I was waiting for
Huawei's comments. And fortunately, he has started making comments.

--yliu


[dpdk-dev] [PATCH v5 8/9] virtio: add 1.0 support

2016-01-27 Thread Yuanhan Liu
On Thu, Jan 21, 2016 at 12:49:10PM +0100, Thomas Monjalon wrote:
> 2016-01-19 16:12, Yuanhan Liu:
> >  int
> >  vtpci_init(struct rte_pci_device *dev, struct virtio_hw *hw)
> >  {
> > -   hw->vtpci_ops = &legacy_ops;
> > +   hw->dev = dev;
> > +
> > +   /*
> > +* Try if we can succeed reading virtio pci caps, which exists
> > +* only on modern pci device. If failed, we fallback to legacy
> > +* virtio handling.
> > +*/
> > +   if (virtio_read_caps(dev, hw) == 0) {
> > +   PMD_INIT_LOG(INFO, "modern virtio pci detected.");
> > +   hw->vtpci_ops = &modern_ops;
> > +   hw->modern= 1;
> > +   dev->driver->drv_flags |= RTE_PCI_DRV_INTR_LSC;
> > +   return 0;
> > +   }
> 
> RTE_PCI_DRV_INTR_LSC is already set by virtio_resource_init_by_uio().

We don't go that far here. Here we just detect whether it's a modern virtio
device, and if so, we do some modern initialization and return.

virtio_resource_init_by_uio() is invoked when virtio_read_caps() fails.

> Do you mean interrupt was not supported with legacy virtio?

Nope. This patch set changes nothing in legacy virtio support.

--yliu
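The detection flow described above (try the modern capabilities first, fall back to legacy only when that fails) can be sketched as an ops-table selection. All types and names here are hypothetical, trimmed-down stand-ins for the driver's struct virtio_hw and ops tables. The read_caps callback plays the role of virtio_read_caps(), and the legacy branch is where the patch's legacy path (e.g. virtio_resource_init_by_uio()) would run:

```c
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the driver's types. */
struct virtio_pci_ops {
	const char *name; /* a real table holds read/write/feature callbacks */
};

struct virtio_hw {
	const struct virtio_pci_ops *vtpci_ops;
	int modern;
};

static const struct virtio_pci_ops modern_ops = { "modern" };
static const struct virtio_pci_ops legacy_ops = { "legacy" };

/* read_caps returns 0 when virtio 1.0 PCI capabilities are present. */
static int vtpci_init_sketch(struct virtio_hw *hw, int (*read_caps)(void))
{
	if (read_caps() == 0) {
		hw->vtpci_ops = &modern_ops;
		hw->modern = 1;
		return 0; /* modern device: done, legacy setup never runs */
	}

	/* No modern caps: legacy handling, unchanged by the patch set. */
	hw->vtpci_ops = &legacy_ops;
	hw->modern = 0;
	return 0;
}

static int caps_found(void) { return 0; }
static int caps_missing(void) { return -1; }
```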


[dpdk-dev] [PATCH v5 8/9] virtio: add 1.0 support

2016-01-27 Thread Yuanhan Liu
On Thu, Jan 21, 2016 at 12:37:42PM +0100, Thomas Monjalon wrote:
> 2016-01-19 16:12, Yuanhan Liu:
> > +#define IO_READ_DEF(nr_bits, type) \
> > +static inline type \
> > +io_read##nr_bits(type *addr)   \
> > +{  \
> > +   return *(volatile type *)addr;  \
> > +}
> > +
> > +#define IO_WRITE_DEF(nr_bits, type)\
> > +static inline void \
> > +io_write##nr_bits(type val, type *addr)\
> > +{  \
> > +   *(volatile type *)addr = val;   \
> > +}
> > +
> > +IO_READ_DEF (8, uint8_t)
> > +IO_WRITE_DEF(8, uint8_t)
> > +
> > +IO_READ_DEF (16, uint16_t)
> > +IO_WRITE_DEF(16, uint16_t)
> > +
> > +IO_READ_DEF (32, uint32_t)
> > +IO_WRITE_DEF(32, uint32_t)
> 
> Yes you can do this.
> But not sure you should.
> 
> > +static inline void
> > +io_write64_twopart(uint64_t val, uint32_t *lo, uint32_t *hi)
> > +{
> > +   io_write32(val & ((1ULL << 32) - 1), lo);
> > +   io_write32(val >> 32,hi);
> > +}
> 
> When debugging this code, how does GDB behave?
> How to find the definition of io_write32() with grep or simple editors?

Okay, I will unfold them.

--yliu
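Unfolding the macros means writing each accessor out as a named function, so grep and GDB can find io_write32() directly. A sketch that mirrors the quoted macro bodies one-for-one, along with the quoted io_write64_twopart():

```c
#include <stdint.h>

/* Each accessor is a plain, greppable function; volatile keeps the
 * compiler from eliding or merging the device register accesses. */
static inline uint8_t io_read8(uint8_t *addr)
{
	return *(volatile uint8_t *)addr;
}

static inline void io_write8(uint8_t val, uint8_t *addr)
{
	*(volatile uint8_t *)addr = val;
}

static inline uint16_t io_read16(uint16_t *addr)
{
	return *(volatile uint16_t *)addr;
}

static inline void io_write16(uint16_t val, uint16_t *addr)
{
	*(volatile uint16_t *)addr = val;
}

static inline uint32_t io_read32(uint32_t *addr)
{
	return *(volatile uint32_t *)addr;
}

static inline void io_write32(uint32_t val, uint32_t *addr)
{
	*(volatile uint32_t *)addr = val;
}

/* Write a 64-bit value as two 32-bit halves: low word first, then high. */
static inline void io_write64_twopart(uint64_t val, uint32_t *lo, uint32_t *hi)
{
	io_write32(val & ((1ULL << 32) - 1), lo);
	io_write32(val >> 32, hi);
}
```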


[dpdk-dev] [PATCH v2 00/16] fm10k: update shared code

2016-01-27 Thread Wang Xiao W
v2:
* Put the two extra fix patches ahead of the base code patches.

Wang Xiao W (16):
  fm10k: use default mailbox message handler for pf
  fm10k/base: add macro definitions that are needed
  fm10k/base: cleanup namespace pollution and correct typecast
  fm10k/base: use bitshift for itr_scale
  fm10k/base: reset max_queues on init_hw_vf failure
  fm10k/base: document ITR scale workaround in VF TDLEN register
  fm10k/base: fix checkpatch warning
  fm10k/base: use BIT macro instead of open-coded bit-shifting of 1
  fm10k/base: do not use CamelCase
  fm10k/base: use memcpy for mac addr copy
  fm10k/base: allow removal of is_slot_appropriate function
  fm10k/base: consistently use VLAN ID when referencing vid variables
  fm10k/base: fix comment per upstream review changes
  fm10k/base: TLV structures must be 4byte aligned, not 1byte aligned
  fm10k/base: move constants to the right of binary operators
  fm10k/base: minor cleanups

 drivers/net/fm10k/base/fm10k_api.c   |   2 +
 drivers/net/fm10k/base/fm10k_api.h   |   2 +
 drivers/net/fm10k/base/fm10k_mbx.c   |  63 +++-
 drivers/net/fm10k/base/fm10k_mbx.h   |  11 +--
 drivers/net/fm10k/base/fm10k_osdep.h |  30 ++
 drivers/net/fm10k/base/fm10k_pf.c|  88 +
 drivers/net/fm10k/base/fm10k_pf.h|  18 ++--
 drivers/net/fm10k/base/fm10k_tlv.c   |  40 
 drivers/net/fm10k/base/fm10k_tlv.h   |   9 +-
 drivers/net/fm10k/base/fm10k_type.h  | 182 +++
 drivers/net/fm10k/base/fm10k_vf.c|  32 --
 drivers/net/fm10k/fm10k_ethdev.c |  41 +++-
 12 files changed, 220 insertions(+), 298 deletions(-)

-- 
1.9.3



[dpdk-dev] [PATCH v2 01/16] fm10k: use default mailbox message handler for pf

2016-01-27 Thread Wang Xiao W
The new shared code makes the fm10k_msg_update_pvid_pf function static, so we
can no longer refer to it in fm10k_ethdev.c. The registered PF handler is almost
the same as the default PF handler, so removing it has no impact on the mailbox.

Signed-off-by: Wang Xiao W 
---
 drivers/net/fm10k/fm10k_ethdev.c | 17 ++---
 1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index e4aed94..2c38ce9 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -2367,29 +2367,16 @@ static const struct fm10k_msg_data fm10k_msgdata_vf[] = {
FM10K_TLV_MSG_ERROR_HANDLER(fm10k_tlv_msg_error),
 };

-/* Mailbox message handler in PF */
-static const struct fm10k_msg_data fm10k_msgdata_pf[] = {
-   FM10K_PF_MSG_ERR_HANDLER(XCAST_MODES, fm10k_msg_err_pf),
-   FM10K_PF_MSG_ERR_HANDLER(UPDATE_MAC_FWD_RULE, fm10k_msg_err_pf),
-   FM10K_PF_MSG_LPORT_MAP_HANDLER(fm10k_msg_lport_map_pf),
-   FM10K_PF_MSG_ERR_HANDLER(LPORT_CREATE, fm10k_msg_err_pf),
-   FM10K_PF_MSG_ERR_HANDLER(LPORT_DELETE, fm10k_msg_err_pf),
-   FM10K_PF_MSG_UPDATE_PVID_HANDLER(fm10k_msg_update_pvid_pf),
-   FM10K_TLV_MSG_ERROR_HANDLER(fm10k_tlv_msg_error),
-};
-
 static int
 fm10k_setup_mbx_service(struct fm10k_hw *hw)
 {
-   int err;
+   int err = 0;

/* Initialize mailbox lock */
fm10k_mbx_initlock(hw);

/* Replace default message handler with new ones */
-   if (hw->mac.type == fm10k_mac_pf)
-   err = hw->mbx.ops.register_handlers(&hw->mbx, fm10k_msgdata_pf);
-   else
+   if (hw->mac.type == fm10k_mac_vf)
err = hw->mbx.ops.register_handlers(&hw->mbx, fm10k_msgdata_vf);

if (err) {
-- 
1.9.3
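After this change, err only gets assigned on the VF path, which is why it now has to be initialized to 0. Without that, the PF path would read an uninitialized variable. A hedged sketch of the resulting control flow, with hypothetical stand-in types and a callback in place of hw->mbx.ops.register_handlers():

```c
/* Hypothetical stand-ins; only the control flow mirrors the patch. */
enum mbx_mac_type { MBX_MAC_PF, MBX_MAC_VF };

static int setup_mbx_service(enum mbx_mac_type type,
			     int (*register_vf_handlers)(void))
{
	int err = 0; /* PF no longer assigns err, so it must start at 0 */

	/* PF: keep the default handlers installed by the shared code.
	 * VF: replace them with the driver's own handler table. */
	if (type == MBX_MAC_VF)
		err = register_vf_handlers();

	if (err)
		return err;

	/* ... the rest of the mailbox setup would continue here ... */
	return 0;
}

static int register_ok(void) { return 0; }
static int register_fail(void) { return -22; }
```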



[dpdk-dev] [PATCH v2 02/16] fm10k/base: add macro definitions that are needed

2016-01-27 Thread Wang Xiao W
Some macros, such as FM10K_RXINT_TIMER_SHIFT, were removed in the shared
code drop, but they are needed in dpdk/fm10k. This patch puts all these
necessary macros into fm10k_osdep.h.

Signed-off-by: Wang Xiao W 
---
 drivers/net/fm10k/base/fm10k_osdep.h | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/drivers/net/fm10k/base/fm10k_osdep.h b/drivers/net/fm10k/base/fm10k_osdep.h
index 6852ef0..869af1b 100644
--- a/drivers/net/fm10k/base/fm10k_osdep.h
+++ b/drivers/net/fm10k/base/fm10k_osdep.h
@@ -150,6 +150,36 @@ typedef int bool;
 #define fm10k_read_reg FM10K_READ_REG
 #endif

+#define FM10K_INTEL_VENDOR_ID   0x8086
+#define FM10K_DMA_CTRL_MINMSS_SHIFT 9
+#define FM10K_EICR_PCA_FAULT   0x0001
+#define FM10K_EICR_THI_FAULT   0x0004
+#define FM10K_EICR_FUM_FAULT   0x0020
+#define FM10K_EICR_SRAMERROR   0x0400
+#define FM10K_SRAM_IP  0x13003
+#define FM10K_RXINT_TIMER_SHIFT 8
+#define FM10K_TXINT_TIMER_SHIFT 8
+#define FM10K_RXD_PKTTYPE_MASK 0x03F0
+#define FM10K_RXD_PKTTYPE_SHIFT 4
+enum fm10k_rdesc_pkt_type {
+   /* L3 type */
+   FM10K_PKTTYPE_OTHER = 0x00,
+   FM10K_PKTTYPE_IPV4  = 0x01,
+   FM10K_PKTTYPE_IPV4_EX   = 0x02,
+   FM10K_PKTTYPE_IPV6  = 0x03,
+   FM10K_PKTTYPE_IPV6_EX   = 0x04,
+
+   /* L4 type */
+   FM10K_PKTTYPE_TCP   = 0x08,
+   FM10K_PKTTYPE_UDP   = 0x10,
+   FM10K_PKTTYPE_GRE   = 0x18,
+   FM10K_PKTTYPE_VXLAN = 0x20,
+   FM10K_PKTTYPE_NVGRE = 0x28,
+   FM10K_PKTTYPE_GENEVE= 0x30
+};
+#define FM10K_RXD_STATUS_IPCS  0x0008 /* Indicates IPv4 csum */
+#define FM10K_RXD_STATUS_HBO   0x0400 /* header buffer overrun */
+
 #define FM10K_TSO_MINMSS \
(FM10K_DMA_CTRL_MINMSS_64 >> FM10K_DMA_CTRL_MINMSS_SHIFT)
 #define FM10K_TSO_MIN_HEADERLEN 54
-- 
1.9.3
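The mask and shift added above extract a 6-bit packet-type field from the Rx descriptor. The sketch below assumes, based only on the enum values (L3 codes 0x00 to 0x04, L4 codes in multiples of 0x08 up to 0x30), that the L3 type sits in the low 3 bits of the shifted field and the L4 type in the next 3 bits. The split masks and helper names are hypothetical, and the driver may use a lookup table instead:

```c
#include <stdint.h>

/* Values taken from the macros added to fm10k_osdep.h above. */
#define FM10K_RXD_PKTTYPE_MASK  0x03F0
#define FM10K_RXD_PKTTYPE_SHIFT 4

/* Assumed split of the 6-bit field, inferred from the enum values. */
#define SKETCH_PKTTYPE_L3_MASK 0x07
#define SKETCH_PKTTYPE_L4_MASK 0x38

static uint16_t rxd_pkttype(uint16_t pkt_info)
{
	return (pkt_info & FM10K_RXD_PKTTYPE_MASK) >> FM10K_RXD_PKTTYPE_SHIFT;
}

static uint16_t rxd_l3_type(uint16_t pkt_info)
{
	return rxd_pkttype(pkt_info) & SKETCH_PKTTYPE_L3_MASK;
}

static uint16_t rxd_l4_type(uint16_t pkt_info)
{
	return rxd_pkttype(pkt_info) & SKETCH_PKTTYPE_L4_MASK;
}
```

For instance, an IPv4 (0x01) + TCP (0x08) packet would carry 0x09 in the field, i.e. pkt_info bits 0x0090.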



[dpdk-dev] [PATCH v2 03/16] fm10k/base: cleanup namespace pollution and correct typecast

2016-01-27 Thread Wang Xiao W
Correct typecast in fm10k_update_xc_addr_pf.

Make functions that are only referenced locally static.

And fix the function header comment for fm10k_tlv_attr_nest_stop() while
we're at it.

Wrap fm10k_msg_data fm10k_iov_msg_data_pf[] in the new ifndef
NO_DEFAULT_SRIOV_MSG_HANDLERS so that drivers with custom SR-IOV
message handlers can strip it.

Remove an unused struct element from struct fm10k_mac_ops.

Signed-off-by: Wang Xiao W 
---
 drivers/net/fm10k/base/fm10k_pf.c   | 10 ++
 drivers/net/fm10k/base/fm10k_pf.h   |  4 ++--
 drivers/net/fm10k/base/fm10k_tlv.c  | 16 
 drivers/net/fm10k/base/fm10k_tlv.h  |  5 -
 drivers/net/fm10k/base/fm10k_type.h |  1 -
 drivers/net/fm10k/base/fm10k_vf.c   |  2 --
 6 files changed, 16 insertions(+), 22 deletions(-)

diff --git a/drivers/net/fm10k/base/fm10k_pf.c b/drivers/net/fm10k/base/fm10k_pf.c
index 6e6d71e..5b8c039 100644
--- a/drivers/net/fm10k/base/fm10k_pf.c
+++ b/drivers/net/fm10k/base/fm10k_pf.c
@@ -379,8 +379,8 @@ STATIC s32 fm10k_update_xc_addr_pf(struct fm10k_hw *hw, u16 glort,
 ((u32)mac[3] << 16) |
 ((u32)mac[4] << 8) |
 ((u32)mac[5]));
-   mac_update.mac_upper = FM10K_CPU_TO_LE16(((u32)mac[0] << 8) |
-((u32)mac[1]));
+   mac_update.mac_upper = FM10K_CPU_TO_LE16(((u16)mac[0] << 8) |
+  ((u16)mac[1]));
mac_update.vlan = FM10K_CPU_TO_LE16(vid);
mac_update.glort = FM10K_CPU_TO_LE16(glort);
mac_update.action = add ? 0 : 1;
@@ -1457,6 +1457,7 @@ s32 fm10k_iov_msg_lport_state_pf(struct fm10k_hw *hw, u32 **results,
return err;
 }

+#ifndef NO_DEFAULT_SRIOV_MSG_HANDLERS
 const struct fm10k_msg_data fm10k_iov_msg_data_pf[] = {
FM10K_TLV_MSG_TEST_HANDLER(fm10k_tlv_msg_test),
FM10K_VF_MSG_MSIX_HANDLER(fm10k_iov_msg_msix_pf),
@@ -1465,6 +1466,7 @@ const struct fm10k_msg_data fm10k_iov_msg_data_pf[] = {
FM10K_TLV_MSG_ERROR_HANDLER(fm10k_tlv_msg_error),
 };

+#endif
 /**
  *  fm10k_update_stats_hw_pf - Updates hardware related statistics of PF
  *  @hw: pointer to hardware structure
@@ -1754,8 +1756,8 @@ const struct fm10k_tlv_attr fm10k_update_pvid_msg_attr[] = {
  *
  *  This handler configures the default VLAN for the PF
  **/
-s32 fm10k_msg_update_pvid_pf(struct fm10k_hw *hw, u32 **results,
-struct fm10k_mbx_info *mbx)
+static s32 fm10k_msg_update_pvid_pf(struct fm10k_hw *hw, u32 **results,
+   struct fm10k_mbx_info *mbx)
 {
u16 glort, pvid;
u32 pvid_update;
diff --git a/drivers/net/fm10k/base/fm10k_pf.h b/drivers/net/fm10k/base/fm10k_pf.h
index 44bd193..92e2962 100644
--- a/drivers/net/fm10k/base/fm10k_pf.h
+++ b/drivers/net/fm10k/base/fm10k_pf.h
@@ -149,8 +149,6 @@ extern const struct fm10k_tlv_attr fm10k_lport_map_msg_attr[];
 #define FM10K_PF_MSG_LPORT_MAP_HANDLER(func) \
FM10K_MSG_HANDLER(FM10K_PF_MSG_ID_LPORT_MAP, \
  fm10k_lport_map_msg_attr, func)
-s32 fm10k_msg_update_pvid_pf(struct fm10k_hw *, u32 **,
-struct fm10k_mbx_info *);
 extern const struct fm10k_tlv_attr fm10k_update_pvid_msg_attr[];
 #define FM10K_PF_MSG_UPDATE_PVID_HANDLER(func) \
FM10K_MSG_HANDLER(FM10K_PF_MSG_ID_UPDATE_PVID, \
@@ -183,7 +181,9 @@ s32 fm10k_iov_msg_mac_vlan_pf(struct fm10k_hw *, u32 **,
  struct fm10k_mbx_info *);
 s32 fm10k_iov_msg_lport_state_pf(struct fm10k_hw *, u32 **,
 struct fm10k_mbx_info *);
+#ifndef NO_DEFAULT_SRIOV_MSG_HANDLERS
 extern const struct fm10k_msg_data fm10k_iov_msg_data_pf[];
+#endif

 s32 fm10k_init_ops_pf(struct fm10k_hw *hw);
 #endif /* _FM10K_PF_H */
diff --git a/drivers/net/fm10k/base/fm10k_tlv.c b/drivers/net/fm10k/base/fm10k_tlv.c
index 1d9d7d8..ade87d1 100644
--- a/drivers/net/fm10k/base/fm10k_tlv.c
+++ b/drivers/net/fm10k/base/fm10k_tlv.c
@@ -63,8 +63,8 @@ s32 fm10k_tlv_msg_init(u32 *msg, u16 msg_id)
  *  the attribute buffer.  It will return success if provided with a valid
  *  pointers.
  **/
-s32 fm10k_tlv_attr_put_null_string(u32 *msg, u16 attr_id,
-  const unsigned char *string)
+static s32 fm10k_tlv_attr_put_null_string(u32 *msg, u16 attr_id,
+ const unsigned char *string)
 {
u32 attr_data = 0, len = 0;
u32 *attr;
@@ -115,7 +115,7 @@ s32 fm10k_tlv_attr_put_null_string(u32 *msg, u16 attr_id,
  *  it in the array pointed by by string.  It will return success if provided
  *  with a valid pointers.
  **/
-s32 fm10k_tlv_attr_get_null_string(u32 *attr, unsigned char *string)
+static s32 fm10k_tlv_attr_get_null_string(u32 *attr, unsigned char *string)
 {
u32 len;

@@ -386,7 +386,7 @@ s32 fm10k_tlv_attr_get_le_struct(u32 *attr, void
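The typecast fix in fm10k_update_xc_addr_pf() concerns splitting a MAC address into a 32-bit lower word and a 16-bit upper word. The sketch below presumably matches the pattern (bytes 2..5 into the lower word, bytes 0..1 into the upper word), with the FM10K_CPU_TO_LE* byte-order conversion omitted. Under C integer promotion, the shift is done in int either way, so the u32 to u16 cast documents the 16-bit destination rather than changing the value:

```c
#include <stdint.h>

/* mac[2..5] -> 32-bit lower word (endianness conversion omitted). */
static uint32_t mac_lower(const uint8_t mac[6])
{
	return ((uint32_t)mac[2] << 24) | ((uint32_t)mac[3] << 16) |
	       ((uint32_t)mac[4] << 8) | (uint32_t)mac[5];
}

/* mac[0..1] -> 16-bit upper word, cast to match its 16-bit destination. */
static uint16_t mac_upper(const uint8_t mac[6])
{
	return (uint16_t)(((uint16_t)mac[0] << 8) | (uint16_t)mac[1]);
}
```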

[dpdk-dev] [PATCH v2 04/16] fm10k/base: use bitshift for itr_scale

2016-01-27 Thread Wang Xiao W
The upstream community wishes us to use a bit shift instead of a divisor,
because this is faster and removes any need for a '0' check. In our
case, this even works out because the default, Gen3, will be 0.

Because of this, we are also able to remove the check for a non-zero value
in the VF code path, since that will already be the default Gen3 case.

Signed-off-by: Wang Xiao W 
---
 drivers/net/fm10k/base/fm10k_type.h | 6 +++---
 drivers/net/fm10k/base/fm10k_vf.c   | 4 
 2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/net/fm10k/base/fm10k_type.h b/drivers/net/fm10k/base/fm10k_type.h
index 62fa73f..44187b1 100644
--- a/drivers/net/fm10k/base/fm10k_type.h
+++ b/drivers/net/fm10k/base/fm10k_type.h
@@ -352,9 +352,9 @@ struct fm10k_hw;
 #define FM10K_TDLEN(_n) ((0x40 * (_n)) + 0x8002)
 #define FM10K_TDLEN_ITR_SCALE_SHIFT 9
 #define FM10K_TDLEN_ITR_SCALE_MASK 0x0E00
-#define FM10K_TDLEN_ITR_SCALE_GEN1 4
-#define FM10K_TDLEN_ITR_SCALE_GEN2 2
-#define FM10K_TDLEN_ITR_SCALE_GEN3 1
+#define FM10K_TDLEN_ITR_SCALE_GEN1 2
+#define FM10K_TDLEN_ITR_SCALE_GEN2 1
+#define FM10K_TDLEN_ITR_SCALE_GEN3 0
 #define FM10K_TPH_TXCTRL(_n)   ((0x40 * (_n)) + 0x8003)
 #define FM10K_TPH_TXCTRL_DESC_TPHEN 0x0020
 #define FM10K_TPH_TXCTRL_DESC_RROEN 0x0200
diff --git a/drivers/net/fm10k/base/fm10k_vf.c b/drivers/net/fm10k/base/fm10k_vf.c
index 7822ab6..39bc927 100644
--- a/drivers/net/fm10k/base/fm10k_vf.c
+++ b/drivers/net/fm10k/base/fm10k_vf.c
@@ -159,10 +159,6 @@ STATIC s32 fm10k_init_hw_vf(struct fm10k_hw *hw)
 FM10K_TDLEN_ITR_SCALE_MASK) >>
FM10K_TDLEN_ITR_SCALE_SHIFT;

-   /* ensure a non-zero itr scale */
-   if (!hw->mac.itr_scale)
-   hw->mac.itr_scale = FM10K_TDLEN_ITR_SCALE_GEN3;
-
return FM10K_SUCCESS;
 }

-- 
1.9.3
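The patch does not show how itr_scale is applied to the ITR value. This sketch only illustrates the arithmetic behind the change: for unsigned values, dividing by 2^k equals shifting right by k, so the old divisors (4, 2, 1) become shifts (2, 1, 0). A shift of 0 is a harmless no-op, whereas a divisor of 0 would need the removed "ensure a non-zero itr scale" guard. The helper names are hypothetical:

```c
#include <stdint.h>

/* New per-generation values from the patch: shift counts, not divisors. */
#define FM10K_TDLEN_ITR_SCALE_GEN1 2
#define FM10K_TDLEN_ITR_SCALE_GEN2 1
#define FM10K_TDLEN_ITR_SCALE_GEN3 0

/* Old style: the divisor must never be 0. */
static uint32_t scale_by_divisor(uint32_t value, uint32_t divisor)
{
	return value / divisor;
}

/* New style: any shift in 0..31 is valid; 0 (Gen3) leaves value unchanged. */
static uint32_t scale_by_shift(uint32_t value, uint32_t shift)
{
	return value >> shift;
}
```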



[dpdk-dev] [PATCH v2 05/16] fm10k/base: reset max_queues on init_hw_vf failure

2016-01-27 Thread Wang Xiao W
VF drivers must detect how many queues are available. Previously, the
driver assumed that each VF has at least 1 queue. This assumption is
incorrect, since it is possible that the PF has not yet assigned the
queues to the VF by the time the VF checks. To resolve this, we added a
check to ensure that the first queue is in fact owned by the VF at
init_hw_vf time. However, the code flow did not reset hw->mac.max_queues
to 0. In some cases, such as during reinit flows, we call init_hw_vf
without clearing the previous value of hw->mac.max_queues. Due to this,
when init_hw_vf errors out and its error code is not properly handled,
the VF driver may still believe it has queues which no longer belong to
it. Fix this by clearing hw->mac.max_queues on exit due to errors.

Signed-off-by: Wang Xiao W 
---
 drivers/net/fm10k/base/fm10k_vf.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/net/fm10k/base/fm10k_vf.c b/drivers/net/fm10k/base/fm10k_vf.c
index 39bc927..9b10ee4 100644
--- a/drivers/net/fm10k/base/fm10k_vf.c
+++ b/drivers/net/fm10k/base/fm10k_vf.c
@@ -128,8 +128,10 @@ STATIC s32 fm10k_init_hw_vf(struct fm10k_hw *hw)

/* verify we have at least 1 queue */
if (!~FM10K_READ_REG(hw, FM10K_TXQCTL(0)) ||
-   !~FM10K_READ_REG(hw, FM10K_RXQCTL(0)))
-   return FM10K_ERR_NO_RESOURCES;
+   !~FM10K_READ_REG(hw, FM10K_RXQCTL(0))) {
+   err = FM10K_ERR_NO_RESOURCES;
+   goto reset_max_queues;
+   }

/* determine how many queues we have */
for (i = 1; tqdloc0 && (i < FM10K_MAX_QUEUES_POOL); i++) {
@@ -147,7 +149,7 @@ STATIC s32 fm10k_init_hw_vf(struct fm10k_hw *hw)
/* shut down queues we own and reset DMA configuration */
err = fm10k_disable_queues_generic(hw, i);
if (err)
-   return err;
+   goto reset_max_queues;

/* record maximum queue count */
hw->mac.max_queues = i;
@@ -160,6 +162,11 @@ STATIC s32 fm10k_init_hw_vf(struct fm10k_hw *hw)
FM10K_TDLEN_ITR_SCALE_SHIFT;

return FM10K_SUCCESS;
+
+reset_max_queues:
+   hw->mac.max_queues = 0;
+
+   return err;
 }

 /**
-- 
1.9.3
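The fix uses the common single-exit-label pattern: every failure path jumps to one label that clears state left over from an earlier init. A hedged sketch with hypothetical stand-ins for fm10k_hw, the error codes and fm10k_disable_queues_generic():

```c
#include <stdint.h>

#define SKETCH_SUCCESS          0
#define SKETCH_ERR_NO_RESOURCES (-1) /* hypothetical error code */

struct sketch_mac { uint16_t max_queues; };

/* queue0_owned: whether the PF has assigned queue 0 to this VF yet.
 * disable_err: stands in for the result of disabling the owned queues. */
static int init_hw_vf_sketch(struct sketch_mac *mac, int queue0_owned,
			     uint16_t queues_found, int disable_err)
{
	int err;

	if (!queue0_owned) {
		err = SKETCH_ERR_NO_RESOURCES;
		goto reset_max_queues;
	}

	err = disable_err;
	if (err)
		goto reset_max_queues;

	mac->max_queues = queues_found;
	return SKETCH_SUCCESS;

reset_max_queues:
	/* Never leave a stale count from a previous (re)init behind. */
	mac->max_queues = 0;
	return err;
}
```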



[dpdk-dev] [PATCH v2 06/16] fm10k/base: document ITR scale workaround in VF TDLEN register

2016-01-27 Thread Wang Xiao W
Add comments which properly explain the undocumented use of bits in
TDLEN register prior to VF initializing it to the correct value. Note
that the mechanism is entirely software-defined and explain its purpose
to help reduce confusion in the future.

Signed-off-by: Wang Xiao W 
---
 drivers/net/fm10k/base/fm10k_pf.c   | 6 +-
 drivers/net/fm10k/base/fm10k_type.h | 9 +
 drivers/net/fm10k/base/fm10k_vf.c   | 9 +
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/drivers/net/fm10k/base/fm10k_pf.c b/drivers/net/fm10k/base/fm10k_pf.c
index 5b8c039..6de679e 100644
--- a/drivers/net/fm10k/base/fm10k_pf.c
+++ b/drivers/net/fm10k/base/fm10k_pf.c
@@ -958,7 +958,8 @@ STATIC s32 fm10k_iov_assign_default_mac_vlan_pf(struct fm10k_hw *hw,
FM10K_WRITE_REG(hw, FM10K_TDBAH(vf_q_idx), tdbah);

/* Provide the VF the ITR scale, using software-defined fields in TDLEN
-* to pass the information during VF initialization
+* to pass the information during VF initialization. See definition of
+* FM10K_TDLEN_ITR_SCALE_SHIFT for more details.
 */
FM10K_WRITE_REG(hw, FM10K_TDLEN(vf_q_idx), hw->mac.itr_scale <<
   FM10K_TDLEN_ITR_SCALE_SHIFT);
@@ -1095,6 +1096,9 @@ STATIC s32 fm10k_iov_reset_resources_pf(struct fm10k_hw *hw,
for (i = queues_per_pool; i--;) {
FM10K_WRITE_REG(hw, FM10K_TDBAL(vf_q_idx + i), tdbal);
FM10K_WRITE_REG(hw, FM10K_TDBAH(vf_q_idx + i), tdbah);
+   /* See definition of FM10K_TDLEN_ITR_SCALE_SHIFT for an
+* explanation of how TDLEN is used.
+*/
FM10K_WRITE_REG(hw, FM10K_TDLEN(vf_q_idx + i),
hw->mac.itr_scale <<
FM10K_TDLEN_ITR_SCALE_SHIFT);
diff --git a/drivers/net/fm10k/base/fm10k_type.h b/drivers/net/fm10k/base/fm10k_type.h
index 44187b1..5db6345 100644
--- a/drivers/net/fm10k/base/fm10k_type.h
+++ b/drivers/net/fm10k/base/fm10k_type.h
@@ -350,6 +350,15 @@ struct fm10k_hw;
 #define FM10K_TDBAL(_n) ((0x40 * (_n)) + 0x8000)
 #define FM10K_TDBAH(_n) ((0x40 * (_n)) + 0x8001)
 #define FM10K_TDLEN(_n) ((0x40 * (_n)) + 0x8002)
+/* When first initialized, VFs need to know the Interrupt Throttle Rate (ITR)
+ * scale which is based on the PCIe speed but the speed information in the PCI
+ * configuration space may not be accurate. The PF already knows the ITR scale
+ * but there is no defined method to pass that information from the PF to the
+ * VF. This is accomplished during VF initialization by temporarily co-opting
+ * the yet-to-be-used TDLEN register to have the PF store the ITR shift for
+ * the VF to retrieve before the VF needs to use the TDLEN register for its
+ * intended purpose, i.e. before the Tx resources are allocated.
+ */
 #define FM10K_TDLEN_ITR_SCALE_SHIFT 9
 #define FM10K_TDLEN_ITR_SCALE_MASK 0x0E00
 #define FM10K_TDLEN_ITR_SCALE_GEN1 2
diff --git a/drivers/net/fm10k/base/fm10k_vf.c b/drivers/net/fm10k/base/fm10k_vf.c
index 9b10ee4..43eb081 100644
--- a/drivers/net/fm10k/base/fm10k_vf.c
+++ b/drivers/net/fm10k/base/fm10k_vf.c
@@ -74,6 +74,11 @@ STATIC s32 fm10k_stop_hw_vf(struct fm10k_hw *hw)
FM10K_WRITE_REG(hw, FM10K_TDBAH(i), bah);
FM10K_WRITE_REG(hw, FM10K_RDBAL(i), bal);
FM10K_WRITE_REG(hw, FM10K_RDBAH(i), bah);
+   /* Restore ITR scale in software-defined mechanism in TDLEN
+* for next VF initialization. See definition of
+* FM10K_TDLEN_ITR_SCALE_SHIFT for more details on the use of
+* TDLEN here.
+*/
FM10K_WRITE_REG(hw, FM10K_TDLEN(i), tdlen);
}

@@ -157,6 +162,10 @@ STATIC s32 fm10k_init_hw_vf(struct fm10k_hw *hw)
/* fetch default VLAN and ITR scale */
hw->mac.default_vid = (FM10K_READ_REG(hw, FM10K_TXQCTL(0)) &
   FM10K_TXQCTL_VID_MASK) >> FM10K_TXQCTL_VID_SHIFT;
+   /* Read the ITR scale from TDLEN. See the definition of
+* FM10K_TDLEN_ITR_SCALE_SHIFT for more information about how TDLEN is
+* used here.
+*/
hw->mac.itr_scale = (FM10K_READ_REG(hw, FM10K_TDLEN(0)) &
 FM10K_TDLEN_ITR_SCALE_MASK) >>
FM10K_TDLEN_ITR_SCALE_SHIFT;
-- 
1.9.3
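The TDLEN handoff described in the new comment can be sketched as an encode/decode pair. The PF writes itr_scale << FM10K_TDLEN_ITR_SCALE_SHIFT into the VF's not-yet-used TDLEN, and the VF masks and shifts it back out before programming TDLEN with the real ring length. The constants come from the patch; the helper names are hypothetical:

```c
#include <stdint.h>

/* Values from fm10k_type.h as shown in the patch. */
#define FM10K_TDLEN_ITR_SCALE_SHIFT 9
#define FM10K_TDLEN_ITR_SCALE_MASK  0x0E00 /* bits 9..11: scale 0..7 */

/* PF side: stash the ITR scale in the software-defined TDLEN bits. */
static uint32_t pf_encode_tdlen(uint32_t itr_scale)
{
	return itr_scale << FM10K_TDLEN_ITR_SCALE_SHIFT;
}

/* VF side: read it back before TDLEN is used for its intended purpose. */
static uint32_t vf_decode_itr_scale(uint32_t tdlen)
{
	return (tdlen & FM10K_TDLEN_ITR_SCALE_MASK) >> FM10K_TDLEN_ITR_SCALE_SHIFT;
}
```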



[dpdk-dev] [PATCH v2 07/16] fm10k/base: fix checkpatch warning

2016-01-27 Thread Wang Xiao W
Clean up lines over 80 characters.
Remove a useless else: checkpatch warns that else is not generally
useful after a break or return.

Signed-off-by: Wang Xiao W 
---
 drivers/net/fm10k/base/fm10k_mbx.c |  2 +-
 drivers/net/fm10k/base/fm10k_pf.c  | 19 ++-
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/net/fm10k/base/fm10k_mbx.c b/drivers/net/fm10k/base/fm10k_mbx.c
index 3c9ab3a..7d03704 100644
--- a/drivers/net/fm10k/base/fm10k_mbx.c
+++ b/drivers/net/fm10k/base/fm10k_mbx.c
@@ -930,7 +930,7 @@ STATIC void fm10k_mbx_create_disconnect_hdr(struct fm10k_mbx_info *mbx)
 }

 /**
- *  fm10k_mbx_create_fake_disconnect_hdr - Generate a false disconnect mailbox header
+ *  fm10k_mbx_create_fake_disconnect_hdr - Generate a false disconnect mbox hdr
  *  @mbx: pointer to mailbox
  *
  *  This function creates a fake disconnect header for loading into remote
diff --git a/drivers/net/fm10k/base/fm10k_pf.c b/drivers/net/fm10k/base/fm10k_pf.c
index 6de679e..3ee88b6 100644
--- a/drivers/net/fm10k/base/fm10k_pf.c
+++ b/drivers/net/fm10k/base/fm10k_pf.c
@@ -1278,8 +1278,8 @@ s32 fm10k_iov_msg_mac_vlan_pf(struct fm10k_hw *hw, u32 **results,
err = fm10k_iov_select_vid(vf_info, (u16)vid);
if (err < 0)
return err;
-   else
-   vid = err;
+
+   vid = err;

/* update VSI info for VF in regards to VLAN table */
err = hw->mac.ops.update_vlan(hw, vid, vf_info->vsi, set);
@@ -1304,8 +1304,8 @@ s32 fm10k_iov_msg_mac_vlan_pf(struct fm10k_hw *hw, u32 **results,
err = fm10k_iov_select_vid(vf_info, vlan);
if (err < 0)
return err;
-   else
-   vlan = (u16)err;
+
+   vlan = (u16)err;

/* notify switch of request for new unicast address */
err = hw->mac.ops.update_uc_addr(hw, vf_info->glort,
@@ -1330,8 +1330,8 @@ s32 fm10k_iov_msg_mac_vlan_pf(struct fm10k_hw *hw, u32 **results,
err = fm10k_iov_select_vid(vf_info, vlan);
if (err < 0)
return err;
-   else
-   vlan = (u16)err;
+
+   vlan = (u16)err;

/* notify switch of request for new multicast address */
err = hw->mac.ops.update_mc_addr(hw, vf_info->glort,
@@ -1500,9 +1500,10 @@ STATIC void fm10k_update_hw_stats_pf(struct fm10k_hw *hw,
xec = fm10k_read_hw_stats_32b(hw, FM10K_STATS_XEC, &stats->xec);
vlan_drop = fm10k_read_hw_stats_32b(hw, FM10K_STATS_VLAN_DROP,
&stats->vlan_drop);
-   loopback_drop = fm10k_read_hw_stats_32b(hw,
-   FM10K_STATS_LOOPBACK_DROP,
-   &stats->loopback_drop);
+   loopback_drop =
+   fm10k_read_hw_stats_32b(hw,
+   FM10K_STATS_LOOPBACK_DROP,
+   &stats->loopback_drop);
nodesc_drop = fm10k_read_hw_stats_32b(hw,
  FM10K_STATS_NODESC_DROP,
  &stats->nodesc_drop);
-- 
1.9.3



[dpdk-dev] [PATCH v2 08/16] fm10k/base: use BIT macro instead of open-coded bit-shifting of 1

2016-01-27 Thread Wang Xiao W
The upstream Linux kernel community prefers using the BIT macro over
bit-shifting a 1.  Similar to how this is handled in the i40e shared code,
define a macro for OSes that do not already have it and wrap all that in
LINUX_MACROS so that it can be stripped from the Linux driver.

The upstream Linux kernel community prefers avoiding CamelCase in
variables, function names, etc.

Signed-off-by: Wang Xiao W 
---
 drivers/net/fm10k/base/fm10k_pf.c   | 12 ++--
 drivers/net/fm10k/base/fm10k_tlv.c  | 24 
 drivers/net/fm10k/base/fm10k_type.h | 18 --
 3 files changed, 30 insertions(+), 24 deletions(-)

diff --git a/drivers/net/fm10k/base/fm10k_pf.c b/drivers/net/fm10k/base/fm10k_pf.c
index 3ee88b6..7d48210 100644
--- a/drivers/net/fm10k/base/fm10k_pf.c
+++ b/drivers/net/fm10k/base/fm10k_pf.c
@@ -576,8 +576,8 @@ STATIC s32 fm10k_configure_dglort_map_pf(struct fm10k_hw *hw,
return FM10K_ERR_PARAM;

/* determine count of VSIs and queues */
-   queue_count = 1 << (dglort->rss_l + dglort->pc_l);
-   vsi_count = 1 << (dglort->vsi_l + dglort->queue_l);
+   queue_count = BIT(dglort->rss_l + dglort->pc_l);
+   vsi_count = BIT(dglort->vsi_l + dglort->queue_l);
glort = dglort->glort;
q_idx = dglort->queue_b;

@@ -593,8 +593,8 @@ STATIC s32 fm10k_configure_dglort_map_pf(struct fm10k_hw *hw,
}

/* determine count of PCs and queues */
-   queue_count = 1 << (dglort->queue_l + dglort->rss_l + dglort->vsi_l);
-   pc_count = 1 << dglort->pc_l;
+   queue_count = BIT(dglort->queue_l + dglort->rss_l + dglort->vsi_l);
+   pc_count = BIT(dglort->pc_l);

/* configure PC for Tx queues */
for (pc = 0; pc < pc_count; pc++) {
@@ -1001,7 +1001,7 @@ STATIC s32 fm10k_iov_reset_resources_pf(struct fm10k_hw *hw,
return FM10K_ERR_PARAM;

/* clear event notification of VF FLR */
-   FM10K_WRITE_REG(hw, FM10K_PFVFLREC(vf_idx / 32), 1 << (vf_idx % 32));
+   FM10K_WRITE_REG(hw, FM10K_PFVFLREC(vf_idx / 32), BIT(vf_idx % 32));

/* force timeout and then disconnect the mailbox */
vf_info->mbx.timeout = 0;
@@ -1417,7 +1417,7 @@ s32 fm10k_iov_msg_lport_state_pf(struct fm10k_hw *hw, u32 **results,
mode = fm10k_iov_supported_xcast_mode_pf(vf_info, mode);

/* if mode is not currently enabled, enable it */
-   if (!(FM10K_VF_FLAG_ENABLED(vf_info) & (1 << mode)))
+   if (!(FM10K_VF_FLAG_ENABLED(vf_info) & BIT(mode)))
fm10k_update_xcast_mode_pf(hw, vf_info->glort, mode);

/* swap mode back to a bit flag */
diff --git a/drivers/net/fm10k/base/fm10k_tlv.c b/drivers/net/fm10k/base/fm10k_tlv.c
index ade87d1..e6150c1 100644
--- a/drivers/net/fm10k/base/fm10k_tlv.c
+++ b/drivers/net/fm10k/base/fm10k_tlv.c
@@ -249,7 +249,7 @@ s32 fm10k_tlv_attr_put_value(u32 *msg, u16 attr_id, s64 value, u32 len)
attr = &msg[FM10K_TLV_DWORD_LEN(*msg)];

if (len < 4) {
-   attr[1] = (u32)value & ((0x1ul << (8 * len)) - 1);
+   attr[1] = (u32)value & (BIT(8 * len) - 1);
} else {
attr[1] = (u32)value;
if (len > 4)
@@ -699,29 +699,29 @@ STATIC void fm10k_tlv_msg_test_generate_data(u32 *msg, u32 attr_flags)
 {
DEBUGFUNC("fm10k_tlv_msg_test_generate_data");

-   if (attr_flags & (1 << FM10K_TEST_MSG_STRING))
+   if (attr_flags & BIT(FM10K_TEST_MSG_STRING))
fm10k_tlv_attr_put_null_string(msg, FM10K_TEST_MSG_STRING,
   test_str);
-   if (attr_flags & (1 << FM10K_TEST_MSG_MAC_ADDR))
+   if (attr_flags & BIT(FM10K_TEST_MSG_MAC_ADDR))
fm10k_tlv_attr_put_mac_vlan(msg, FM10K_TEST_MSG_MAC_ADDR,
test_mac, test_vlan);
-   if (attr_flags & (1 << FM10K_TEST_MSG_U8))
+   if (attr_flags & BIT(FM10K_TEST_MSG_U8))
fm10k_tlv_attr_put_u8(msg, FM10K_TEST_MSG_U8,  test_u8);
-   if (attr_flags & (1 << FM10K_TEST_MSG_U16))
+   if (attr_flags & BIT(FM10K_TEST_MSG_U16))
fm10k_tlv_attr_put_u16(msg, FM10K_TEST_MSG_U16, test_u16);
-   if (attr_flags & (1 << FM10K_TEST_MSG_U32))
+   if (attr_flags & BIT(FM10K_TEST_MSG_U32))
fm10k_tlv_attr_put_u32(msg, FM10K_TEST_MSG_U32, test_u32);
-   if (attr_flags & (1 << FM10K_TEST_MSG_U64))
+   if (attr_flags & BIT(FM10K_TEST_MSG_U64))
fm10k_tlv_attr_put_u64(msg, FM10K_TEST_MSG_U64, test_u64);
-   if (attr_flags & (1 << FM10K_TEST_MSG_S8))
+   if (attr_flags & BIT(FM10K_TEST_MSG_S8))
fm10k_tlv_attr_put_s8(msg, FM10K_TEST_MSG_S8,  test_s8);
-   if (attr_flags & (1 << FM10K_TEST_MSG_S16))
+   if (attr_flags & BIT(FM10K_TEST_MSG_S16))
fm10k_tlv_attr_put_s16(msg, FM10K_TEST_MSG_S16, test_s16);
-   if (attr_flags & (1 << FM

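For readers following the BIT() conversion above, here is a minimal sketch of what the macro buys. The definition below, BIT(n) as (1UL << (n)), is an assumption for illustration; the driver's own definition lives in its OS-dependent header and may differ. With a plain int literal, (1 << 31) overflows a signed 32-bit int, while an unsigned long shift is well defined. The helper names low_bytes_mask() and attr_flag_set() are hypothetical; the first mirrors the len < 4 mask expression in fm10k_tlv_attr_put_value().

```c
#include <stdint.h>

/* Assumed definition for illustration only; shifting an unsigned long
 * avoids the undefined behaviour of (1 << 31) on a signed 32-bit int. */
#ifndef BIT
#define BIT(n) (1UL << (n))
#endif

/* Hypothetical helper: mask covering the low `len` bytes of a 32-bit
 * value (len < 4), same expression as the patched TLV code. */
static uint32_t low_bytes_mask(uint32_t len)
{
	return (uint32_t)(BIT(8 * len) - 1);
}

/* Hypothetical helper: test whether attribute `id` is set in a flag word,
 * as done for each FM10K_TEST_MSG_* attribute. */
static int attr_flag_set(uint32_t attr_flags, unsigned int id)
{
	return (attr_flags & BIT(id)) != 0;
}
```

For example, low_bytes_mask(2) yields 0xFFFF, and attr_flag_set(BIT(5), 5) is true while attr_flag_set(BIT(5), 4) is false.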
[dpdk-dev] [PATCH v2 09/16] fm10k/base: do not use CamelCase

2016-01-27 Thread Wang Xiao W
The upstream Linux kernel community prefers avoiding CamelCase in
variables, function names, etc.

Signed-off-by: Wang Xiao W 
---
 drivers/net/fm10k/base/fm10k_type.h | 14 +++---
 drivers/net/fm10k/fm10k_ethdev.c    | 24 
 2 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/drivers/net/fm10k/base/fm10k_type.h b/drivers/net/fm10k/base/fm10k_type.h
index 387d25b..c9885a1 100644
--- a/drivers/net/fm10k/base/fm10k_type.h
+++ b/drivers/net/fm10k/base/fm10k_type.h
@@ -531,13 +531,13 @@ struct fm10k_hw;
 #endif

 enum fm10k_int_source {
-   fm10k_int_Mailbox   = 0,
-   fm10k_int_PCIeFault = 1,
-   fm10k_int_SwitchUpDown  = 2,
-   fm10k_int_SwitchEvent   = 3,
-   fm10k_int_SRAM  = 4,
-   fm10k_int_VFLR  = 5,
-   fm10k_int_MaxHoldTime   = 6,
+   fm10k_int_mailbox   = 0,
+   fm10k_int_pcie_fault= 1,
+   fm10k_int_switch_up_down= 2,
+   fm10k_int_switch_event  = 3,
+   fm10k_int_sram  = 4,
+   fm10k_int_vflr  = 5,
+   fm10k_int_max_hold_time = 6,
fm10k_int_sources_max_pf
 };

diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 2c38ce9..a118cf4 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -2074,12 +2074,12 @@ fm10k_dev_enable_intr_pf(struct rte_eth_dev *dev)
/* Bind all local non-queue interrupt to vector 0 */
int_map |= 0;

-   FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_Mailbox), int_map);
-   FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_PCIeFault), int_map);
-   FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_SwitchUpDown), int_map);
-   FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_SwitchEvent), int_map);
-   FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_SRAM), int_map);
-   FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_VFLR), int_map);
+   FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_mailbox), int_map);
+   FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_pcie_fault), int_map);
+   FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_switch_up_down), int_map);
+   FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_switch_event), int_map);
+   FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_sram), int_map);
+   FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_vflr), int_map);

/* Enable misc causes */
FM10K_WRITE_REG(hw, FM10K_EIMR, FM10K_EIMR_ENABLE(PCA_FAULT) |
@@ -2105,12 +2105,12 @@ fm10k_dev_disable_intr_pf(struct rte_eth_dev *dev)

int_map |= 0;

-   FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_Mailbox), int_map);
-   FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_PCIeFault), int_map);
-   FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_SwitchUpDown), int_map);
-   FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_SwitchEvent), int_map);
-   FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_SRAM), int_map);
-   FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_VFLR), int_map);
+   FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_mailbox), int_map);
+   FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_pcie_fault), int_map);
+   FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_switch_up_down), int_map);
+   FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_switch_event), int_map);
+   FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_sram), int_map);
+   FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_vflr), int_map);

/* Disable misc causes */
FM10K_WRITE_REG(hw, FM10K_EIMR, FM10K_EIMR_DISABLE(PCA_FAULT) |
-- 
1.9.3



[dpdk-dev] [PATCH v2 10/16] fm10k/base: use memcpy for mac addr copy

2016-01-27 Thread Wang Xiao W
Use memcpy instead of copying MAC address byte-by-byte.

Signed-off-by: Wang Xiao W 
---
 drivers/net/fm10k/base/fm10k_pf.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/net/fm10k/base/fm10k_pf.c b/drivers/net/fm10k/base/fm10k_pf.c
index 7d48210..a1469aa 100644
--- a/drivers/net/fm10k/base/fm10k_pf.c
+++ b/drivers/net/fm10k/base/fm10k_pf.c
@@ -300,7 +300,6 @@ STATIC s32 fm10k_read_mac_addr_pf(struct fm10k_hw *hw)
 {
u8 perm_addr[ETH_ALEN];
u32 serial_num;
-   int i;

DEBUGFUNC("fm10k_read_mac_addr_pf");

@@ -324,10 +323,8 @@ STATIC s32 fm10k_read_mac_addr_pf(struct fm10k_hw *hw)
perm_addr[4] = (u8)(serial_num >> 8);
perm_addr[5] = (u8)(serial_num);

-   for (i = 0; i < ETH_ALEN; i++) {
-   hw->mac.perm_addr[i] = perm_addr[i];
-   hw->mac.addr[i] = perm_addr[i];
-   }
+   memcpy(hw->mac.perm_addr, perm_addr, ETH_ALEN);
+   memcpy(hw->mac.addr, perm_addr, ETH_ALEN);

return FM10K_SUCCESS;
 }
-- 
1.9.3
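A small sketch showing that the memcpy form in this patch is equivalent to the removed loop. struct mac_info and both helper names are hypothetical stand-ins for the relevant fields of struct fm10k_hw; ETH_ALEN is defined locally as 6 so the example builds on its own.

```c
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical stand-in for hw->mac in struct fm10k_hw. */
struct mac_info {
	uint8_t perm_addr[ETH_ALEN];
	uint8_t addr[ETH_ALEN];
};

/* Removed style: copy byte by byte into both destinations. */
static void copy_mac_loop(struct mac_info *mac, const uint8_t *src)
{
	int i;

	for (i = 0; i < ETH_ALEN; i++) {
		mac->perm_addr[i] = src[i];
		mac->addr[i] = src[i];
	}
}

/* Patched style: one memcpy per destination. */
static void copy_mac_memcpy(struct mac_info *mac, const uint8_t *src)
{
	memcpy(mac->perm_addr, src, ETH_ALEN);
	memcpy(mac->addr, src, ETH_ALEN);
}
```

Both functions leave identical contents in perm_addr and addr; the memcpy version simply states the intent more directly.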



[dpdk-dev] [PATCH v2 11/16] fm10k/base: allow removal of is_slot_appropriate function

2016-01-27 Thread Wang Xiao W
The Linux kernel provides the OS call "pcie_get_minimum_link", which
can crawl the PCIe tree and determine the actual minimum link speed of a
device; this is a more general check than the one provided by
is_slot_appropriate. Thus, the upstream driver neither uses nor wants the
is_slot_appropriate function call. Add a NO_IS_SLOT_APPROPRIATE_CHECK
definition which can be defined during the strip process to remove the code.
If it is left undefined (the default), all of the code stays active and no
driver changes should be necessary.

Signed-off-by: Wang Xiao W 
---
 drivers/net/fm10k/base/fm10k_api.c  | 2 ++
 drivers/net/fm10k/base/fm10k_api.h  | 2 ++
 drivers/net/fm10k/base/fm10k_pf.c   | 4 
 drivers/net/fm10k/base/fm10k_type.h | 2 ++
 drivers/net/fm10k/base/fm10k_vf.c   | 4 
 5 files changed, 14 insertions(+)

diff --git a/drivers/net/fm10k/base/fm10k_api.c b/drivers/net/fm10k/base/fm10k_api.c
index eb5bdaa..c49d20d 100644
--- a/drivers/net/fm10k/base/fm10k_api.c
+++ b/drivers/net/fm10k/base/fm10k_api.c
@@ -181,6 +181,7 @@ s32 fm10k_get_bus_info(struct fm10k_hw *hw)
   FM10K_NOT_IMPLEMENTED);
 }

+#ifndef NO_IS_SLOT_APPROPRIATE_CHECK
 /**
  *  fm10k_is_slot_appropriate - Indicate appropriate slot for this SKU
  *  @hw: pointer to hardware structure
@@ -195,6 +196,7 @@ bool fm10k_is_slot_appropriate(struct fm10k_hw *hw)
return true;
 }

+#endif
 /**
  *  fm10k_update_vlan - Clear VLAN ID to VLAN filter table
  *  @hw: pointer to hardware structure
diff --git a/drivers/net/fm10k/base/fm10k_api.h b/drivers/net/fm10k/base/fm10k_api.h
index 113aef5..2ab3149 100644
--- a/drivers/net/fm10k/base/fm10k_api.h
+++ b/drivers/net/fm10k/base/fm10k_api.h
@@ -44,7 +44,9 @@ s32 fm10k_stop_hw(struct fm10k_hw *hw);
 s32 fm10k_start_hw(struct fm10k_hw *hw);
 s32 fm10k_init_shared_code(struct fm10k_hw *hw);
 s32 fm10k_get_bus_info(struct fm10k_hw *hw);
+#ifndef NO_IS_SLOT_APPROPRIATE_CHECK
 bool fm10k_is_slot_appropriate(struct fm10k_hw *hw);
+#endif
 s32 fm10k_update_vlan(struct fm10k_hw *hw, u32 vid, u8 idx, bool set);
 s32 fm10k_read_mac_addr(struct fm10k_hw *hw);
 void fm10k_update_hw_stats(struct fm10k_hw *hw, struct fm10k_hw_stats *stats);
diff --git a/drivers/net/fm10k/base/fm10k_pf.c b/drivers/net/fm10k/base/fm10k_pf.c
index a1469aa..f5cbda4 100644
--- a/drivers/net/fm10k/base/fm10k_pf.c
+++ b/drivers/net/fm10k/base/fm10k_pf.c
@@ -216,6 +216,7 @@ STATIC s32 fm10k_init_hw_pf(struct fm10k_hw *hw)
return FM10K_SUCCESS;
 }

+#ifndef NO_IS_SLOT_APPROPRIATE_CHECK
 /**
  *  fm10k_is_slot_appropriate_pf - Indicate appropriate slot for this SKU
  *  @hw: pointer to hardware structure
@@ -231,6 +232,7 @@ STATIC bool fm10k_is_slot_appropriate_pf(struct fm10k_hw *hw)
   (hw->bus.width == hw->bus_caps.width);
 }

+#endif
 /**
  *  fm10k_update_vlan_pf - Update status of VLAN ID in VLAN filter table
  *  @hw: pointer to hardware structure
@@ -2064,7 +2066,9 @@ s32 fm10k_init_ops_pf(struct fm10k_hw *hw)
mac->ops.init_hw = &fm10k_init_hw_pf;
mac->ops.start_hw = &fm10k_start_hw_generic;
mac->ops.stop_hw = &fm10k_stop_hw_generic;
+#ifndef NO_IS_SLOT_APPROPRIATE_CHECK
mac->ops.is_slot_appropriate = &fm10k_is_slot_appropriate_pf;
+#endif
mac->ops.update_vlan = &fm10k_update_vlan_pf;
mac->ops.read_mac_addr = &fm10k_read_mac_addr_pf;
mac->ops.update_uc_addr = &fm10k_update_uc_addr_pf;
diff --git a/drivers/net/fm10k/base/fm10k_type.h b/drivers/net/fm10k/base/fm10k_type.h
index c9885a1..ba0a184 100644
--- a/drivers/net/fm10k/base/fm10k_type.h
+++ b/drivers/net/fm10k/base/fm10k_type.h
@@ -679,7 +679,9 @@ struct fm10k_mac_ops {
s32 (*stop_hw)(struct fm10k_hw *);
s32 (*get_bus_info)(struct fm10k_hw *);
s32 (*get_host_state)(struct fm10k_hw *, bool *);
+#ifndef NO_IS_SLOT_APPROPRIATE_CHECK
bool (*is_slot_appropriate)(struct fm10k_hw *);
+#endif
s32 (*update_vlan)(struct fm10k_hw *, u32, u8, bool);
s32 (*read_mac_addr)(struct fm10k_hw *);
s32 (*update_uc_addr)(struct fm10k_hw *, u16, const u8 *,
diff --git a/drivers/net/fm10k/base/fm10k_vf.c b/drivers/net/fm10k/base/fm10k_vf.c
index 43eb081..efbdbd1 100644
--- a/drivers/net/fm10k/base/fm10k_vf.c
+++ b/drivers/net/fm10k/base/fm10k_vf.c
@@ -178,6 +178,7 @@ reset_max_queues:
return err;
 }

+#ifndef NO_IS_SLOT_APPROPRIATE_CHECK
 /**
  *  fm10k_is_slot_appropriate_vf - Indicate appropriate slot for this SKU
  *  @hw: pointer to hardware structure
@@ -194,6 +195,7 @@ STATIC bool fm10k_is_slot_appropriate_vf(struct fm10k_hw *hw)
return TRUE;
 }

+#endif
 /* This structure defines the attibutes to be parsed below */
 const struct fm10k_tlv_attr fm10k_mac_vlan_msg_attr[] = {
FM10K_TLV_ATTR_U32(FM10K_MAC_VLAN_MSG_VLAN),
@@ -648,7 +650,9 @@ s32 fm10k_init_ops_vf(struct fm10k_hw *hw)
mac->ops.init_hw = &fm10k_init_hw_vf;
mac->ops.start_hw = &fm10k_start_hw_generic;
mac->ops.stop_hw = &fm10k_stop_hw

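The pattern used in this patch, guarding the ops member, the implementation and the assignment with the same #ifndef, can be sketched as below. Everything here (struct hw, struct mac_ops, the *_stub functions) is a hypothetical trimmed-down model, not the fm10k types; the slot check mirrors the speed/width comparison visible in fm10k_is_slot_appropriate_pf().

```c
#include <stdbool.h>

struct hw;

/* Hypothetical ops table: the member only exists when the check is kept. */
struct mac_ops {
	int (*init_hw)(struct hw *);
#ifndef NO_IS_SLOT_APPROPRIATE_CHECK
	bool (*is_slot_appropriate)(struct hw *);
#endif
};

/* Hypothetical hardware model: negotiated bus vs. bus capabilities. */
struct hw {
	int bus_speed, bus_width;
	int caps_speed, caps_width;
	struct mac_ops ops;
};

static int init_hw_stub(struct hw *hw)
{
	(void)hw;
	return 0;
}

#ifndef NO_IS_SLOT_APPROPRIATE_CHECK
/* Slot is appropriate when the link runs at full capability. */
static bool is_slot_appropriate_stub(struct hw *hw)
{
	return hw->bus_speed == hw->caps_speed &&
	       hw->bus_width == hw->caps_width;
}
#endif

static void init_ops(struct hw *hw)
{
	hw->ops.init_hw = init_hw_stub;
#ifndef NO_IS_SLOT_APPROPRIATE_CHECK
	hw->ops.is_slot_appropriate = is_slot_appropriate_stub;
#endif
}
```

Building with -DNO_IS_SLOT_APPROPRIATE_CHECK removes the member, the function and the assignment together, so nothing is left referring to a missing symbol.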
[dpdk-dev] [PATCH v2 12/16] fm10k/base: consistently use VLAN ID when referencing vid variables

2016-01-27 Thread Wang Xiao W
The vid variable name is shorthand for VLAN ID, so we should use this in
comments explaining what is happening.

Signed-off-by: Wang Xiao W 
---
 drivers/net/fm10k/base/fm10k_pf.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/fm10k/base/fm10k_pf.c b/drivers/net/fm10k/base/fm10k_pf.c
index f5cbda4..716d7f1 100644
--- a/drivers/net/fm10k/base/fm10k_pf.c
+++ b/drivers/net/fm10k/base/fm10k_pf.c
@@ -970,7 +970,7 @@ err_out:
txqctl |= (vf_idx << FM10K_TXQCTL_TC_SHIFT) |
  FM10K_TXQCTL_VF | vf_idx;

-   /* assign VID */
+   /* assign VLAN ID */
for (i = 0; i < queues_per_pool; i++)
FM10K_WRITE_REG(hw, FM10K_TXQCTL(vf_q_idx + i), txqctl);

@@ -1215,12 +1215,12 @@ s32 fm10k_iov_msg_msix_pf(struct fm10k_hw *hw, u32 **results,
 }

 /**
- * fm10k_iov_select_vid - Select correct default vid
+ * fm10k_iov_select_vid - Select correct default VLAN ID
  * @hw: Pointer to hardware structure
- * @vid: vid to correct
+ * @vid: VLAN ID to correct
  *
- * Will report an error if vid is out of range. For vid = 0, it will return
- * either the pf_vid or sw_vid depending on which one is set.
+ * Will report an error if the VLAN ID is out of range. For VID = 0, it will
+ * return either the pf_vid or sw_vid depending on which one is set.
  */
 STATIC s32 fm10k_iov_select_vid(struct fm10k_vf_info *vf_info, u16 vid)
 {
@@ -1783,7 +1783,7 @@ static s32 fm10k_msg_update_pvid_pf(struct fm10k_hw *hw, u32 **results,
if (!fm10k_glort_valid_pf(hw, glort))
return FM10K_ERR_PARAM;

-   /* verify VID is valid */
+   /* verify VLAN ID is valid */
if (pvid >= FM10K_VLAN_TABLE_VID_MAX)
return FM10K_ERR_PARAM;

-- 
1.9.3
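As an illustration of the behaviour the updated fm10k_iov_select_vid() comment describes, here is a sketch written from the comment alone, not from the function body. The struct, the 4096 upper bound (12-bit VLAN ID space) and the ERR_PARAM value are assumptions; the real function may apply additional checks, for example against a PF-forced VLAN ID.

```c
#include <stdint.h>

#define VLAN_TABLE_VID_MAX 4096 /* assumption: 12-bit VLAN ID space */
#define ERR_PARAM (-2)          /* hypothetical error code */

/* Hypothetical subset of the VF info used for VLAN ID selection. */
struct vf_info_sketch {
	uint16_t pf_vid; /* VLAN ID set by the PF, 0 if unset */
	uint16_t sw_vid; /* default VLAN ID from the switch */
};

/* Return the VLAN ID to use, or ERR_PARAM if it is out of range.
 * VLAN ID 0 means "use the default": pf_vid if set, else sw_vid. */
static int32_t select_vid_sketch(const struct vf_info_sketch *vf, uint16_t vid)
{
	if (!vid)
		return vf->pf_vid ? vf->pf_vid : vf->sw_vid;
	if (vid >= VLAN_TABLE_VID_MAX)
		return ERR_PARAM;
	return vid;
}
```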



[dpdk-dev] [PATCH v2 13/16] fm10k/base: fix comment per upstream review changes

2016-01-27 Thread Wang Xiao W
The comment here was changed during review of the upstream patch, and the
new wording is slightly clearer. Rewrite the comment in SHARED code
based on this new wording.

Also fix a number of mailbox comment issues: function header comments,
acronyms written in lower case (e.g. FIFO, TLV), incorrect function names in
DEBUGFUNC(), duplicate comments and a stubbed-out header comment for
fm10k_sm_mbx_init.

Signed-off-by: Wang Xiao W 
---
 drivers/net/fm10k/base/fm10k_mbx.c | 61 ++
 drivers/net/fm10k/base/fm10k_mbx.h |  4 +--
 drivers/net/fm10k/base/fm10k_pf.c  | 12 
 drivers/net/fm10k/base/fm10k_tlv.h |  4 +--
 4 files changed, 46 insertions(+), 35 deletions(-)

diff --git a/drivers/net/fm10k/base/fm10k_mbx.c b/drivers/net/fm10k/base/fm10k_mbx.c
index 7d03704..2e70434 100644
--- a/drivers/net/fm10k/base/fm10k_mbx.c
+++ b/drivers/net/fm10k/base/fm10k_mbx.c
@@ -70,7 +70,7 @@ STATIC u16 fm10k_fifo_unused(struct fm10k_mbx_fifo *fifo)
 }

 /**
- *  fm10k_fifo_empty - Test to verify if fifo is empty
+ *  fm10k_fifo_empty - Test to verify if FIFO is empty
  *  @fifo: pointer to FIFO
  *
  *  This function returns true if the FIFO is empty, else false
@@ -85,7 +85,7 @@ STATIC bool fm10k_fifo_empty(struct fm10k_mbx_fifo *fifo)
  *  @fifo: pointer to FIFO
  *  @offset: offset to add to head
  *
- *  This function returns the indices into the fifo based on head + offset
+ *  This function returns the indices into the FIFO based on head + offset
  **/
 STATIC u16 fm10k_fifo_head_offset(struct fm10k_mbx_fifo *fifo, u16 offset)
 {
@@ -97,7 +97,7 @@ STATIC u16 fm10k_fifo_head_offset(struct fm10k_mbx_fifo *fifo, u16 offset)
  *  @fifo: pointer to FIFO
  *  @offset: offset to add to tail
  *
- *  This function returns the indices into the fifo based on tail + offset
+ *  This function returns the indices into the FIFO based on tail + offset
  **/
 STATIC u16 fm10k_fifo_tail_offset(struct fm10k_mbx_fifo *fifo, u16 offset)
 {
@@ -173,7 +173,7 @@ STATIC u16 fm10k_mbx_index_len(struct fm10k_mbx_info *mbx, u16 head, u16 tail)
 /**
  *  fm10k_mbx_tail_add - Determine new tail value with added offset
  *  @mbx: pointer to mailbox
- *  @offset: length to add to head offset
+ *  @offset: length to add to tail offset
  *
  *  This function takes the local tail index and recomputes it for
  *  a given length added as an offset.
@@ -189,7 +189,7 @@ STATIC u16 fm10k_mbx_tail_add(struct fm10k_mbx_info *mbx, u16 offset)
 /**
  *  fm10k_mbx_tail_sub - Determine new tail value with subtracted offset
  *  @mbx: pointer to mailbox
- *  @offset: length to add to head offset
+ *  @offset: length to add to tail offset
  *
  *  This function takes the local tail index and recomputes it for
  *  a given length added as an offset.
@@ -253,7 +253,7 @@ STATIC u16 fm10k_mbx_pushed_tail_len(struct fm10k_mbx_info *mbx)
 }

 /**
- *  fm10k_fifo_write_copy - pulls data off of msg and places it in fifo
+ *  fm10k_fifo_write_copy - pulls data off of msg and places it in FIFO
  *  @fifo: pointer to FIFO
  *  @msg: message array to populate
  *  @tail_offset: additional offset to add to tail pointer
@@ -331,7 +331,7 @@ STATIC u16 fm10k_mbx_validate_msg_size(struct fm10k_mbx_info *mbx, u16 len)
u16 total_len = 0, msg_len;
u32 *msg;

-   DEBUGFUNC("fm10k_mbx_validate_msg");
+   DEBUGFUNC("fm10k_mbx_validate_msg_size");

/* length should include previous amounts pushed */
len += mbx->pushed;
@@ -353,6 +353,7 @@ STATIC u16 fm10k_mbx_validate_msg_size(struct fm10k_mbx_info *mbx, u16 len)

 /**
  *  fm10k_mbx_write_copy - pulls data off of Tx FIFO and places it in mbmem
+ *  @hw: pointer to hardware structure
  *  @mbx: pointer to mailbox
  *
  *  This function will take a section of the Tx FIFO and copy it into the
@@ -734,7 +735,7 @@ STATIC bool fm10k_mbx_tx_complete(struct fm10k_mbx_info *mbx)
  *  @hw: pointer to hardware structure
  *  @mbx: pointer to mailbox
  *
- *  This function dequeues messages and hands them off to the tlv parser.
+ *  This function dequeues messages and hands them off to the TLV parser.
  *  It will return the number of messages processed when called.
  **/
 STATIC u16 fm10k_mbx_dequeue_rx(struct fm10k_hw *hw,
@@ -951,7 +952,7 @@ STATIC void fm10k_mbx_create_fake_disconnect_hdr(struct fm10k_mbx_info *mbx)
 }

 /**
- *  fm10k_mbx_create_error_msg - Generate a error message
+ *  fm10k_mbx_create_error_msg - Generate an error message
  *  @mbx: pointer to mailbox
  *  @err: local error encountered
  *
@@ -984,7 +985,6 @@ STATIC void fm10k_mbx_create_error_msg(struct fm10k_mbx_info *mbx, s32 err)
 /**
  *  fm10k_mbx_validate_msg_hdr - Validate common fields in the message header
  *  @mbx: pointer to mailbox
- *  @msg: message array to read
  *
  *  This function will parse up the fields in the mailbox header and return
  *  an error if the header contains any of a number of invalid configurations
@@ -1050,11 +1050,12 @@ STATIC s32 fm10k_mbx_validate_msg_hdr(st

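The FIFO helpers whose comments are touched above return an index relative to head or tail. A minimal sketch, assuming the FIFO size is a power of two so that wrap-around can be done with a mask; struct fifo_sketch and the function names are hypothetical and only model the behaviour the comments describe.

```c
#include <stdint.h>

/* Hypothetical FIFO descriptor; size is assumed to be a power of two. */
struct fifo_sketch {
	uint16_t head;
	uint16_t tail;
	uint16_t size;
};

/* Index into the FIFO at head + offset, wrapping past the end. */
static uint16_t fifo_head_offset(const struct fifo_sketch *f, uint16_t offset)
{
	return (uint16_t)((f->head + offset) & (f->size - 1));
}

/* Index into the FIFO at tail + offset, wrapping past the end. */
static uint16_t fifo_tail_offset(const struct fifo_sketch *f, uint16_t offset)
{
	return (uint16_t)((f->tail + offset) & (f->size - 1));
}
```

With a 16-entry FIFO and head at 14, an offset of 3 wraps to index 1.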
[dpdk-dev] [PATCH v2 14/16] fm10k/base: TLV structures must be 4byte aligned, not 1byte aligned

2016-01-27 Thread Wang Xiao W
Per comments from an upstream patch, and looking at how the TLV LE_STRUCT
code works, we actually want these structures to be 4-byte aligned, not
1-byte aligned. In practice, 1-byte alignment has worked so far because
all our structures end up being a multiple of 4 bytes. But if a future TLV
structure were added with a u8 or similar sticking on the end, things
would break. Fix this by using 4-byte alignment, which will prevent the
TLV LE_STRUCT code from breaking. Update the comment to explain that we
need 4-byte alignment of our structures.

Signed-off-by: Wang Xiao W 
---
 drivers/net/fm10k/base/fm10k_pf.h | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/fm10k/base/fm10k_pf.h b/drivers/net/fm10k/base/fm10k_pf.h
index 92e2962..ee8527a 100644
--- a/drivers/net/fm10k/base/fm10k_pf.h
+++ b/drivers/net/fm10k/base/fm10k_pf.h
@@ -91,14 +91,14 @@ enum fm10k_pf_tlv_attr_id_v1 {
 #define FM10K_MSG_UPDATE_PVID_PVID_SHIFT   16
 #define FM10K_MSG_UPDATE_PVID_PVID_SIZE 16

-/* The following data structures are overlayed specifically to TLV mailbox
- * messages, and must not have gaps between their values. They must line up
- * correctly to the TLV definition.
+/* The following data structures are overlayed directly onto TLV mailbox
+ * messages, and must not break 4 byte alignment. Ensure the structures line
+ * up correctly as per their TLV definition.
  */
 #ifdef C99
-#pragma pack(push, 1)
+#pragma pack(push, 4)
 #else
-#pragma pack(1)
+#pragma pack(4)
 #endif /* C99 */

 struct fm10k_mac_update {
-- 
1.9.3
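The difference the commit message describes shows up directly in sizeof. This sketch uses hypothetical structures ending in a single u8: with pack(1) the size is 5 bytes, which is not a whole number of 32-bit dwords, while pack(4) pads it to 8. #pragma pack(push, n)/pop is accepted by GCC, Clang and MSVC. The dword-multiple helper is an illustrative assumption about why the TLV code cares, based on TLV messages being sized in dwords (FM10K_TLV_DWORD_LEN).

```c
#include <stdint.h>

/* Hypothetical TLV overlay ending in a u8, packed to 1 byte. */
#pragma pack(push, 1)
struct tlv_overlay_pack1 {
	uint32_t value;
	uint8_t flag;
};
#pragma pack(pop)

/* Same layout packed to 4 bytes: trailing padding restores alignment. */
#pragma pack(push, 4)
struct tlv_overlay_pack4 {
	uint32_t value;
	uint8_t flag;
};
#pragma pack(pop)

/* True when a structure size is a whole number of 32-bit dwords. */
static int is_dword_multiple(unsigned long size)
{
	return size % 4 == 0;
}
```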



[dpdk-dev] [PATCH v2 16/16] fm10k/base: minor cleanups

2016-01-27 Thread Wang Xiao W
Some cleanups to better reflect the code that was actually pushed out to
the upstream Linux community.

Signed-off-by: Wang Xiao W 
---
 drivers/net/fm10k/base/fm10k_mbx.h  |   7 --
 drivers/net/fm10k/base/fm10k_pf.h   |   4 --
 drivers/net/fm10k/base/fm10k_type.h | 132 
 3 files changed, 143 deletions(-)

diff --git a/drivers/net/fm10k/base/fm10k_mbx.h b/drivers/net/fm10k/base/fm10k_mbx.h
index e642c2f..edc57df 100644
--- a/drivers/net/fm10k/base/fm10k_mbx.h
+++ b/drivers/net/fm10k/base/fm10k_mbx.h
@@ -48,7 +48,6 @@ struct fm10k_mbx_info;
 /* XOR provides means of switching from Tx to Rx FIFO */
 #define FM10K_MBMEM_PF_XOR (FM10K_MBMEM_SM(0) ^ FM10K_MBMEM_PF(0))
 #define FM10K_MBX(_n)  ((_n) + 0x18800)
-#define FM10K_MBX_OWNER 0x0001
 #define FM10K_MBX_REQ  0x0002
 #define FM10K_MBX_ACK  0x0004
 #define FM10K_MBX_REQ_INTERRUPT 0x0008
@@ -213,7 +212,6 @@ enum fm10k_msg_type {
 /* version number for switch manager mailboxes */
 #define FM10K_SM_MBX_VERSION   1
 #define FM10K_SM_MBX_FIFO_LEN  (FM10K_MBMEM_PF_XOR - 1)
-#define FM10K_SM_MBX_FIFO_HDR_LEN  1

 /* offsets shared between all SM FIFO headers */
 #define FM10K_MSG_SM_TAIL_SHIFT0
@@ -233,18 +231,13 @@ enum fm10k_msg_type {
  */
 #define FM10K_MBX_ERR(_n) ((_n) - 512)
 #define FM10K_MBX_ERR_NO_MBX   FM10K_MBX_ERR(0x01)
-#define FM10K_MBX_ERR_NO_MSG   FM10K_MBX_ERR(0x02)
 #define FM10K_MBX_ERR_NO_SPACE FM10K_MBX_ERR(0x03)
-#define FM10K_MBX_ERR_LOCK FM10K_MBX_ERR(0x04)
 #define FM10K_MBX_ERR_TAIL FM10K_MBX_ERR(0x05)
 #define FM10K_MBX_ERR_HEAD FM10K_MBX_ERR(0x06)
-#define FM10K_MBX_ERR_DST  FM10K_MBX_ERR(0x07)
 #define FM10K_MBX_ERR_SRC  FM10K_MBX_ERR(0x08)
 #define FM10K_MBX_ERR_TYPE FM10K_MBX_ERR(0x09)
-#define FM10K_MBX_ERR_LEN  FM10K_MBX_ERR(0x0A)
 #define FM10K_MBX_ERR_SIZE FM10K_MBX_ERR(0x0B)
 #define FM10K_MBX_ERR_BUSY FM10K_MBX_ERR(0x0C)
-#define FM10K_MBX_ERR_VALUE FM10K_MBX_ERR(0x0D)
 #define FM10K_MBX_ERR_RSVD0 FM10K_MBX_ERR(0x0E)
 #define FM10K_MBX_ERR_CRC  FM10K_MBX_ERR(0x0F)

diff --git a/drivers/net/fm10k/base/fm10k_pf.h b/drivers/net/fm10k/base/fm10k_pf.h
index ee8527a..c84b1bc 100644
--- a/drivers/net/fm10k/base/fm10k_pf.h
+++ b/drivers/net/fm10k/base/fm10k_pf.h
@@ -140,10 +140,6 @@ struct fm10k_swapi_1588_clock_owner {
 #pragma pack()
 #endif /* C99 */

-#define FM10K_PF_MSG_LPORT_CREATE_HANDLER(func) \
-   FM10K_MSG_HANDLER(FM10K_PF_MSG_ID_LPORT_CREATE, NULL, func)
-#define FM10K_PF_MSG_LPORT_DELETE_HANDLER(func) \
-   FM10K_MSG_HANDLER(FM10K_PF_MSG_ID_LPORT_DELETE, NULL, func)
 s32 fm10k_msg_lport_map_pf(struct fm10k_hw *, u32 **, struct fm10k_mbx_info *);
 extern const struct fm10k_tlv_attr fm10k_lport_map_msg_attr[];
 #define FM10K_PF_MSG_LPORT_MAP_HANDLER(func) \
diff --git a/drivers/net/fm10k/base/fm10k_type.h b/drivers/net/fm10k/base/fm10k_type.h
index ba0a184..3fc8f13 100644
--- a/drivers/net/fm10k/base/fm10k_type.h
+++ b/drivers/net/fm10k/base/fm10k_type.h
@@ -40,7 +40,6 @@ struct fm10k_hw;
 #include "fm10k_osdep.h"
 #include "fm10k_mbx.h"

-#define FM10K_INTEL_VENDOR_ID  0x8086
 #define FM10K_DEV_ID_PF 0x15A4
 #define FM10K_DEV_ID_VF 0x15A5
 #ifdef BOULDER_RAPIDS_HW
@@ -121,28 +120,16 @@ struct fm10k_hw;
 #define FM10K_CTRL_BAR4_ALLOWED 0x0004

 #define FM10K_CTRL_EXT 0x0001
-#define FM10K_CTRL_EXT_NS_DIS  0x0001
-#define FM10K_CTRL_EXT_RO_DIS  0x0002
-#define FM10K_CTRL_EXT_SWITCH_LOOPBACK 0x0004
-#define FM10K_EXVET 0x0002
-#define FM10K_EXVET_ETHERTYPE_MASK 0x00FF
-#define FM10K_EXVET_TAG_SIZE_SHIFT 16
-#define FM10K_EXVET_AFTER_VLAN 0x0004
 #define FM10K_GCR  0x0003
-#define FM10K_FACTPS   0x0004
 #define FM10K_GCR_EXT  0x0005

 /* Interrupt control registers */
 #define FM10K_EICR 0x0006
-#define FM10K_EICR_PCA_FAULT   0x0001
-#define FM10K_EICR_THI_FAULT   0x0004
-#define FM10K_EICR_FUM_FAULT   0x0020
 #define FM10K_EICR_FAULT_MASK  0x003F
 #define FM10K_EICR_MAILBOX 0x0040
 #define FM10K_EICR_SWITCHREADY 0x0080
 #define FM10K_EICR_SWITCHNOTREADY  0x0100
 #define FM10K_EICR_SWITCHINTERRUPT 0x0200
-#define FM10K_EICR_SRAMERROR   0x0400
 #define FM10K_EICR_VFLR 0x0800
 #define FM10K_EICR_MAXHOLDTIME 0x1000
 #define FM10K_EIMR 0x0007
@@ -196,7 +183,6 @@ struct fm10k_hw

[dpdk-dev] [PATCH v2 15/16] fm10k/base: move constants to the right of binary operators

2016-01-27 Thread Wang Xiao W
The upstream Linux kernel community prefers that constants be placed to the
right of binary operators.

Signed-off-by: Wang Xiao W 
---
 drivers/net/fm10k/base/fm10k_pf.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/fm10k/base/fm10k_pf.c b/drivers/net/fm10k/base/fm10k_pf.c
index 456fe64..105babf 100644
--- a/drivers/net/fm10k/base/fm10k_pf.c
+++ b/drivers/net/fm10k/base/fm10k_pf.c
@@ -759,8 +759,8 @@ STATIC s32 fm10k_iov_assign_resources_pf(struct fm10k_hw *hw, u16 num_vfs,
FM10K_RXDCTL_WRITE_BACK_MIN_DELAY |
FM10K_RXDCTL_DROP_ON_EMPTY);
FM10K_WRITE_REG(hw, FM10K_RXQCTL(vf_q_idx),
-   FM10K_RXQCTL_VF |
-   (i << FM10K_RXQCTL_VF_SHIFT));
+   (i << FM10K_RXQCTL_VF_SHIFT) |
+   FM10K_RXQCTL_VF);

/* map queue pair to VF */
FM10K_WRITE_REG(hw, FM10K_TQMAP(qmap_idx), vf_q_idx);
@@ -1035,7 +1035,7 @@ STATIC s32 fm10k_iov_reset_resources_pf(struct fm10k_hw *hw,
txqctl = ((u32)vf_vid << FM10K_TXQCTL_VID_SHIFT) |
 (vf_idx << FM10K_TXQCTL_TC_SHIFT) |
 FM10K_TXQCTL_VF | vf_idx;
-   rxqctl = FM10K_RXQCTL_VF | (vf_idx << FM10K_RXQCTL_VF_SHIFT);
+   rxqctl = (vf_idx << FM10K_RXQCTL_VF_SHIFT) | FM10K_RXQCTL_VF;

/* stop further DMA and reset queue ownership back to VF */
for (i = vf_q_idx; i < (queues_per_pool + vf_q_idx); i++) {
-- 
1.9.3



[dpdk-dev] [PATCH 4/5] vhost: do not use rte_memcpy for virtio_hdr copy

2016-01-27 Thread Xie, Huawei
On 1/27/2016 11:22 AM, Yuanhan Liu wrote:
> On Wed, Jan 27, 2016 at 02:46:39AM +, Xie, Huawei wrote:
>> On 12/3/2015 2:03 PM, Yuanhan Liu wrote:
>>> +   if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) {
>>> +   *(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr;
>>> +   } else {
>>> +   *(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr;
>>> +   }
>> Thanks!
>> We might simplify this further. Just reset the first two fields, flags
>> and gso_type.
> What's this "simplification" for? Not to mention that we will add
> TSO support, which modifies a few more fields, such as csum_start: resetting
> only the first two fields is wrong here.

I knew about TSO before commenting, but at least in this implementation and
this specific patch, I guess zeroing two fields is enough.

What is wrong with resetting only two fields?

>
>   --yliu
>



[dpdk-dev] [PATCH 4/5] vhost: do not use rte_memcpy for virtio_hdr copy

2016-01-27 Thread Yuanhan Liu
On Wed, Jan 27, 2016 at 05:56:56AM +, Xie, Huawei wrote:
> On 1/27/2016 11:22 AM, Yuanhan Liu wrote:
> > On Wed, Jan 27, 2016 at 02:46:39AM +, Xie, Huawei wrote:
> >> On 12/3/2015 2:03 PM, Yuanhan Liu wrote:
> >>> + if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) {
> >>> + *(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr;
> >>> + } else {
> >>> + *(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr;
> >>> + }
> >> Thanks!
> >> We might simplify this further. Just reset the first two fields, flags
> >> and gso_type.
> > What's this "simplification" for? Not to mention that we will add
> > TSO support, which modifies a few more fields, such as csum_start: resetting
> > only the first two fields is wrong here.
> 
> I knew about TSO before commenting, but at least in this implementation and
> this specific patch, I guess zeroing two fields is enough.
> 
> What is wrong with resetting only two fields?

I then have to ask: "What is the benefit of resetting only two fields?"
If we do that, we have to change it back for TSO. My proposal requires no
extra change when adding TSO support.

--yliu


[dpdk-dev] [PATCH 1/5] vhost: refactor rte_vhost_dequeue_burst

2016-01-27 Thread Xie, Huawei
On 1/27/2016 11:26 AM, Yuanhan Liu wrote:
> On Tue, Jan 26, 2016 at 10:30:12AM +, Xie, Huawei wrote:
>> On 12/3/2015 2:03 PM, Yuanhan Liu wrote:
>>> Signed-off-by: Yuanhan Liu 
>>> ---
>>>  lib/librte_vhost/vhost_rxtx.c | 287 +-
>>>  1 file changed, 113 insertions(+), 174 deletions(-)
>> Prefer to unroll copy_mbuf_to_desc and your COPY macro. It prevents us
> I'm okay to unroll COPY macro. But for copy_mbuf_to_desc, I prefer not
> to do that, unless it has a good reason.
>
>> processing descriptors in a burst way in future.
> So, do you have a plan?

I think it is OK. If we need to unroll in the future, we could do that then. I
am open to this; it is just my preference. I understand that wrapping makes
code more readable.

>
>   --yliu
>



[dpdk-dev] [PATCH 1/5] vhost: refactor rte_vhost_dequeue_burst

2016-01-27 Thread Yuanhan Liu
On Wed, Jan 27, 2016 at 06:12:22AM +, Xie, Huawei wrote:
> On 1/27/2016 11:26 AM, Yuanhan Liu wrote:
> > On Tue, Jan 26, 2016 at 10:30:12AM +, Xie, Huawei wrote:
> >> On 12/3/2015 2:03 PM, Yuanhan Liu wrote:
> >>> Signed-off-by: Yuanhan Liu 
> >>> ---
> >>>  lib/librte_vhost/vhost_rxtx.c | 287 +-
> >>>  1 file changed, 113 insertions(+), 174 deletions(-)
> >> Prefer to unroll copy_mbuf_to_desc and your COPY macro. It prevents us
> > I'm okay to unroll COPY macro. But for copy_mbuf_to_desc, I prefer not
> > to do that, unless it has a good reason.
> >
> >> processing descriptors in a burst way in future.
> > So, do you have a plan?
> 
> I think it is OK. If we need to unroll in the future, we could do that then. I
> am open to this; it is just my preference. I understand that wrapping makes
> code more readable.

Okay, let's consider it then: unrolling would be easy after all.

--yliu


[dpdk-dev] [PATCH 4/5] vhost: do not use rte_memcpy for virtio_hdr copy

2016-01-27 Thread Xie, Huawei
On 1/27/2016 2:02 PM, Yuanhan Liu wrote:
> On Wed, Jan 27, 2016 at 05:56:56AM +, Xie, Huawei wrote:
>> On 1/27/2016 11:22 AM, Yuanhan Liu wrote:
>>> On Wed, Jan 27, 2016 at 02:46:39AM +, Xie, Huawei wrote:
>>>> On 12/3/2015 2:03 PM, Yuanhan Liu wrote:
>>>>> + if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) {
>>>>> + *(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr;
>>>>> + } else {
>>>>> + *(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr;
>>>>> + }
>>>> Thanks!
>>>> We might simplify this further. Just reset the first two fields, flags
>>>> and gso_type.
>>> What's this "simplification" for? Not to mention that we will add
>>> TSO support, which modifies a few more fields, such as csum_start: resetting
>>> only the first two fields is wrong here.
>> I knew about TSO before commenting, but at least in this implementation and
>> this specific patch, I guess zeroing two fields is enough.
>>
>> What is wrong with resetting only two fields?
> I then have to ask: "What is the benefit of resetting only two fields?"
> If we do that, we have to change it back for TSO. My proposal requires no
> extra change when adding TSO support.

The benefit is that we save four unnecessary stores.

>
>   --yliu
>



[dpdk-dev] [PATCH 4/5] vhost: do not use rte_memcpy for virtio_hdr copy

2016-01-27 Thread Yuanhan Liu
On Wed, Jan 27, 2016 at 06:16:37AM +, Xie, Huawei wrote:
> On 1/27/2016 2:02 PM, Yuanhan Liu wrote:
> > On Wed, Jan 27, 2016 at 05:56:56AM +, Xie, Huawei wrote:
> >> On 1/27/2016 11:22 AM, Yuanhan Liu wrote:
> >>> On Wed, Jan 27, 2016 at 02:46:39AM +, Xie, Huawei wrote:
> >>>> On 12/3/2015 2:03 PM, Yuanhan Liu wrote:
> >>>>> +   if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) {
> >>>>> +   *(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr;
> >>>>> +   } else {
> >>>>> +   *(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr;
> >>>>> +   }
> >>>> Thanks!
> >>>> We might simplify this further. Just reset the first two fields, flags
> >>>> and gso_type.
> >>> What's this "simplification" for? Not to mention that we will add
> >>> TSO support, which modifies a few more fields, such as csum_start: resetting
> >>> only the first two fields is wrong here.
> >> I knew about TSO before commenting, but at least in this implementation and
> >> this specific patch, I guess zeroing two fields is enough.
> >>
> >> What is wrong with resetting only two fields?
> > I then have to ask: "What is the benefit of resetting only two fields?"
> > If we do that, we have to change it back for TSO. My proposal requires no
> > extra change when adding TSO support.
> 
> The benefit is that we save four unnecessary stores.

Hmm..., the hdr size is 12 bytes at most. I mean, does it really matter,
copying 3 bytes or copying 12 bytes in a row?

--yliu
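To make the trade-off in this thread concrete, here is a sketch of both options. The two structs redeclare, under hypothetical _sketch names, the layout of struct virtio_net_hdr (10 bytes) and struct virtio_net_hdr_mrg_rxbuf (12 bytes) from <linux/virtio_net.h>, so the example builds without kernel headers. write_hdr_full() and write_hdr_two_fields() are hypothetical helpers, and the zeroed local header stands in for whatever hdr value the patch builds.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout mirroring struct virtio_net_hdr (10 bytes). */
struct vnet_hdr_sketch {
	uint8_t flags;
	uint8_t gso_type;
	uint16_t hdr_len;
	uint16_t gso_size;
	uint16_t csum_start;
	uint16_t csum_offset;
};

/* Layout mirroring struct virtio_net_hdr_mrg_rxbuf (12 bytes). */
struct vnet_hdr_mrg_sketch {
	struct vnet_hdr_sketch hdr;
	uint16_t num_buffers;
};

/* Patch-style: store the whole header, sized by the negotiated length,
 * so every field (including csum_start, gso_size) gets a known value. */
static void write_hdr_full(void *desc, size_t hlen)
{
	struct vnet_hdr_mrg_sketch hdr;

	memset(&hdr, 0, sizeof(hdr));
	if (hlen == sizeof(struct vnet_hdr_mrg_sketch))
		*(struct vnet_hdr_mrg_sketch *)desc = hdr;
	else
		*(struct vnet_hdr_sketch *)desc = hdr.hdr;
}

/* Review suggestion: clear only flags and gso_type; the remaining
 * fields keep whatever the descriptor buffer already contained. */
static void write_hdr_two_fields(void *desc)
{
	struct vnet_hdr_sketch *h = desc;

	h->flags = 0;
	h->gso_type = 0;
}
```

The second helper does fewer stores, but any later code that starts filling csum_start or gso_size (for TSO) would need the full write back, which is the point Yuanhan raises.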


[dpdk-dev] Errors Rx count increasing while pktgen doing nothing on Intel 82598EB 10G

2016-01-27 Thread Moon-Sang Lee
Laurent, have you resolved this problem?
I'm using the same NIC as yours (i.e. Intel 82598EB 10G NIC) and am facing the
same problem.
Here is part of my log; it says that the PMD cannot enable the Rx queue for my
NIC.
I'm using DPDK 2.2.0 and passed NULL as the rx_conf parameter (the 5th
parameter) when calling rte_eth_rx_queue_setup().
(i.e. passing NULL makes the PMD use the default rx_conf values.)

Thanks.



APP: initialising port 0 ...
PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x7f5f27258040 sw_sc_ring=0x7f5f27257b00 hw_ring=0x7f5f27258580 dma_addr=0x41f458580
PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x7f5f27245940 hw_ring=0x7f5f27247980 dma_addr=0x41f447980
PMD: ixgbe_set_tx_function(): Using simple tx code path
PMD: ixgbe_set_tx_function(): Vector tx enabled.
PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x7f5f272337c0 hw_ring=0x7f5f27235800 dma_addr=0x41f435800
PMD: ixgbe_set_tx_function(): Using simple tx code path
PMD: ixgbe_set_tx_function(): Vector tx enabled.
PMD: ixgbe_set_rx_function(): Vector rx enabled, please make sure RX burst size no less than 4 (port=0).
*PMD: ixgbe_dev_rx_queue_start(): Could not enable Rx Queue 0*
APP: port 0 has started
APP: port 0 has entered in promiscuous mode
APP: port 0 initialization is done.
KNI: pci: 09:00:00 8086:10c7
APP: kni allocation is done for port 0.
APP: initialising port 1 ...
PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x7f5f27222dc0 sw_sc_ring=0x7f5f27222880 hw_ring=0x7f5f27223300 dma_addr=0x41f423300
PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x7f5f272106c0 hw_ring=0x7f5f27212700 dma_addr=0x41f412700
PMD: ixgbe_set_tx_function(): Using simple tx code path
PMD: ixgbe_set_tx_function(): Vector tx enabled.
PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x7f5f271fe540 hw_ring=0x7f5f27200580 dma_addr=0x41f400580
PMD: ixgbe_set_tx_function(): Using simple tx code path
PMD: ixgbe_set_tx_function(): Vector tx enabled.
PMD: ixgbe_set_rx_function(): Vector rx enabled, please make sure RX burst size no less than 4 (port=1).
*PMD: ixgbe_dev_rx_queue_start(): Could not enable Rx Queue 0*
APP: port 1 has started
APP: port 1 has entered in promiscuous mode
APP: port 1 initialization is done.
KNI: pci: 0a:00:00 8086:10c7
APP: kni allocation is done for port 1.

checking link status
.done
Port 0 Link Up - speed 10000 Mbps - full-duplex
Port 1 Link Up - speed 10000 Mbps - full-duplex


On Mon, Dec 28, 2015 at 5:28 AM, Wiles, Keith  wrote:

> On 12/27/15, 2:09 PM, "Laurent GUERBY"  wrote:
>
> >On Sun, 2015-12-27 at 19:43 +0000, Wiles, Keith wrote:
> >> On 12/27/15, 12:31 PM, "dev on behalf of Laurent GUERBY" <
> dev-bounces at dpdk.org on behalf of laurent at guerby.net> wrote:
> >>
> >> >Hi,
> >> >
> >> >I reported today an issue when using Pktgen-DPDK:
> >> >https://github.com/pktgen/Pktgen-DPDK/issues/52
> >> >
> >> >But I think it's more in DPDK than pktgen
> >> >
> >> >two identical machines with SFP+ DA cable between them
> >> >DPDK 2.2.0 from tarball
> >> >Pktgen-DPDK from git
> >> >two identical machines:
> >> >core i7 2600 (sandy bridge 4C/8T), HT disabled in the BIOS
> >> >ASUS P8H67-M PRO BIOS 3904 (latest available)
> >> >Ethernet controller: Intel Corporation 82598EB 10-Gigabit AF Dual Port
> >> >Network Connection (rev 01)
> >> >01:00.0 0200: 8086:10f1 (rev 01)
> >> >Subsystem: 8086:a21f
> >> >boot kernel 3.16 Ubuntu 14.04 with isolcpus=2,3,4
> >> >
> >> >When launching pktgen, even with no TX requested, the Errors RX counter
> >> >keeps going up by about 7.4 million per second:
> >> >
> >> >Errors Rx/Tx : 7471857054/0
> >> >
> >> >In the log I get "Could not enable Rx Queue"; might that be the
> >> >source of the issue?
> >> >
> >> >PMD: ixgbe_dev_rx_queue_start(): Could not enable Rx Queue 0
> >> >PMD: ixgbe_dev_rx_queue_start(): Could not enable Rx Queue 1
> >> >
> >> >When sending traffic with a single UDP src/dst/IP/MAC, the setup
> >> >reaches 14204188 pps with 64-byte packets, and the error counter is
> >> >also increasing.
> >> >
> >> >Any idea what to look for?
> >>
> >> One more suggestion is to run test_pmd on one machine and something
> >> like iperf on the other to verify that DPDK is working correctly, which I
> >> assume will be true. I'm not sure whether the RX errors are reported in
> >> test_pmd; otherwise you could use the l3fwd application too.
> >
> >Ok, I will check the test_pmd documentation and try to do this test: I'm
> >just starting on DPDK :).
> >
> >> Please also send me the 'lspci | grep Ethernet' output.
> >
> >I included one line in my original email above (plus extract of lspci
> >-vn), here is the full output of the command:
> >
> >01:00.0 Ethernet controller: Intel Corporation 82598EB 10-Gigabit AF
> >Dual Port Network Connection (rev 01)
> >01:00.1 Ethernet controller: Intel Corporation 82598EB 10-Gigabit AF
> >Dual Port Network Connection (rev 01)
> >05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
> >RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
> >
> >(The realtek is used only for internet connectivity).
> >
> >> Also send me the command line.
> >
> >On the first machine
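As a reference point for the 14204188 pps figure reported above, consider the theoretical 10GbE limit for 64-byte frames. It follows from the 20 bytes of preamble, start delimiter and inter-frame gap that every frame also occupies on the wire. The helper name `line_rate_pps` is illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes on the wire per frame not counted in the frame size:
 * 7-byte preamble + 1-byte start frame delimiter + 12-byte inter-frame gap. */
#define ETH_WIRE_OVERHEAD 20u

/* Theoretical maximum frames per second for a link rate in bit/s and a frame
 * size in bytes that includes the 4-byte FCS (64 for minimum-size frames). */
static uint64_t line_rate_pps(uint64_t link_bps, uint32_t frame_bytes)
{
	assert(frame_bytes >= 64);	/* Ethernet minimum frame size */
	return link_bps / ((uint64_t)(frame_bytes + ETH_WIRE_OVERHEAD) * 8u);
}
```

For 10 Gbit/s and 64-byte frames this gives 14880952 frames/s. The measured 14204188 pps is therefore roughly 95% of line rate.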

[dpdk-dev] Errors Rx count increasing while pktgen doing nothing on Intel 82598EB 10G

2016-01-27 Thread Laurent GUERBY
On Wed, 2016-01-27 at 15:50 +0900, Moon-Sang Lee wrote:
> 
> 
> Laurent, have you resolved this problem?
> I'm using the same NIC as yours (i.e. Intel 82598EB 10G NIC) and faced
> the same problem as you.
> Here are parts of my log; they say that the PMD cannot enable the RX queue
> for my NIC.
> I'm using DPDK 2.2.0 and passed NULL as the 4th parameter when calling
> rte_eth_rx_queue_setup()
> (i.e. NULL makes the PMD use the default rx_conf values).

Hi,

I had to reuse my DPDK machines for another task;
I will go back to it after FOSDEM.

The error you get is the same as mine.

Sincerely,

Laurent


[dpdk-dev] Errors Rx count increasing while pktgen doing nothing on Intel 82598EB 10G

2016-01-27 Thread Zhang, Helin
Moon-Sang,

Were you using pktgen or another application?
Could you share with me the detailed steps to reproduce the
issue?
We will find time to look at it soon. Thanks!

Regards,
Helin

-----Original Message-----
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Laurent GUERBY
Sent: Wednesday, January 27, 2016 3:16 PM
To: Moon-Sang Lee 
Cc: dev at dpdk.org
Subject: Re: [dpdk-dev] Errors Rx count increasing while pktgen doing nothing 
on Intel 82598EB 10G

On Wed, 2016-01-27 at 15:50 +0900, Moon-Sang Lee wrote:
> 
> 
> Laurent, have you resolved this problem?
> I'm using the same NIC as yours (i.e. Intel 82598EB 10G NIC) and faced
> the same problem as you.
> Here are parts of my log; they say that the PMD cannot enable the RX queue
> for my NIC.
> I'm using DPDK 2.2.0 and passed NULL as the 4th parameter when calling
> rte_eth_rx_queue_setup()
> (i.e. NULL makes the PMD use the default rx_conf values).

Hi,

I had to reuse my DPDK machines for another task; I will go back to it after
FOSDEM.

The error you get is the same as mine.

Sincerely,

Laurent


[dpdk-dev] bnx2x driver and 57800 versus 57810

2016-01-27 Thread Harish Patil
>
>I have two practically identical systems, same hypervisor on each (CentOS
>7.x).  In one, I have a 57800 card which works fine with DPDK with
>SRIOV.  In the other, I have a 57810 card which doesn't work with SRIOV.
>
>For the 57810 I have tracked this down to the status block in the VF
>failing to be updated.  The linux driver works fine but it appears to
>use a slightly different scheme -- writing some sort of fastpath status
>block generation per interrupt.
>
>Does anyone have any suggestions or a programming guide for this device?
>
>

What is not working with 57810? Is it link related or traffic? Please
provide the details.
Attached is the SW programming guide for 577xx/578xx. I'm not sure if it
has details pertaining to the specific issue that you have.

Thanks,
Harish


FYI, I had replied to your email earlier with the doc attached, but it has
not gone through yet due to size restrictions.

Your mail to 'dev' with the subject

Re: [dpdk-dev] bnx2x driver and 57800 versus 57810

Is being held until the list moderator can review it for approval.

The reason it is being held:

Message body is too big: 1350322 bytes with a limit of 300 KB






[dpdk-dev] [PATCH v5 8/9] virtio: add 1.0 support

2016-01-27 Thread Thomas Monjalon
2016-01-27 11:46, Yuanhan Liu:
> On Thu, Jan 21, 2016 at 12:49:10PM +0100, Thomas Monjalon wrote:
> > 2016-01-19 16:12, Yuanhan Liu:
> > >  int
> > >  vtpci_init(struct rte_pci_device *dev, struct virtio_hw *hw)
> > >  {
> > > -   hw->vtpci_ops = &legacy_ops;
> > > +   hw->dev = dev;
> > > +
> > > +   /*
> > > +    * Try reading the virtio PCI caps, which exist only on modern
> > > +    * PCI devices. If that fails, fall back to legacy virtio
> > > +    * handling.
> > > +    */
> > > +   if (virtio_read_caps(dev, hw) == 0) {
> > > +   PMD_INIT_LOG(INFO, "modern virtio pci detected.");
> > > +   hw->vtpci_ops = &modern_ops;
> > > +   hw->modern = 1;
> > > +   dev->driver->drv_flags |= RTE_PCI_DRV_INTR_LSC;
> > > +   return 0;
> > > +   }
> > 
> > RTE_PCI_DRV_INTR_LSC is already set by virtio_resource_init_by_uio().
> 
> We don't go that far here. Here we just detect if it's a modern virtio
> device. If so, we do some modern initializations and return.
> 
> virtio_resource_init_by_uio() is invoked when virtio_read_caps() fails.
> 
> > Do you mean interrupt was not supported with legacy virtio?
> 
> Nope. This patch set changes nothing on legacy virtio support.

Oh yes. I guess I had not seen the return.


[dpdk-dev] [PATCH v2 2/2] i40evf: support interrupt based pf reset request

2016-01-27 Thread David Marchand
Hello Jingjing,

On Wed, Jan 27, 2016 at 2:49 AM, Jingjing Wu  wrote:
> An interrupt-based PF reset request from the PF is supported by
> enabling admin queue event processing in the VF driver.
> Users can register a callback for this interrupt event to be
> informed when a PF reset request is detected, for example:
>   rte_eth_dev_callback_register(portid,
>                                 RTE_ETH_EVENT_INTR_RESET,
>                                 reset_event_callback,
>                                 arg);
>
> Signed-off-by: Jingjing Wu 

Just adding my previous comment in this thread.

Having this infrastructure is one thing, but the initial problem was
that the driver did not recover from this reset event.
The Linux i40e VF driver handles this kind of event itself.
Could we have something similar?

Thanks.

-- 
David Marchand
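The register-then-notify pattern described in the patch can be sketched without DPDK. It also shows the point raised in the review: the callback only informs the application, and nothing in this flow recovers the port by itself. `event_cb_register`, `event_fire`, `EVENT_INTR_RESET` and the callback signature are hypothetical stand-ins, not DPDK's `rte_eth_dev_callback_register()` API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for ethdev event types and callbacks; these are
 * not DPDK's definitions. */
enum event_type { EVENT_INTR_LSC, EVENT_INTR_RESET };

typedef void (*event_cb_fn)(unsigned port_id, enum event_type ev, void *arg);

#define MAX_CBS 8

struct cb_entry {
	enum event_type ev;
	event_cb_fn fn;
	void *arg;
};

static struct cb_entry cbs[MAX_CBS];
static size_t n_cbs;

/* Subscribe fn to event ev; returns 0, or -1 when the table is full. */
static int event_cb_register(enum event_type ev, event_cb_fn fn, void *arg)
{
	assert(fn != NULL);
	if (n_cbs == MAX_CBS)
		return -1;
	cbs[n_cbs].ev = ev;
	cbs[n_cbs].fn = fn;
	cbs[n_cbs].arg = arg;
	n_cbs++;
	return 0;
}

/* What the driver side would do once it has decoded a PF reset request
 * from the admin queue: invoke every callback registered for that event. */
static void event_fire(unsigned port_id, enum event_type ev)
{
	for (size_t i = 0; i < n_cbs; i++)
		if (cbs[i].ev == ev)
			cbs[i].fn(port_id, ev, cbs[i].arg);
}

/* Application callback: only counts reset requests. Actually recovering
 * the port is still left to the application in this model. */
static void reset_event_callback(unsigned port_id, enum event_type ev, void *arg)
{
	(void)port_id;
	(void)ev;
	(*(int *)arg)++;
}
```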


[dpdk-dev] [PATCH] ethdev: fix byte order inconsistence between fdir flow and mask

2016-01-27 Thread Jingjing Wu
Fixed a byte order issue in the ethdev library: the structures for
setting the fdir mask and the flow entry were inconsistent. The mask
inputs are now in big endian.

fixes: 76c6f89e80d4 ("ixgbe: support new flow director masks")
   2d4c1a9ea2ac ("ethdev: add new flow director masks")

Reported-by: Yaacov Hazan 
Signed-off-by: Jingjing Wu 
---
 app/test-pmd/cmdline.c   |  6 ++---
 doc/guides/rel_notes/release_2_3.rst |  6 +
 drivers/net/ixgbe/ixgbe_fdir.c   | 47 ++--
 lib/librte_ether/rte_eth_ctrl.h  |  7 --
 4 files changed, 43 insertions(+), 23 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 73298c9..13194c9 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -8687,13 +8687,13 @@ cmd_flow_director_mask_parsed(void *parsed_result,
return;
}

-   mask->vlan_tci_mask = res->vlan_mask;
+   mask->vlan_tci_mask = rte_cpu_to_be_16(res->vlan_mask);
IPV4_ADDR_TO_UINT(res->ipv4_src, mask->ipv4_mask.src_ip);
IPV4_ADDR_TO_UINT(res->ipv4_dst, mask->ipv4_mask.dst_ip);
IPV6_ADDR_TO_ARRAY(res->ipv6_src, mask->ipv6_mask.src_ip);
IPV6_ADDR_TO_ARRAY(res->ipv6_dst, mask->ipv6_mask.dst_ip);
-   mask->src_port_mask = res->port_src;
-   mask->dst_port_mask = res->port_dst;
+   mask->src_port_mask = rte_cpu_to_be_16(res->port_src);
+   mask->dst_port_mask = rte_cpu_to_be_16(res->port_dst);
}

cmd_reconfig_device_queue(res->port_id, 1, 1);
diff --git a/doc/guides/rel_notes/release_2_3.rst 
b/doc/guides/rel_notes/release_2_3.rst
index 99de186..28d0f27 100644
--- a/doc/guides/rel_notes/release_2_3.rst
+++ b/doc/guides/rel_notes/release_2_3.rst
@@ -19,6 +19,10 @@ Drivers
 Libraries
 ~

+* **Fixed byte order inconsistency between fdir flow and mask.**
+
+  Fixed an issue in the ethdev library where the structures for setting
+  the fdir mask and flow entry were inconsistent in byte order.

 Examples
 
@@ -39,6 +43,8 @@ API Changes
 ABI Changes
 ---

+* The fields in the ethdev structure ``rte_eth_fdir_masks`` were
+  changed to be in big endian.

 Shared Library Versions
 ---
diff --git a/drivers/net/ixgbe/ixgbe_fdir.c b/drivers/net/ixgbe/ixgbe_fdir.c
index e03219b..7423b2d 100644
--- a/drivers/net/ixgbe/ixgbe_fdir.c
+++ b/drivers/net/ixgbe/ixgbe_fdir.c
@@ -309,6 +309,7 @@ fdir_set_input_mask_82599(struct rte_eth_dev *dev,
uint32_t fdiripv6m; /* IPv6 source and destination masks. */
uint16_t dst_ipv6m = 0;
uint16_t src_ipv6m = 0;
+   volatile uint32_t *reg;

PMD_INIT_FUNC_TRACE();

@@ -322,16 +323,16 @@ fdir_set_input_mask_82599(struct rte_eth_dev *dev,
/* use the L4 protocol mask for raw IPv4/IPv6 traffic */
fdirm |= IXGBE_FDIRM_L4P;

-   if (input_mask->vlan_tci_mask == 0x0FFF)
+   if (input_mask->vlan_tci_mask == rte_cpu_to_be_16(0x0FFF))
/* mask VLAN Priority */
fdirm |= IXGBE_FDIRM_VLANP;
-   else if (input_mask->vlan_tci_mask == 0xE000)
+   else if (input_mask->vlan_tci_mask == rte_cpu_to_be_16(0xE000))
/* mask VLAN ID */
fdirm |= IXGBE_FDIRM_VLANID;
else if (input_mask->vlan_tci_mask == 0)
/* mask VLAN ID and Priority */
fdirm |= IXGBE_FDIRM_VLANID | IXGBE_FDIRM_VLANP;
-   else if (input_mask->vlan_tci_mask != 0xEFFF) {
+   else if (input_mask->vlan_tci_mask != rte_cpu_to_be_16(0xEFFF)) {
PMD_INIT_LOG(ERR, "invalid vlan_tci_mask");
return -EINVAL;
}
@@ -340,19 +341,26 @@ fdir_set_input_mask_82599(struct rte_eth_dev *dev,
IXGBE_WRITE_REG(hw, IXGBE_FDIRM, fdirm);

/* store the TCP/UDP port masks, bit reversed from port layout */
-   fdirtcpm = reverse_fdir_bitmasks(input_mask->dst_port_mask,
-input_mask->src_port_mask);
+   fdirtcpm = reverse_fdir_bitmasks(
+   rte_be_to_cpu_16(input_mask->dst_port_mask),
+   rte_be_to_cpu_16(input_mask->src_port_mask));

-   /* write all the same so that UDP, TCP and SCTP use the same mask */
+   /* write all the same so that UDP, TCP and SCTP use the same mask
+    * (little-endian)
+    */
IXGBE_WRITE_REG(hw, IXGBE_FDIRTCPM, ~fdirtcpm);
IXGBE_WRITE_REG(hw, IXGBE_FDIRUDPM, ~fdirtcpm);
IXGBE_WRITE_REG(hw, IXGBE_FDIRSCTPM, ~fdirtcpm);
info->mask.src_port_mask = input_mask->src_port_mask;
info->mask.dst_port_mask = input_mask->dst_port_mask;

-   /* Store source and destination IPv4 masks (big-endian) */
-   IXGBE_WRITE_REG(hw, IXGBE_FDIRSIP4M, ~(input_mask->ipv4_mask.src_ip));
-   IXGBE_WRITE_REG(hw, IXGBE_FDIRDIP4M, ~(input_mask->ipv4_mask.dst_ip));
+   /* Store source an
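The convention the patch adopts is that masks arrive in network byte order, and the driver compares them against constants converted with `rte_cpu_to_be_16()`. The sketch models it with a portable conversion and mirrors the VLAN TCI mask checks in `fdir_set_input_mask_82599()`. `cpu_to_be16`, `classify_vlan_mask` and the `vlan_match` names are hypothetical. In the patch itself the outcomes map to the `IXGBE_FDIRM_VLANP`/`IXGBE_FDIRM_VLANID` bits.

```c
#include <assert.h>
#include <stdint.h>

/* Portable host-to-big-endian conversion for 16-bit values; plays the role
 * rte_cpu_to_be_16() plays in the patch. */
static uint16_t cpu_to_be16(uint16_t v)
{
	const uint16_t probe = 1;

	if (*(const uint8_t *)&probe == 1)	/* little-endian host */
		return (uint16_t)((v << 8) | (v >> 8));
	return v;				/* big-endian host */
}

enum vlan_match {
	MATCH_ID_ONLY, MATCH_PRIO_ONLY, MATCH_NONE, MATCH_BOTH, MATCH_INVALID
};

/* Classify a vlan_tci_mask supplied in network byte order, mirroring the
 * comparisons made in fdir_set_input_mask_82599() after the patch. */
static enum vlan_match classify_vlan_mask(uint16_t be_mask)
{
	if (be_mask == cpu_to_be16(0x0FFF))
		return MATCH_ID_ONLY;	/* priority bits masked out */
	if (be_mask == cpu_to_be16(0xE000))
		return MATCH_PRIO_ONLY;	/* VLAN ID masked out */
	if (be_mask == 0)
		return MATCH_NONE;	/* both masked out */
	if (be_mask == cpu_to_be16(0xEFFF))
		return MATCH_BOTH;
	return MATCH_INVALID;		/* the driver returns -EINVAL here */
}
```

Consider a caller on a little-endian host that passes the host-order value 0x0FFF without converting it. That caller would hit the invalid branch, which is the kind of flow/mask inconsistency the patch is addressing.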

[dpdk-dev] [PATCH v6 0/2] provide rte_pktmbuf_alloc_bulk API and call it in vhost dequeue

2016-01-27 Thread Huawei Xie
v6 changes:
 reflect the changes in release notes and library version map file
 revise our Duff's device code style a bit to make it more readable

v5 changes:
 add comment about Duff's device and our variant implementation

v4 changes:
 fix a silly typo in error handling when rte_pktmbuf_alloc fails

v3 changes:
 move while after case 0
 add context about Duff's device and why we use while loop in the commit
message

v2 changes:
 unroll the loop in rte_pktmbuf_alloc_bulk to help the performance

As for a symmetric rte_pktmbuf_free_bulk: if the application knows that in
its scenarios the mbufs are all simple mbufs, i.e. they meet the following
requirements:
 * no multiple segments
 * not indirect mbuf
 * refcnt is 1
 * belong to the same mbuf memory pool,
it could directly call rte_mempool_put to free the bulk of mbufs;
otherwise rte_pktmbuf_free_bulk has to call rte_pktmbuf_free to free
the mbufs one by one.
This patchset does not provide this symmetric implementation.
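The four conditions for a "simple" mbuf can be expressed as one pass over the array, which an application could run before choosing a bulk put over per-mbuf frees. `struct mini_mbuf` and `mbufs_are_simple` are hypothetical. The struct carries only the fields the conditions mention and is not the real `struct rte_mbuf`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down mbuf: only the fields the conditions above
 * refer to. Not the real struct rte_mbuf. */
struct mini_mbuf {
	uint8_t nb_segs;
	bool indirect;
	uint16_t refcnt;
	const void *pool;
};

/* True when every mbuf is single-segment, direct, has refcnt 1 and comes
 * from `pool`, i.e. the whole array could go back in one bulk put;
 * otherwise each mbuf needs the per-packet free path. */
static bool mbufs_are_simple(struct mini_mbuf *const *m, size_t n,
			     const void *pool)
{
	for (size_t i = 0; i < n; i++) {
		if (m[i]->nb_segs != 1 || m[i]->indirect ||
		    m[i]->refcnt != 1 || m[i]->pool != pool)
			return false;
	}
	return true;
}
```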

Huawei Xie (2):
  mbuf: provide rte_pktmbuf_alloc_bulk API
  vhost: call rte_pktmbuf_alloc_bulk in vhost dequeue

 doc/guides/rel_notes/release_2_3.rst |  3 ++
 lib/librte_mbuf/rte_mbuf.h   | 55 
 lib/librte_mbuf/rte_mbuf_version.map |  7 +
 lib/librte_vhost/vhost_rxtx.c| 35 ++-
 4 files changed, 87 insertions(+), 13 deletions(-)

-- 
1.8.1.4



[dpdk-dev] [PATCH v6 1/2] mbuf: provide rte_pktmbuf_alloc_bulk API

2016-01-27 Thread Huawei Xie
v6 changes:
 reflect the changes in release notes and library version map file
 revise our Duff's device code style a bit to make it more readable

v5 changes:
 add comment about Duff's device and our variant implementation

v3 changes:
 move while after case 0
 add context about Duff's device and why we use while loop in the commit
message

v2 changes:
 unroll the loop a bit to help the performance

rte_pktmbuf_alloc_bulk allocates a bulk of packet mbufs.

There is a related thread about this bulk API.
http://dpdk.org/dev/patchwork/patch/4718/
Thanks to Konstantin's loop unrolling.

Attached is the wiki page about Duff's device. It explains the performance
optimization through loop unwinding, and also the most dramatic use of
case label fall-through.
https://en.wikipedia.org/wiki/Duff%27s_device

In our implementation, we use a while() loop rather than a do {} while()
loop because we cannot assume count is strictly positive. Using a while()
loop saves a separate check for count being zero.

Signed-off-by: Gerald Rogers 
Signed-off-by: Huawei Xie 
Acked-by: Konstantin Ananyev 
---
 doc/guides/rel_notes/release_2_3.rst |  3 ++
 lib/librte_mbuf/rte_mbuf.h   | 55 
 lib/librte_mbuf/rte_mbuf_version.map |  7 +
 3 files changed, 65 insertions(+)

diff --git a/doc/guides/rel_notes/release_2_3.rst 
b/doc/guides/rel_notes/release_2_3.rst
index 99de186..a52cba3 100644
--- a/doc/guides/rel_notes/release_2_3.rst
+++ b/doc/guides/rel_notes/release_2_3.rst
@@ -4,6 +4,9 @@ DPDK Release 2.3
 New Features
 

+* **Enable bulk allocation of mbufs.**
+  A new function ``rte_pktmbuf_alloc_bulk()`` has been added to allow the user
+  to allocate a bulk of mbufs.

 Resolved Issues
 ---
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index f234ac9..b2ed479 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -1336,6 +1336,61 @@ static inline struct rte_mbuf *rte_pktmbuf_alloc(struct 
rte_mempool *mp)
 }

 /**
+ * Allocate a bulk of mbufs, initialize refcnt and reset the fields to default
+ * values.
+ *
+ *  @param pool
+ *The mempool from which mbufs are allocated.
+ *  @param mbufs
+ *Array of pointers to mbufs
+ *  @param count
+ *Array size
+ *  @return
+ *   - 0: Success
+ */
+static inline int rte_pktmbuf_alloc_bulk(struct rte_mempool *pool,
+struct rte_mbuf **mbufs, unsigned count)
+{
+   unsigned idx = 0;
+   int rc;
+
+   rc = rte_mempool_get_bulk(pool, (void **)mbufs, count);
+   if (unlikely(rc))
+   return rc;
+
+   /* To understand Duff's device on loop unwinding optimization, see
+    * https://en.wikipedia.org/wiki/Duff's_device.
+    * Here a while() loop is used rather than do {} while() to avoid an
+    * extra check for count being zero.
+    */
+   switch (count % 4) {
+   case 0:
+   while (idx != count) {
+   RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
+   rte_mbuf_refcnt_set(mbufs[idx], 1);
+   rte_pktmbuf_reset(mbufs[idx]);
+   idx++;
+   case 3:
+   RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
+   rte_mbuf_refcnt_set(mbufs[idx], 1);
+   rte_pktmbuf_reset(mbufs[idx]);
+   idx++;
+   case 2:
+   RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
+   rte_mbuf_refcnt_set(mbufs[idx], 1);
+   rte_pktmbuf_reset(mbufs[idx]);
+   idx++;
+   case 1:
+   RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
+   rte_mbuf_refcnt_set(mbufs[idx], 1);
+   rte_pktmbuf_reset(mbufs[idx]);
+   idx++;
+   }
+   }
+   return 0;
+}
+
+/**
  * Attach packet mbuf to another packet mbuf.
  *
  * After attachment we refer the mbuf we attached as 'indirect',
diff --git a/lib/librte_mbuf/rte_mbuf_version.map 
b/lib/librte_mbuf/rte_mbuf_version.map
index e10f6bd..257c65a 100644
--- a/lib/librte_mbuf/rte_mbuf_version.map
+++ b/lib/librte_mbuf/rte_mbuf_version.map
@@ -18,3 +18,10 @@ DPDK_2.1 {
rte_pktmbuf_pool_create;

 } DPDK_2.0;
+
+DPDK_2.3 {
+   global:
+
+   rte_pktmbuf_alloc_bulk;
+
+} DPDK_2.1;
-- 
1.8.1.4
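The while()-based Duff's device in `rte_pktmbuf_alloc_bulk()` can be exercised outside DPDK on a plain array. The `switch` jumps into the unrolled body to handle `count % 4` elements first. Each later pass then handles four, and the `while` condition makes `count == 0` safe. `struct obj` and `obj_init()` are hypothetical stand-ins for the mbuf and the refcnt-set/reset calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical object standing in for an mbuf fresh from the pool. */
struct obj {
	uint16_t refcnt;
	uint32_t data_len;
};

/* Per-element work done at each case label: take a reference, reset. */
static inline void obj_init(struct obj *o)
{
	assert(o->refcnt == 0);	/* objects in the pool are unreferenced */
	o->refcnt = 1;
	o->data_len = 0;
}

/* Duff's-device unrolling by 4 with a while() loop, same shape as the
 * patch: enter at the case matching count % 4, then loop in steps of 4. */
static void init_bulk(struct obj **objs, unsigned count)
{
	unsigned idx = 0;

	switch (count % 4) {
	case 0:
		while (idx != count) {
			obj_init(objs[idx++]);
	case 3:
			obj_init(objs[idx++]);
	case 2:
			obj_init(objs[idx++]);
	case 1:
			obj_init(objs[idx++]);
		}
	}
}
```

With `count = 5`, for example, the switch enters at `case 1` and initializes one object. The loop check then passes once more and handles the remaining four.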



[dpdk-dev] [PATCH v6 2/2] vhost: call rte_pktmbuf_alloc_bulk in vhost dequeue

2016-01-27 Thread Huawei Xie
v4 changes:
 fix a silly typo in error handling when rte_pktmbuf_alloc fails
reported by haifeng

pre-allocate a bulk of mbufs instead of allocating one mbuf at a time on demand

Signed-off-by: Gerald Rogers 
Signed-off-by: Huawei Xie 
Acked-by: Konstantin Ananyev 
Acked-by: Yuanhan Liu 
Tested-by: Yuanhan Liu 
---
 lib/librte_vhost/vhost_rxtx.c | 35 ++-
 1 file changed, 22 insertions(+), 13 deletions(-)

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index bbf3fac..f10d534 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -576,6 +576,8 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
queue_id,
uint32_t i;
uint16_t free_entries, entry_success = 0;
uint16_t avail_idx;
+   uint8_t alloc_err = 0;
+   uint8_t seg_num;

if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->virt_qp_nb))) {
RTE_LOG(ERR, VHOST_DATA,
@@ -609,6 +611,14 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
queue_id,

LOG_DEBUG(VHOST_DATA, "(%"PRIu64") Buffers available %d\n",
dev->device_fh, free_entries);
+
+   if (unlikely(rte_pktmbuf_alloc_bulk(mbuf_pool,
+   pkts, free_entries) < 0)) {
+   RTE_LOG(ERR, VHOST_DATA,
+   "Failed to bulk allocate %d mbufs\n", free_entries);
+   return 0;
+   }
+
/* Retrieve all of the head indexes first to avoid caching issues. */
for (i = 0; i < free_entries; i++)
head[i] = vq->avail->ring[(vq->last_used_idx + i) & (vq->size - 
1)];
@@ -621,9 +631,9 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
queue_id,
uint32_t vb_avail, vb_offset;
uint32_t seg_avail, seg_offset;
uint32_t cpy_len;
-   uint32_t seg_num = 0;
+   seg_num = 0;
struct rte_mbuf *cur;
-   uint8_t alloc_err = 0;
+

desc = &vq->desc[head[entry_success]];

@@ -654,13 +664,7 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
queue_id,
vq->used->ring[used_idx].id = head[entry_success];
vq->used->ring[used_idx].len = 0;

-   /* Allocate an mbuf and populate the structure. */
-   m = rte_pktmbuf_alloc(mbuf_pool);
-   if (unlikely(m == NULL)) {
-   RTE_LOG(ERR, VHOST_DATA,
-   "Failed to allocate memory for mbuf.\n");
-   break;
-   }
+   prev = cur = m = pkts[entry_success];
seg_offset = 0;
seg_avail = m->buf_len - RTE_PKTMBUF_HEADROOM;
cpy_len = RTE_MIN(vb_avail, seg_avail);
@@ -668,8 +672,6 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
queue_id,
PRINT_PACKET(dev, (uintptr_t)vb_addr, desc->len, 0);

seg_num++;
-   cur = m;
-   prev = m;
while (cpy_len != 0) {
rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *, 
seg_offset),
(void *)((uintptr_t)(vb_addr + vb_offset)),
@@ -761,16 +763,23 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
queue_id,
cpy_len = RTE_MIN(vb_avail, seg_avail);
}

-   if (unlikely(alloc_err == 1))
+   if (unlikely(alloc_err))
break;

m->nb_segs = seg_num;

-   pkts[entry_success] = m;
vq->last_used_idx++;
entry_success++;
}

+   if (unlikely(alloc_err)) {
+   uint16_t i = entry_success;
+
+   m->nb_segs = seg_num;
+   for (; i < free_entries; i++)
+   rte_pktmbuf_free(pkts[i]);
+   }
+
rte_compiler_barrier();
vq->used->idx += entry_success;
/* Kick guest if required. */
-- 
1.8.1.4
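The dequeue change follows a "pre-allocate, consume, hand back the rest" pattern. The placement of the `< 0` comparison matters here. If `unlikely(x)` expands to `__builtin_expect(!!(x), 0)`, as it does in DPDK, then `unlikely(f()) < 0` compares a 0/1 value and can never be true. The sketch models the pattern with a hypothetical fixed-size pool. `struct pool`, `pool_get_bulk`, `pool_put` and `dequeue` are illustrative names, and `n_ok` simulates how far the copy loop got.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_CAP 64

/* Hypothetical fixed-capacity object pool standing in for an mbuf mempool. */
struct pool {
	void *free[POOL_CAP];
	size_t n_free;
};

/* All-or-nothing bulk get, modelled on rte_mempool_get_bulk(): returns 0 and
 * fills out[0..n-1], or returns -1 and takes nothing if too few are free. */
static int pool_get_bulk(struct pool *p, void **out, size_t n)
{
	if (n > p->n_free)
		return -1;
	for (size_t i = 0; i < n; i++)
		out[i] = p->free[--p->n_free];
	return 0;
}

static void pool_put(struct pool *p, void *obj)
{
	assert(p->n_free < POOL_CAP);
	p->free[p->n_free++] = obj;
}

/* Dequeue shape from the patch: take one buffer per available entry up front,
 * then hand back whatever was not consumed. n_ok simulates how many entries
 * were filled before an error stopped the loop. Returns packets delivered. */
static size_t dequeue(struct pool *p, void **pkts, size_t free_entries,
		      size_t n_ok)
{
	size_t done;

	/* The "< 0" test is applied to the call's result directly. */
	if (pool_get_bulk(p, pkts, free_entries) < 0)
		return 0;

	/* Pretend the first `done` entries were copied into pkts[0..done-1]. */
	done = n_ok < free_entries ? n_ok : free_entries;

	/* Release the pre-allocated buffers that were never filled. */
	for (size_t i = done; i < free_entries; i++)
		pool_put(p, pkts[i]);

	return done;
}
```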



[dpdk-dev] [PATCH v2] vfio: Support for no-IOMMU mode

2016-01-27 Thread Thomas Monjalon
Hi Anatoly,

Few small comments.

The comments "function pointer typedef" or "structure to hold" don't
bring new information. Please keep it short.

2016-01-13 12:36, Anatoly Burakov:
> +/* function pointer typedef for DMA mapping functions */

->  DMA mapping function type
It would be relevant to describe the return and the parameter.

> +typedef  int (*vfio_dma_func_t)(int);
> +
> +/* Structure to hold supported IOMMU types */

This comment seems useless.

> +struct vfio_iommu_type {

[...]
> +/* function prototypes for different IOMMU types */

idem

> +int vfio_iommu_type1_dma_map(int container_fd);
> +int vfio_iommu_noiommu_dma_map(int container_fd);
> +
> +/* IOMMU types we support */
> +static const struct vfio_iommu_type iommu_types[] = {
> + /* x86 IOMMU, otherwise known as type 1 */
> + { VFIO_TYPE1_IOMMU, "Type 1", &vfio_iommu_type1_dma_map},
> + /* IOMMU-less mode */
> + { VFIO_NOIOMMU_IOMMU, "No-IOMMU", &vfio_iommu_noiommu_dma_map},
> +};

[...]
> --- /dev/null
> +++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio_dma.c

Why a new file for these functions?
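A table of IOMMU types, each carrying its own DMA mapping function, lets the caller pick the first type the container supports and then call through the pointer. The sketch below illustrates that shape. It also documents the function type's parameter and return in the way the review asks. The IDs, the stub mapping functions, `pick_iommu_type` and the `supported_mask` parameter are hypothetical. Real code would use the kernel's VFIO constants and query the container.

```c
#include <assert.h>
#include <stddef.h>

/* DMA mapping function type: takes a VFIO container fd and returns 0 on
 * success or a negative value on error. */
typedef int (*dma_map_fn)(int container_fd);

struct iommu_type {
	int type_id;
	const char *name;
	dma_map_fn dma_map;
};

/* Hypothetical IDs and no-op mapping stubs; real code would use the
 * kernel's VFIO constants and issue the DMA mapping ioctls. */
enum { TYPE1_ID = 1, NOIOMMU_ID = 2 };

static int type1_dma_map(int container_fd) { (void)container_fd; return 0; }
static int noiommu_dma_map(int container_fd) { (void)container_fd; return 0; }

/* Ordered by preference in this sketch: a real IOMMU first, no-IOMMU last. */
static const struct iommu_type iommu_types[] = {
	{ TYPE1_ID, "Type 1", type1_dma_map },
	{ NOIOMMU_ID, "No-IOMMU", noiommu_dma_map },
};

/* Pick the first table entry whose bit is set in supported_mask (which
 * stands in for asking the kernel what the container supports). */
static const struct iommu_type *pick_iommu_type(unsigned supported_mask)
{
	for (size_t i = 0; i < sizeof(iommu_types) / sizeof(iommu_types[0]); i++) {
		if (supported_mask & (1u << iommu_types[i].type_id))
			return &iommu_types[i];
	}
	return NULL;
}
```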



[dpdk-dev] [PATCH] ethdev: fix byte order inconsistence between fdir flow and mask

2016-01-27 Thread Thomas Monjalon
2016-01-27 16:37, Jingjing Wu:
> Fixed issue of byte order in ethdev library that the structure
> for setting fdir's mask and flow entry is inconsist and made
> inputs of mask be in big endian.

Please be more precise. Which one is big endian?
Wasn't it tested before?

> fixes: 76c6f89e80d4 ("ixgbe: support new flow director masks")
>2d4c1a9ea2ac ("ethdev: add new flow director masks")

Please put Fixes: on the two lines.

> --- a/doc/guides/rel_notes/release_2_3.rst
> +++ b/doc/guides/rel_notes/release_2_3.rst
> @@ -19,6 +19,10 @@ Drivers
>  Libraries
>  ~
>  
> +* ** fix byte order inconsistence between fdir flow and mask **
> +
> +  Fixed issue in ethdev library that the structure for setting
> +  fdir's mask and flow entry is inconsist in byte order.

John, comment on release notes formatting?
It's important to have the first items well formatted.

> @@ -39,6 +43,8 @@ API Changes
>  ABI Changes
>  ---
>  
> +* The fields in  The ethdev structures ``rte_eth_fdir_masks`` were
> +  changed to be in big endian.

Please take care of uppercase typo here.

> - /* write all the same so that UDP, TCP and SCTP use the same mask */
> + /* write all the same so that UDP, TCP and SCTP use the same mask
> +  * (little-endian)
> + */

Spacing typo here.
Sorry for the nits ;)

> - uint8_t mac_addr_byte_mask;  /** Per byte MAC address mask */
> + uint8_t mac_addr_byte_mask;  /** Bit mask for associated byte */
>   uint32_t tunnel_id_mask;  /** tunnel ID mask */
> - uint8_t tunnel_type_mask;
> + uint8_t tunnel_type_mask; /**< 1 - Match tunnel type,
> +0 - Ignore tunnel type. */

These changes seem unrelated to the patch.
It's good to improve the documentation of this API, but it may not be enough.
Example:
uint8_t mac_addr_byte_mask;  /** Bit mask for associated byte */
Are we sure everybody understands how to fill it?
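On the question of which side is big endian: assuming the patch's intent is that mask fields are supplied in network byte order, a caller would write host-order constants and convert them explicitly. This is a minimal sketch with a hypothetical struct. The field names echo the fdir discussion, but this is not the real `rte_eth_fdir_masks` layout.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical mask structure, not the real rte_eth_fdir_masks. */
struct fdir_masks_sketch {
	uint16_t src_port_mask;  /* big endian (network byte order) */
	uint32_t ipv4_src_mask;  /* big endian (network byte order) */
};

/* Host-order constants are converted explicitly, so the bytes in
 * memory are identical on little- and big-endian hosts. */
static void fill_masks_sketch(struct fdir_masks_sketch *m)
{
	m->src_port_mask = htons(0xFF00);      /* match high byte of the port */
	m->ipv4_src_mask = htonl(0xFFFFFF00);  /* match a /24 prefix */
}
```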


[dpdk-dev] [PATCH v2] fix checkpatch errors

2016-01-27 Thread Huawei Xie
v2 changes:
 add missed commit message in v1

fix the error reported by checkpatch:
 "ERROR: return is not a function, parentheses are not required"

also removed other extra parentheses like:
 "return val == 0"
 "return (rte_mempool_lookup(...))"

Signed-off-by: Huawei Xie 
---
 app/test-pmd/cmdline.c | 12 ++--
 app/test-pmd/config.c  |  2 +-
 app/test-pmd/flowgen.c |  2 +-
 app/test-pmd/mempool_anon.c| 12 ++--
 app/test-pmd/testpmd.h |  2 +-
 app/test-pmd/txonly.c  |  2 +-
 app/test/test_mbuf.c   | 12 ++--
 app/test/test_memcpy_perf.c|  4 +-
 app/test/test_mempool.c|  4 +-
 app/test/test_memzone.c| 24 +++
 app/test/test_red.c| 42 ++--
 app/test/test_ring.c   |  4 +-
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c |  2 +-
 drivers/crypto/qat/qat_qp.c| 22 +++---
 drivers/net/bnx2x/bnx2x.c  | 34 -
 drivers/net/bnx2x/bnx2x.h  |  4 +-
 drivers/net/bnx2x/bnx2x_rxtx.c | 16 ++---
 drivers/net/bnx2x/debug.c  |  6 +-
 drivers/net/bonding/rte_eth_bond_pmd.c |  2 +-
 drivers/net/e1000/em_ethdev.c  | 40 +--
 drivers/net/e1000/em_rxtx.c| 46 ++---
 drivers/net/e1000/igb_ethdev.c | 18 ++---
 drivers/net/e1000/igb_rxtx.c   | 30 
 drivers/net/fm10k/fm10k_ethdev.c   | 40 +--
 drivers/net/i40e/i40e_ethdev.c |  2 +-
 drivers/net/i40e/i40e_ethdev.h |  2 +-
 drivers/net/i40e/i40e_ethdev_vf.c  |  2 +-
 drivers/net/i40e/i40e_rxtx.c   | 14 ++--
 drivers/net/ixgbe/ixgbe_82599_bypass.c |  4 +-
 drivers/net/ixgbe/ixgbe_bypass.c   |  2 +-
 drivers/net/ixgbe/ixgbe_ethdev.c   | 34 -
 drivers/net/ixgbe/ixgbe_rxtx.c | 36 +-
 drivers/net/mlx5/mlx5_utils.h  |  2 +-
 drivers/net/mpipe/mpipe_tilegx.c   |  4 +-
 drivers/net/nfp/nfp_net.c  | 16 ++---
 drivers/net/virtio/virtio_ethdev.c |  6 +-
 examples/ip_pipeline/cpu_core_map.c|  2 +-
 .../pipeline/pipeline_flow_actions_be.c|  2 +-
 examples/ip_reassembly/main.c  | 22 +++---
 examples/ipv4_multicast/main.c | 14 ++--
 examples/l3fwd/main.c  |  4 +-
 examples/multi_process/symmetric_mp/main.c |  2 +-
 examples/netmap_compat/bridge/bridge.c |  8 +--
 examples/netmap_compat/lib/compat_netmap.c | 80 +++---
 examples/qos_sched/args.c  |  2 +-
 examples/quota_watermark/qw/main.h |  2 +-
 examples/vhost/main.c  |  4 +-
 examples/vhost_xen/main.c  |  2 +-
 examples/vhost_xen/vhost_monitor.c |  6 +-
 lib/librte_acl/acl_run_neon.h  |  2 +-
 lib/librte_cryptodev/rte_cryptodev.c   | 22 +++---
 lib/librte_eal/common/eal_common_memzone.c |  2 +-
 .../common/include/arch/ppc_64/rte_byteorder.h |  2 +-
 lib/librte_eal/common/malloc_heap.c|  2 +-
 lib/librte_eal/linuxapp/eal/eal_xen_memory.c   |  2 +-
 lib/librte_eal/linuxapp/kni/kni_vhost.c|  2 +-
 lib/librte_ether/rte_ether.h   | 10 +--
 lib/librte_hash/rte_cuckoo_hash.c  | 18 ++---
 lib/librte_ip_frag/ip_frag_internal.c  |  4 +-
 lib/librte_lpm/rte_lpm.c   |  2 +-
 lib/librte_mempool/rte_mempool.h   |  2 +-
 lib/librte_ring/rte_ring.h |  6 +-
 lib/librte_sched/rte_bitmap.h  |  6 +-
 lib/librte_sched/rte_red.h |  2 +-
 lib/librte_sched/rte_sched.c   |  4 +-
 65 files changed, 372 insertions(+), 372 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 73298c9..a82682d 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -2418,11 +2418,11 @@ parse_item_list(char* str, const char* item_name, unsigned int max_items,
}
if (c != ',') {
printf("character %c is not a decimal digit\n", c);
-   return (0);
+   return 0;
}
if (! value_ok) {
printf("No valid value before comma\n");
-   return (0);
+   return 0;

[dpdk-dev] [PATCH] log: add missing symbol

2016-01-27 Thread Thomas Monjalon
2015-12-16 16:38, Stephen Hemminger:
> The rte_get_log_type and rte_get_log_level functions have been available
> for many versions, but they are missing from the shared library map
> and are therefore not exported correctly.
> 
> Signed-off-by: Stephen Hemminger 
> ---
>  lib/librte_eal/linuxapp/eal/rte_eal_version.map | 2 ++
>  1 file changed, 2 insertions(+)

Why only in linuxapp?

> diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> index cbe175f..51a241c 100644
> --- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> +++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> @@ -93,7 +93,9 @@ DPDK_2.0 {
>   rte_realloc;
>   rte_set_application_usage_hook;
>   rte_set_log_level;
> + rte_get_log_level;
>   rte_set_log_type;
> + rte_get_log_type;

We try to keep an alphabetical order :)



[dpdk-dev] [RFC PATCH 5/5] virtio: Extend virtio-net PMD to support container environment

2016-01-27 Thread Xie, Huawei
On 1/26/2016 10:58 AM, Tetsuya Mukawa wrote:
> On 2016/01/25 19:15, Xie, Huawei wrote:
>> On 1/22/2016 6:38 PM, Tetsuya Mukawa wrote:
>>> On 2016/01/22 17:14, Xie, Huawei wrote:
>>>> On 1/21/2016 7:09 PM, Tetsuya Mukawa wrote:
>>>>> virtio: Extend virtio-net PMD to support container environment
>>>>>
>>>>> The patch adds a new virtio-net PMD configuration that allows the PMD to
>>>>> work on host as if the PMD is in VM.
>>>>> Here is new configuration for virtio-net PMD.
>>>>>  - CONFIG_RTE_LIBRTE_VIRTIO_HOST_MODE
>>>>> To use this mode, EAL needs physically contiguous memory. To allocate
>>>>> such memory, add "--shm" option to application command line.
>>>>>
>>>>> To prepare virtio-net device on host, the users need to invoke QEMU
>>>>> process in special qtest mode. This mode is mainly used for testing QEMU
>>>>> devices from outer process. In this mode, no guest runs.
>>>>> Here is QEMU command line.
>>>>>
>>>>>  $ qemu-system-x86_64 \
>>>>>  -machine pc-i440fx-1.4,accel=qtest \
>>>>>  -display none -qtest-log /dev/null \
>>>>>  -qtest unix:/tmp/socket,server \
>>>>>  -netdev type=tap,script=/etc/qemu-ifup,id=net0,queues=1\
>>>>>  -device virtio-net-pci,netdev=net0,mq=on \
>>>>>  -chardev socket,id=chr1,path=/tmp/ivshmem,server \
>>>>>  -device ivshmem,size=1G,chardev=chr1,vectors=1
>>>>>
>>>>>  * QEMU process is needed per port.
>>>> Does qtest supports hot plug virtio-net pci device, so that we could run
>>>> one QEMU process in host, which provisions the virtio-net virtual
>>>> devices for the container?
>>> Theoretically, we can use hot plug in some cases.
>>> But I guess we have 3 concerns here.
>>>
>>> 1. Security.
>>> If we share QEMU process between multiple DPDK applications, this QEMU
>>> process will have all fds of  the applications on different containers.
>>> In some cases, it will be security concern.
>>> So, I guess we need to support current 1:1 configuration at least.
>>>
>>> 2. shared memory.
>>> Currently, QEMU and DPDK application will map shared memory using same
>>> virtual address.
>>> So if multiple DPDK application connects to one QEMU process, each DPDK
>>> application should have different address for shared memory. I guess
>>> this will be a big limitation.
>>>
>>> 3. PCI bridge.
>>> So far, QEMU has one PCI bridge, so we can connect almost 10 PCI devices
>>> to QEMU.
>>> (I forget correct number, but it's almost 10, because some slots are
>>> reserved by QEMU)
>>> A DPDK application needs both virtio-net and ivshmem device, so I guess
>>> almost 5 DPDK applications can connect to one QEMU process, so far.
>>> To add more PCI bridges solves this.
>>> But we need to add a lot of implementation to support cascaded PCI
>>> bridges and PCI devices.
>>> (Also we need to solve above "2nd" concern.)
>>>
>>> Anyway, if we use virtio-net PMD and vhost-user PMD, QEMU process will
>>> not do anything after initialization.
>>> (QEMU will try to read a qtest socket, then be stopped because there is
>>> no message after initialization)
>>> So I guess we can ignore overhead of these QEMU processes.
>>> If someone cannot ignore it, I guess this is the one of cases that it's
>>> nice to use your light weight container implementation.
>> Thanks for the explanation, and also in your opinion where is the best
>> place to run the QEMU instance? If we run QEMU instances in host, for
>> vhost-kernel support, we could get rid of the root privilege issue.
> Do you mean below?
> If we deploy QEMU instance on host, we can start a container without the
> root privilege.
> (But on host, still QEMU instance needs the privilege to access to
> vhost-kernel)

There is no issue running the QEMU instance with root privilege on the host, but
I think it is not acceptable to grant the container root privilege.

>
> If so, I agree to deploy QEMU instance on host or other privileged
> container will be nice.
> In the case of vhost-user, to deploy on host or non-privileged container
> will be good.
>
>> Another issue is do you plan to support multiple virtio devices in
>> container? Currently i find the code assuming only one virtio-net device
>> in QEMU, right?
> Yes, so far, 1 port needs 1 QEMU instance.
> So if you need multiple virtio devices, you need to invoke multiple QEMU
> instances.
>
> Do you want to deploy 1 QEMU instance for each DPDK application, even if
> the application has multiple virtio-net ports?
>
> So far, I am not sure whether we need it, because this type of DPDK
> application will need only one port in most cases.
> But if you need this, yes, I can implement using QEMU PCI hotplug feature.
> (But probably we can only attach almost 10 ports. This will be limitation.)

I am OK with supporting one virtio device for the first version.

>
>> Btw, i have read most of your qtest code. No obvious issues found so far
>> but quite a couple of nits. You must have spent a lot of time on this.
>> It is great work!
> I appreciat

[dpdk-dev] [PATCH v2 1/2] ethdev: remove useless null checks

2016-01-27 Thread David Marchand
On Tue, Jan 26, 2016 at 4:50 PM, Jan Viktorin  
wrote:
> What about the RTE_VERIFY? I think, it's more appropriate here.

Well, here, I am removing useless checks in static functions.

But for the rest of ethdev api, I agree we could add some RTE_VERIFY.

> Otherwise, feel free to add:
>
> Reviewed-by: Jan Viktorin 

Thanks.


-- 
David Marchand


[dpdk-dev] [RFC PATCH 5/5] virtio: Extend virtio-net PMD to support container environment

2016-01-27 Thread Xie, Huawei
On 1/21/2016 7:09 PM, Tetsuya Mukawa wrote:
> + /* Set BAR region */
> + for (i = 0; i < NB_BAR; i++) {
> + switch (dev->bar[i].type) {
> + case QTEST_PCI_BAR_IO:
> + case QTEST_PCI_BAR_MEMORY_UNDER_1MB:
> + case QTEST_PCI_BAR_MEMORY_32:
> + qtest_pci_outl(s, bus, device, 0, dev->bar[i].addr,
> + dev->bar[i].region_start);
> + PMD_DRV_LOG(INFO, "Set BAR of %s device: 0x%lx - 0x%lx\n",
> + dev->name, dev->bar[i].region_start,
> + dev->bar[i].region_start + dev->bar[i].region_size);
> + break;
> + case QTEST_PCI_BAR_MEMORY_64:
> + qtest_pci_outq(s, bus, device, 0, dev->bar[i].addr,
> + dev->bar[i].region_start);
> + PMD_DRV_LOG(INFO, "Set BAR of %s device: 0x%lx - 0x%lx\n",
> + dev->name, dev->bar[i].region_start,
> + dev->bar[i].region_start + dev->bar[i].region_size);
> + break;

Hasn't the bar resource already been allocated? Is it the app's
responsibility to allocate the bar resource in qtest mode? The app
couldn't have that knowledge.

> + case QTEST_PCI_BAR_DISABLE:
> + break;
> + }
> + }
> +
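As background to the question about BAR allocation: on a normally booted machine, firmware or the guest kernel sizes each BAR and assigns its address. In qtest mode no guest runs, so presumably nothing else does this, which would explain why the RFC sets `region_start` itself. The standard PCI sizing step is to write all ones to the BAR, read it back, and decode the result. The sketch below covers only the decode. It assumes the read-back value has already been obtained through some config-space accessor, which is not shown, and `bar32_size_sketch()` is a hypothetical name.

```c
#include <stdint.h>

/* Decode the value read back from a 32-bit BAR after writing all ones.
 * Bit 0 set means an I/O BAR (flag bits 1:0); clear means a memory BAR
 * (flag bits 3:0). The size is the lowest address bit the device left
 * set; a result of 0 means the BAR is not implemented. A 64-bit memory
 * BAR would combine two consecutive registers (not shown). */
static uint32_t bar32_size_sketch(uint32_t readback)
{
	uint32_t addr_bits = (readback & 0x1u) ? (readback & ~0x3u)
					       : (readback & ~0xFu);

	/* Isolate the lowest set bit: x & (~x + 1), i.e. x & -x. */
	return addr_bits & (~addr_bits + 1u);
}
```

Taking the lowest set bit, rather than `~addr_bits + 1`, also copes with I/O BARs whose upper 16 bits read back as zero.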



[dpdk-dev] [PATCH v2] vfio: Support for no-IOMMU mode

2016-01-27 Thread Burakov, Anatoly
Hi Thomas,

> The comments "function pointer typedef" or "structure to hold" don't
> bring new information. Please keep it short.

I'll fix that and submit a v3, thanks.

> Why a new file for these functions?

Well, my thought was to make future extensions easier by way of avoiding mixing 
irrelevant and/or general code with driver-specific code. I can change it back 
if that's not OK.

Thanks,
Anatoly


[dpdk-dev] [PATCH] ip_pipeline: add load balancing function to pass-through pipeline

2016-01-27 Thread Jasvinder Singh
The passthrough pipeline implementation is extended with a load balancing
function. This function allows uniform distribution of the packets among
its output ports. Any application-level logic can be applied for packet
distribution. For instance, this implementation uses a hash value
computed over specific header fields of the incoming packets to spread
traffic uniformly among the output ports. The following passthrough
configuration can be used to implement the load balancing function over
IPv4 traffic:

[PIPELINE0]
type = PASS-THROUGH
core = 0
pktq_in = RXQ0.0 RXQ1.0 RXQ2.0 RXQ3.0
pktq_out = TXQ0.0 TXQ1.0 TXQ2.0 TXQ3.0
dma_src_offset = 278; mbuf (128) + headroom (128) + 1st ethertype offset (14) + ttl offset within ip header (8) = 278 (ipv4)
dma_dst_offset = 128; mbuf (128)
dma_size = 16
dma_src_mask = 00FF
dma_hash_offset = 144; (dma_dst_offset+dma_size)
lb = hash
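With `lb = hash`, each packet's output port is derived from its hash, as in the `pkt_work()` change in this patch. Below is a minimal sketch of that selection. `lb_port_sketch()` is a hypothetical name. Unlike the patch, which receives `port_out_pw2` as a parameter (presumably precomputed), this version tests for a power of two on each call.

```c
#include <stdint.h>

/* Map a packet hash to an output port: a mask when the port count is a
 * power of two, a modulo otherwise. Assumes n_ports_out >= 1. */
static uint32_t lb_port_sketch(uint32_t hash, uint32_t n_ports_out)
{
	int port_out_pw2 = (n_ports_out & (n_ports_out - 1)) == 0;

	if (port_out_pw2)
		return hash & (n_ports_out - 1);  /* no division needed */
	return hash % n_ports_out;
}
```

For power-of-two port counts, both branches give the same result. The mask only avoids the division on the per-packet path.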

Signed-off-by: Jasvinder Singh 
Acked-by: Cristian Dumitrescu 
---
 .../ip_pipeline/pipeline/pipeline_actions_common.h |  22 ++
 .../ip_pipeline/pipeline/pipeline_passthrough_be.c | 281 -
 .../ip_pipeline/pipeline/pipeline_passthrough_be.h |   2 +
 3 files changed, 245 insertions(+), 60 deletions(-)

diff --git a/examples/ip_pipeline/pipeline/pipeline_actions_common.h b/examples/ip_pipeline/pipeline/pipeline_actions_common.h
index 9958758..2c08db2 100644
--- a/examples/ip_pipeline/pipeline/pipeline_actions_common.h
+++ b/examples/ip_pipeline/pipeline/pipeline_actions_common.h
@@ -59,6 +59,28 @@ f_ah(
\
return 0;   \
 }

+#define PIPELINE_PORT_IN_AH_LB(f_ah, f_pkt_work, f_pkt4_work) \
+static int \
+f_ah(  \
+   struct rte_pipeline *p, \
+   struct rte_mbuf **pkts, \
+   uint32_t n_pkts,\
+   void *arg)  \
+{  \
+   uint32_t i; \
+   \
+   uint64_t pkt_mask = RTE_LEN2MASK(n_pkts, uint64_t); \
+   \
+   rte_pipeline_ah_packet_hijack(p, pkt_mask); \
+   for (i = 0; i < (n_pkts & (~0x3LLU)); i += 4)   \
+   f_pkt4_work(&pkts[i], arg); \
+   \
+   for ( ; i < n_pkts; i++)\
+   f_pkt_work(pkts[i], arg);   \
+   \
+   return 0;   \
+}
+
 #define PIPELINE_TABLE_AH_HIT(f_ah, f_pkt_work, f_pkt4_work)   \
 static int \
 f_ah(  \
diff --git a/examples/ip_pipeline/pipeline/pipeline_passthrough_be.c b/examples/ip_pipeline/pipeline/pipeline_passthrough_be.c
index 7642462..75b6fd8 100644
--- a/examples/ip_pipeline/pipeline/pipeline_passthrough_be.c
+++ b/examples/ip_pipeline/pipeline/pipeline_passthrough_be.c
@@ -72,7 +72,9 @@ pkt_work(
struct rte_mbuf *pkt,
void *arg,
uint32_t dma_size,
-   uint32_t hash_enabled)
+   uint32_t hash_enabled,
+   uint32_t lb_hash,
+   uint32_t port_out_pw2)
 {
struct pipeline_passthrough *p = arg;

@@ -90,8 +92,24 @@ pkt_work(
dma_dst[i] = dma_src[i] & dma_mask[i];

/* Read (dma_dst), compute (hash), write (hash) */
-   if (hash_enabled)
-   *dma_hash = p->f_hash(dma_dst, dma_size, 0);
+   if (hash_enabled) {
+   uint32_t hash = p->f_hash(dma_dst, dma_size, 0);
+   *dma_hash = hash;
+
+   if (lb_hash) {
+   uint32_t port_out;
+
+   if (port_out_pw2)
+   port_out
+   = hash & (p->p.n_ports_out - 1);
+   else
+   port_out
+   = hash % p->p.n_ports_out;
+
+   rte_pipeline_port_out_packet_insert(p->p.p,
+   port_out, pkt);
+   }
+   }
 }

 static inline __attribute__((always_inline)) void
@@ -99,7 +117,9 @@ pkt4_work(
struct rte_mbuf **pkts,
void *arg,
uint32_t dma_size,
-   uint32_t hash_enabled)
+   uint32_t hash_enabled,
+   uint32_t lb_hash,
+   ui

[dpdk-dev] [PATCH v2] vfio: Support for no-IOMMU mode

2016-01-27 Thread Thomas Monjalon
2016-01-27 10:08, Burakov, Anatoly:
> > Why a new file for these functions?
> 
> Well, my thought was to make future extensions easier by way of avoiding 
> mixing irrelevant and/or general code with driver-specific code. I can change 
> it back if that's not OK.

No strong opinion here.
David?


[dpdk-dev] [PATCH v2] vfio: Support for no-IOMMU mode

2016-01-27 Thread David Marchand
On Wed, Jan 27, 2016 at 11:12 AM, Thomas Monjalon
 wrote:
> 2016-01-27 10:08, Burakov, Anatoly:
>> > Why a new file for these functions?
>>
>> Well, my thought was to make future extensions easier by way of avoiding 
>> mixing irrelevant and/or general code with driver-specific code. I can 
>> change it back if that's not OK.
>
> No strong opinion here.
> David?

Hum, no strong opinion either, but I don't think we really need to
split this file for this much code.
Besides, if we keep all code in eal_pci_vfio.c, there is no need to
expose those structures through eal_pci_init.h.


-- 
David Marchand


[dpdk-dev] [PATCH v2] vfio: Support for no-IOMMU mode

2016-01-27 Thread Burakov, Anatoly
> >> > Why a new file for these functions?
> >>
> >> Well, my thought was to make future extensions easier by way of avoiding
> >> mixing irrelevant and/or general code with driver-specific code. I can
> >> change it back if that's not OK.
> >
> > No strong opinion here.
> > David?
> 
> Hum, no strong opinion either, but I don't think we really need to split this
> file for this much code.
> Besides, if we keep all code in eal_pci_vfio.c, there is no need to expose
> those structures through eal_pci_init.h.

OK then, I'll merge it back into eal_pci_vfio.c

Thanks,
Anatoly


[dpdk-dev] [PATCH v2] eal: add function to check if primary proc alive

2016-01-27 Thread Harry van Haaren
This patch adds a new function to the EAL API:
int rte_eal_primary_proc_alive(const char *path);

The function indicates whether a primary process is currently alive.
This is implemented by testing for a write lock on the config file.

The use case for this functionality is that a secondary
process can wait until a primary process starts by polling
the function. Once the primary is running, the secondary
can continue polling to detect whether the primary process
has quit unexpectedly.

The RTE_MAGIC number is written to the shared config by the
primary process; this is the signal to the secondary process
that the EAL is set up and ready to be used. The function
rte_eal_mcfg_complete() writes RTE_MAGIC. This has been
delayed in the EAL init procedure, as the PCI probing in
the primary process can interfere with a running secondary.
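The lock test described in this patch can be sketched with POSIX record locks. This is not the patch's own code, which is truncated here. It assumes the primary holds an `fcntl()` write lock on some region of the config file. `primary_alive_sketch()` and `wait_for_primary_sketch()` are hypothetical names. Taking the path explicitly mirrors the `const char *path` parameter of `rte_eal_primary_proc_alive()`.

```c
#include <fcntl.h>
#include <unistd.h>

/* Return 1 if another process holds a write lock on any part of path,
 * 0 if not (or if the file cannot be opened). fcntl() record locks are
 * dropped when their owner exits, so a crashed primary stops being
 * reported as alive. */
static int primary_alive_sketch(const char *path)
{
	struct flock lk = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
			    .l_start = 0, .l_len = 0 };   /* whole file */
	int fd = open(path, O_RDONLY);
	int ret;

	if (fd < 0)
		return 0;
	/* F_GETLK only tests for a conflicting lock; it takes none. */
	ret = fcntl(fd, F_GETLK, &lk);
	close(fd);
	if (ret < 0)
		return 0;
	return lk.l_type != F_UNLCK;
}

/* Poll once per second, up to max_tries times, until a primary appears. */
static int wait_for_primary_sketch(const char *path, unsigned int max_tries)
{
	unsigned int i;

	for (i = 0; i < max_tries; i++) {
		if (primary_alive_sketch(path))
			return 1;
		sleep(1);
	}
	return 0;
}
```

`F_GETLK` never reports locks held by the calling process itself, so this check is only meaningful from a process other than the primary, which matches the secondary-process use case.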

Signed-off-by: Harry van Haaren 
---

v2:
- Passing NULL as const char* uses default /var/run/.rte_config
- Moved code into /common/ instead of /linuxapp/, should work on BSD now

 doc/guides/rel_notes/release_2_3.rst|  7 +++
 lib/librte_eal/bsdapp/eal/Makefile  |  1 +
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  8 
 lib/librte_eal/common/eal_common_proc.c | 61 +
 lib/librte_eal/common/include/rte_eal.h | 18 
 lib/librte_eal/linuxapp/eal/Makefile|  1 +
 lib/librte_eal/linuxapp/eal/eal.c   |  4 +-
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  7 +++
 8 files changed, 105 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_eal/common/eal_common_proc.c

diff --git a/doc/guides/rel_notes/release_2_3.rst b/doc/guides/rel_notes/release_2_3.rst
index 99de186..14b5b06 100644
--- a/doc/guides/rel_notes/release_2_3.rst
+++ b/doc/guides/rel_notes/release_2_3.rst
@@ -11,6 +11,13 @@ Resolved Issues
 EAL
 ~~~

+* **Added rte_eal_primary_proc_alive() function**
+
+  A new function ``rte_eal_primary_proc_alive()`` has been added
+  to allow the user to detect if a primary process is running.
+  Use cases for this feature include fault detection, and monitoring
+  using secondary processes.
+

 Drivers
 ~~~
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index 65b293f..2d6e3b1 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -61,6 +61,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_alarm.c

 # from common dir
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_lcore.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_proc.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_timer.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_memzone.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_log.c
diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 9d7adf1..0e28017 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -135,3 +135,11 @@ DPDK_2.2 {
rte_xen_dom0_supported;

 } DPDK_2.1;
+
+
+DPDK_2.3 {
+   global:
+
+   rte_eal_primary_proc_alive;
+
+} DPDK_2.2;
diff --git a/lib/librte_eal/common/eal_common_proc.c b/lib/librte_eal/common/eal_common_proc.c
new file mode 100644
index 000..c598891
--- /dev/null
+++ b/lib/librte_eal/common/eal_common_proc.c
@@ -0,0 +1,61 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2016 Intel Shannon Ltd. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIA

[dpdk-dev] [PATCH] eal: add function to check if primary proc alive

2016-01-27 Thread Van Haaren, Harry
> From: Richardson, Bruce
> > Agreed, however hiding it totally removes the flexibility of waiting for a
> > primary that is starting with --file-prefix (aka: in a non-default
> > location). Imposing a limit on only monitoring primary procs in the
> > default location seems wrong.
> 
> But the secondary also needs the same prefix. Is that prefix not accessible by
> this function to be used?

The issue is that the EAL argument parsing is performed during rte_eal_init(),
which is exactly what this function tries to avoid: initializing the EAL
before a primary process starts.

I looked at moving the EAL parsing before rte_eal_init(), and considered
adding a minimal parser for --file-prefix. Both routes seem like bad
solutions, due to either complexity or code duplication.
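Because the secondary cannot parse `--file-prefix` before EAL initialization, a monitoring tool that already knows the prefix could build the config path itself and pass it to `rte_eal_primary_proc_alive()`. This is a minimal sketch. The `/var/run/.<prefix>_config` layout is an assumption inferred from the default `/var/run/.rte_config` named in the v2 notes. Non-root setups may use a different directory. `config_path_sketch()` is a hypothetical helper.

```c
#include <stdio.h>
#include <string.h>

/* Build the runtime config path for a given --file-prefix value.
 * ASSUMPTION: "/var/run/.<prefix>_config" layout, default prefix "rte".
 * Returns 0 on success, -1 on a bad prefix or a too-small buffer. */
static int config_path_sketch(char *buf, size_t len, const char *prefix)
{
	int n;

	if (prefix == NULL || prefix[0] == '\0')
		prefix = "rte";                 /* default prefix */
	if (strchr(prefix, '/') != NULL)
		return -1;                      /* must not escape the directory */
	n = snprintf(buf, len, "/var/run/.%s_config", prefix);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The resulting buffer would then be passed as the `path` argument, for example `rte_eal_primary_proc_alive(buf)`.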

v2 of this patch posted to list:
http://dpdk.org/dev/patchwork/patch/10126/

-Harry


[dpdk-dev] [PATCH v5 10/11] virtio: pci: add dummy func definition for in/outb for non-x86 arch

2016-01-27 Thread Santosh Shukla
Ping?

On Tue, Jan 19, 2016 at 5:16 PM, Santosh Shukla  wrote:
> On non-x86 architectures, the compiler throws build errors for the in/out
> APIs. Add dummy API functions so that the build passes.
>
> Note: for virtio to work on non-x86 architectures, RTE_EAL_VFIO is the only
> supported method. RTE_EAL_IGB_UIO is not supported on non-x86 architectures.
>
> Virtio support per architecture and interface:
>
> ARCH    IGB_UIO  VFIO
> x86     Y        Y
> ARM64   N/A      Y
> PPC_64  N/A      Y  (not tested, but likely to work, as VFIO is
>                      arch-independent)
>
> Note: Applicable for virtio spec 0.95
>
> Signed-off-by: Santosh Shukla 
> ---
>  drivers/net/virtio/virtio_pci.h |   46 +++
>  1 file changed, 46 insertions(+)
>
> diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
> index f550d22..b88f9ec 100644
> --- a/drivers/net/virtio/virtio_pci.h
> +++ b/drivers/net/virtio/virtio_pci.h
> @@ -46,6 +46,7 @@
>  #endif
>
>  #include 
> +#include "virtio_logs.h"
>
>  struct virtqueue;
>
> @@ -320,6 +321,51 @@ outl_p(unsigned int data, unsigned int port)
>  }
>  #endif
>
> +#if !defined(RTE_ARCH_X86_64) && !defined(RTE_ARCH_I686) && \
> +   defined(RTE_EXEC_ENV_LINUXAPP)
> +static inline uint8_t
> +inb(unsigned long addr __rte_unused)
> +{
> +   PMD_INIT_LOG(ERR, "inb() not supported for this RTE_ARCH\n");
> +   return 0;
> +}
> +
> +static inline uint16_t
> +inw(unsigned long addr __rte_unused)
> +{
> +   PMD_INIT_LOG(ERR, "inw() not supported for this RTE_ARCH\n");
> +   return 0;
> +}
> +
> +static inline uint32_t
> +inl(unsigned long addr __rte_unused)
> +{
> +   PMD_INIT_LOG(ERR, "inl() not supported for this RTE_ARCH\n");
> +   return 0;
> +}
> +
> +static inline void
> +outb_p(unsigned char data __rte_unused, unsigned int port __rte_unused)
> +{
> +   PMD_INIT_LOG(ERR, "outb_p() not supported for this RTE_ARCH\n");
> +   return;
> +}
> +
> +static inline void
> +outw_p(unsigned short data __rte_unused, unsigned int port __rte_unused)
> +{
> +   PMD_INIT_LOG(ERR, "outw_p() not supported for this RTE_ARCH\n");
> +   return;
> +}
> +
> +static inline void
> +outl_p(unsigned int data __rte_unused, unsigned int port __rte_unused)
> +{
> +   PMD_INIT_LOG(ERR, "outl_p() not supported for this RTE_ARCH\n");
> +   return;
> +}
> +#endif
> +
>  static inline int
>  vtpci_with_feature(struct virtio_hw *hw, uint64_t bit)
>  {
> --
> 1.7.9.5
>


[dpdk-dev] [PATCH v6 08/11] eal: pci: introduce RTE_KDRV_VFIO_NOIOMMUi driver mode

2016-01-27 Thread Santosh Shukla
On Tue, Jan 26, 2016 at 9:51 PM, Santosh Shukla  wrote:
> On Tue, Jan 26, 2016 at 7:58 PM, Thomas Monjalon
>  wrote:
>> 2016-01-26 19:35, Santosh Shukla:
>>> On Tue, Jan 26, 2016 at 6:30 PM, Thomas Monjalon
>>>  wrote:
>>> > 2016-01-26 15:56, Santosh Shukla:
>>> >> In my observation, currently virtio work for vfio-noiommu, that's why
>>> >> said drv->kdrv need to know vfio mode.
>>> >
>>> > It is your observation. It may change in near future.
>>>
>>> so that mean till then, virtio support for non-x86 arch has to wait?
>>
>> No, absolutely not. virtio for non-x86 is welcome.
>>
>>> We have working model with vfio-noiommu, don't you think it make sense
>>> to let vfio_noiommu implementation exist and later in-case
>>> virtio+iommu gets mainline then switch to vfio __mode__ agnostic
>>> approach. And for that All it takes to replace __noiommu suffix with
>>> default.
>>
>> I'm just saying you should not touch the enum rte_kernel_driver.
>> RTE_KDRV_VFIO is a driver.
>> RTE_KDRV_VFIO_NOIOMMU is a mode.
>> As the VFIO API is the same in both modes, there is no reason to
>> distinguish them at this level.
>> Your patch adds the NOIOMMU case everywhere:
>> case RTE_KDRV_VFIO:
>> +   case RTE_KDRV_VFIO_NOIOMMU:
>>
>> I'll stop commenting here to let others give their opinion.
>>
>> [...]
>>> >> with vfio+iommu; binding virtio pci device to vfio-pci driver fail;
>>> >> giving below error:
>>> >> [   53.053464] VFIO - User Level meta-driver version: 0.3
>>> >> [   73.077805] vfio-pci: probe of :00:03.0 failed with error -22
>>> >> [   73.077852] vfio-pci: probe of :00:03.0 failed with error -22
>>> >>
>>> >> vfio_pci_probe() --> vfio_iommu_group_get() --> iommu_group_get()
>>> >> fails: iommu doesn't have group for virtio pci device.
>>> >
>>> > Yes it fails when binding.
>>> > So the later check in the virtio PMD is useless.
>>>
>>> Which check?
>>
>> The check for VFIO noiommu only:
>> -   if (dev->kdrv == RTE_KDRV_VFIO)
>> +   if (dev->kdrv == RTE_KDRV_VFIO_NOIOMMU)
>>
>> [...]
>>> > Furthermore restricting virtio to no-iommu mode doesn't bring
>>> > any improvement.
>>>
>>> We're not __restricting__, as soon as virtio+iommu gets working state,
>>> we'll simply replace __noiommu with default. Then its upto user to try
>>> out virtio with vfio default or vfio_noiommu.
>>
>> Yes it's up to user.
>> So your code should be
>> if (dev->kdrv == RTE_KDRV_VFIO)
>>
>
> Right,
>
>>> > That's why I suggest to keep the initial semantic of kdrv and
>>> > not pollute it with VFIO modes.
>>>
>>> I am okay to live with default and forget suffix __noiommu but there
>>> are implementation problem which was discussed in other thread
>>> - Virtio pmd driver should avoid interface parsing i.e.
>>> virtio_resource_init_uio/vfio() etc.. For vfio case - We could easily
>>> get rid of by moving /sys parsing to pci_eal layer, Right? If so then
>>> virtio currently works with vfio-noiommu, it make sense to me that
>>> pci_eal layer does parsing for pmd driver before that pmd driver get
>>> initialized.
>>
>> Please reword. What is the problem?
>>
>>> - Another case could be: iommu-less-pmd-driver. eal layer to do
>>> parsing before updating drv->kdrv.
>>
>> [...]
>>> >> >> > If a check is needed, I would prefer using your function
>>> >> >> > pci_vfio_is_noiommu() and remove driver modes from struct 
>>> >> >> > rte_kernel_driver.
>>> >> >>
>>> >> >> I don't think calling pci_vfio_no_iommu() inside
>>> >> >> virtio_reg_rd/wr_1/2/3() would be a good idea.
>>> >> >
>>> >> > Why? The value may be cached in the priv properties.
>>> >> >
>>> >> pci_vfio_is_noiommu() parses /sys for
>>> >> - enable_noiommu param
>>> >> - attached driver name is vfio-noiommu or not.
>>> >>
>>> >> It does file operation for that, I meant to say that calling this api
>>> >> within register_rd/wr function is not correct. It would be better if
>>> >> those low level register_rd/wr api only checks driver_types.
>>> >
>>> > Yes, that's why I said the return of pci_vfio_is_noiommu() may be cached
>>> > to keep efficiency.
>>>
>>> I am not convinced though, Still find pmd driver checking driver_types
>>> using drv->kdrv is better approach than introducing a new global
>>> variable which may look something like;
>>
>> Not a global variable. A function in EAL layer. A variable in PMD priv.
>>
>
> If we agreed to use condition (drv->kdrv == RTE_KDRV_VFIO);
> then resource parsing for vfio {including vfio and vfio_noiommu both
> case} is enforced in virtio pmd driver layer and that is contradicting
> to what we agreed earlier in this[1] thread. Also we don't need a
> function in EAL layer or a variable in PMD priv. Perhaps a private
> function in virtio pmd which does parsing for vfio interface.
>
> Thoughts?
>
> [1] http://dpdk.org/dev/patchwork/patch/9862/
>

Any comments or feedback on the above approach?

>>> At pci_eal layer 
>>> bool vfio_mode;
>>> vfio_mode = pci_vfio_is_noiommu();
>>>
>>> At virtio pmd driver layer 
>>> Checking v
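Thomas's point that the result of pci_vfio_is_noiommu() can be cached in the PMD's private data, so that register accessors never touch sysfs, can be sketched as follows. Everything here is hypothetical: the struct, the helper names, and passing the module parameter file path in (rather than hard-coding a particular /sys path). The real check described in the thread also inspects the name of the attached driver, which this sketch omits.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-device private data of a PMD. */
struct pmd_priv_sketch {
	bool noiommu_checked; /* has the sysfs lookup been done yet? */
	bool noiommu;         /* cached result of that lookup */
};

/* Hypothetical helper: read a boolean module parameter file, which the
 * kernel reports as "Y" or "N". Missing file means "not enabled". */
static bool read_bool_param(const char *path)
{
	FILE *f = fopen(path, "r");
	int c;

	if (f == NULL)
		return false;
	c = fgetc(f);
	fclose(f);
	return c == 'Y' || c == '1';
}

/* Do the slow file access once, at device init time, and cache it. */
static void pmd_priv_init_sketch(struct pmd_priv_sketch *priv,
				 const char *param_path)
{
	priv->noiommu = read_bool_param(param_path);
	priv->noiommu_checked = true;
}

/* Cheap check, safe to call from register read/write helpers. */
static inline bool pmd_is_noiommu_sketch(const struct pmd_priv_sketch *priv)
{
	return priv->noiommu_checked && priv->noiommu;
}
```

The design choice being illustrated is simply that the file I/O moves out of the hot path; whether the cached flag lives in EAL or in the PMD is exactly what the thread is still debating.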

[dpdk-dev] [PATCH v2] ip_pipeline: fix cpu socket-id error

2016-01-27 Thread Jasvinder Singh
This patch fixes the socket-id error in ip_pipeline sample
application running over uni-processor systems.

Signed-off-by: Jasvinder Singh 
Acked-by: Cristian Dumitrescu 
---
v2:
- used SOCKET_ID_ANY instead of -1

 examples/ip_pipeline/init.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/examples/ip_pipeline/init.c b/examples/ip_pipeline/init.c
index 186ca03..c4601c9 100644
--- a/examples/ip_pipeline/init.c
+++ b/examples/ip_pipeline/init.c
@@ -835,6 +835,14 @@ app_init_link_frag_ras(struct app_params *app)
}
 }

+static inline int
+app_get_cpu_socket_id(uint32_t pmd_id)
+{
+   int status = rte_eth_dev_socket_id(pmd_id);
+
+   return (status != SOCKET_ID_ANY) ? status : 0;
+}
+
 static void
 app_init_link(struct app_params *app)
 {
@@ -890,7 +898,7 @@ app_init_link(struct app_params *app)
p_link->pmd_id,
rxq_queue_id,
p_rxq->size,
-   rte_eth_dev_socket_id(p_link->pmd_id),
+   app_get_cpu_socket_id(p_link->pmd_id),
&p_rxq->conf,
app->mempool[p_rxq->mempool_id]);
if (status < 0)
@@ -917,7 +925,7 @@ app_init_link(struct app_params *app)
p_link->pmd_id,
txq_queue_id,
p_txq->size,
-   rte_eth_dev_socket_id(p_link->pmd_id),
+   app_get_cpu_socket_id(p_link->pmd_id),
&p_txq->conf);
if (status < 0)
rte_panic("%s (%" PRIu32 "): "
@@ -989,7 +997,7 @@ app_init_tm(struct app_params *app)
/* TM */
p_tm->sched_port_params.name = p_tm->name;
p_tm->sched_port_params.socket =
-   rte_eth_dev_socket_id(p_link->pmd_id);
+   app_get_cpu_socket_id(p_link->pmd_id);
p_tm->sched_port_params.rate =
(uint64_t) link_eth_params.link_speed * 1000 * 1000 / 8;

-- 
2.5.0
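The fallback that app_get_cpu_socket_id() adds can be exercised without DPDK in a small sketch. It takes the value rte_eth_dev_socket_id() would have returned as an argument instead of calling it, and it defines a local stand-in for SOCKET_ID_ANY, which the v2 note says replaces the literal -1. Both of those are assumptions made purely for illustration.

```c
/* Local stand-in for DPDK's SOCKET_ID_ANY; the patch uses it in place
 * of the literal -1. */
#define SKETCH_SOCKET_ID_ANY (-1)

/* Same decision as app_get_cpu_socket_id() in the patch, but given the
 * value rte_eth_dev_socket_id() reported, so it builds without DPDK. */
static inline int
socket_id_or_default(int reported)
{
	/* When no NUMA socket is known for the port (the uni-processor
	 * case the patch fixes), fall back to socket 0 so queue and
	 * scheduler setup still receive a concrete socket id. */
	return (reported != SKETCH_SOCKET_ID_ANY) ? reported : 0;
}
```

Wrapping the call in one helper means the RX queue, TX queue and traffic manager setup paths all apply the same fallback, which is why the patch touches three call sites with a one-line change each.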



[dpdk-dev] DPDK OVS on Ubuntu 14.04# Issue's Resolved# Inter-VM communication & IP allocation through DHCP issue

2016-01-27 Thread Czesnowicz, Przemyslaw
Hi Abhijeet,


It seems you are almost there!
When booting the VMs, do you request hugepage memory for them (by setting
hw:mem_page_size=large in the flavor extra_spec)?
If not, then please do. If yes, then please look into the libvirt log files for
the VMs (in /var/log/libvirt/qemu/instance-xxx); I think there could be a clue.


Regards
Przemek

From: Abhijeet Karve [mailto:abhijeet.ka...@tcs.com]
Sent: Monday, January 25, 2016 6:13 PM
To: Czesnowicz, Przemyslaw
Cc: dev at dpdk.org; discuss at openvswitch.org; Gray, Mark D
Subject: RE: [dpdk-dev] DPDK OVS on Ubuntu 14.04# Issue's Resolved# Inter-VM 
communication & IP allocation through DHCP issue

Hi Przemek,

Thank you for your response, it really provided us a breakthrough.

After setting up DPDK on the compute node for stable/kilo, we are trying to set up
an OpenStack stable/liberty all-in-one setup. At present we are not able to get
IP allocation through DHCP for the vhost type instances. We also tried
assigning IPs to them manually, but inter-VM communication is still not
happening.

#neutron agent-list
root@nfv-dpdk-devstack:/etc/neutron# neutron agent-list
+--------------------------------------+--------------------+-------------------+-------+----------------+---------------------------+
| id                                   | agent_type         | host              | alive | admin_state_up | binary                    |
+--------------------------------------+--------------------+-------------------+-------+----------------+---------------------------+
| 3b29e93c-3a25-4f7d-bf6c-6bb309db5ec0 | DPDK OVS Agent     | nfv-dpdk-devstack | :-)   | True           | neutron-openvswitch-agent |
| 62593b2c-c10f-4d93-8551-c46ce24895a6 | L3 agent           | nfv-dpdk-devstack | :-)   | True           | neutron-l3-agent          |
| 7cb97af9-cc20-41f8-90fb-aba97d39dfbd | DHCP agent         | nfv-dpdk-devstack | :-)   | True           | neutron-dhcp-agent        |
| b613c654-99b7-437e-9317-20fa651a1310 | Linux bridge agent | nfv-dpdk-devstack | :-)   | True           | neutron-linuxbridge-agent |
| c2dd0384-6517-4b44-9c25-0d2825d23f57 | Metadata agent     | nfv-dpdk-devstack | :-)   | True           | neutron-metadata-agent    |
| f23dde40-7dc0-4f20-8b3e-eb90ddb15e49 | Open vSwitch agent | nfv-dpdk-devstack | xxx   | True           | neutron-openvswitch-agent |
+--------------------------------------+--------------------+-------------------+-------+----------------+---------------------------+


ovs-vsctl show output#

    Bridge br-dpdk
        Port br-dpdk
            Interface br-dpdk
                type: internal
        Port phy-br-dpdk
            Interface phy-br-dpdk
                type: patch
                options: {peer=int-br-dpdk}
    Bridge br-int
        fail_mode: secure
        Port "vhufa41e799-f2"
            tag: 5
            Interface "vhufa41e799-f2"
                type: dpdkvhostuser
        Port int-br-dpdk
            Interface int-br-dpdk
                type: patch
                options: {peer=phy-br-dpdk}
        Port "tap4e19f8e1-59"
            tag: 5
            Interface "tap4e19f8e1-59"
                type: internal
        Port "vhu05734c49-3b"
            tag: 5
            Interface "vhu05734c49-3b"
                type: dpdkvhostuser
        Port "vhu10c06b4d-84"
            tag: 5
            Interface "vhu10c06b4d-84"
                type: dpdkvhostuser
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port "vhue169c581-ef"
            tag: 5
            Interface "vhue169c581-ef"
                type: dpdkvhostuser
        Port br-int
            Interface br-int
                type: internal
    Bridge br-tun
        fail_mode: secure
        Port br-tun
            Interface br-tun
                type: internal
                error: "could not open network device br-tun (Invalid argument)"
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
    ovs_version: "2.4.0"



ovs-ofctl dump-flows br-int#

root@nfv-dpdk-devstack:/etc/neutron# ovs-ofctl dump-flows br-int
NXST_FLOW reply (xid=0x4):
 cookie=0xaaa002bb2bcf827b, duration=2410.012s, table=0, n_packets=0, n_bytes=0, idle_age=2410, priority=10,icmp6,in_port=43,icmp_type=136 actions=resubmit(,24)
 cookie=0xaaa002bb2bcf827b, duration=2409.480s, table=0, n_packets=0, n_bytes=0, idle_age=2409, priority=10,icmp6,in_port=44,icmp_type=136 actions=resubmit(,24)
 cookie=0xaaa002bb2bcf827b, duration=2408.704s, table=0, n_packets=0, n_bytes=0, idle_age=2408, priority=10,icmp6,in_port=45,icmp_type=136 actions=resubmit(,24)
 cookie=0xaaa002bb2bcf827b, duration=2408.155s, table=0, n_packets=0, n_bytes=0, idle_age=2408, priority=10,icmp6,in_port=42,

[dpdk-dev] [RFC] eal: add cgroup-aware resource self discovery

2016-01-27 Thread Tan, Jianfeng
Hi Neil,

On 1/26/2016 10:19 PM, Neil Horman wrote:
> On Tue, Jan 26, 2016 at 10:22:18AM +0800, Tan, Jianfeng wrote:
>> Hi Neil,
>>
>> On 1/25/2016 9:46 PM, Neil Horman wrote:
>>> On Mon, Jan 25, 2016 at 02:49:53AM +0800, Jianfeng Tan wrote:
>> ...


>>> This doesn't make a whole lot of sense, for several reasons:
>>>
>>> 1) Applications, as a general rule, shouldn't be interrogating the cgroups
>>> interface at all.
>> The main reason to do this in DPDK is that DPDK obtains resource information
>> from sysfs and proc, which are not well containerized so far. Also, DPDK
>> pre-allocates resources instead of allocating them gradually on demand.
>>
> Not disagreeing with this, just suggesting that:
>
> 1) Interrogating cgroups really isn't the best way to collect that information
> 2) Pre-allocating those resources isn't particularly wise without some mechanism
> to reallocate it, as resource constraints can change (consider your cpuset
> getting rewritten)

Regarding reallocation:
For cpuset, DPDK panics during initialization if set_affinity fails, but
after that, I believe a rewritten cpuset will not cause any problem.
For memory, if a running application uses 2G of hugepages and the admin then
decreases the hugetlb cgroup limit to 1G, the application will not get killed
unless it tries to access more hugepages (I'll double check this).

So another way to address this problem is to add an option so that DPDK
tries its best to allocate those resources and, if that fails, just posts a
warning and uses the resources it did allocate, instead of panicking. What do
you think?
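The best-effort option proposed above could take roughly this shape: keep whatever pages were obtained and warn, instead of calling rte_panic(). This is a sketch under stated assumptions: the allocator callback type, reserve_best_effort() and fake_alloc() are hypothetical and merely stand in for DPDK's real hugepage mapping code.

```c
#include <stdio.h>

/* Hypothetical allocator callback: returns 0 on success, -1 when the
 * next page cannot be obtained (e.g. a hugetlb cgroup limit was hit). */
typedef int (*page_alloc_fn)(void *ctx);

/* Try to reserve 'requested' pages. On the first failure, stop, keep
 * what was obtained and print a warning instead of aborting.
 * Returns the number of pages actually reserved. */
static unsigned
reserve_best_effort(unsigned requested, page_alloc_fn alloc, void *ctx)
{
	unsigned got;

	for (got = 0; got < requested; got++)
		if (alloc(ctx) != 0)
			break;

	if (got < requested)
		fprintf(stderr, "WARNING: requested %u pages, got %u; "
			"continuing with reduced memory\n", requested, got);
	return got;
}

/* Fake allocator for trying the logic: ctx points at a page budget. */
static int fake_alloc(void *ctx)
{
	unsigned *budget = ctx;

	if (*budget == 0)
		return -1;
	(*budget)--;
	return 0;
}
```

For example, with a budget of 3 pages, reserve_best_effort(5, fake_alloc, &budget) warns and returns 3, leaving the caller to decide whether 3 pages are enough to continue.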

>
>>> 2) Cgroups aren't the only way in which a cpuset or memoryset can be restricted
>>> (the isolcpus command line argument, or a taskset on a parent process for
>>> instance, but there are several others).
>> Yes, I agree. To enable that, I'd like to design the new API for resource self
>> discovery in a flexible way. A parameter "type" is used to specify the
>> discovery method. In addition, I'm considering adding a callback
>> function pointer so that users can write their own resource discovery
>> functions.
>>
> Why?  You don't need an API for this, or if you really want one, it can be very
> generic if you use POSIX apis to gather the information.  What you have here is
> going to be very linux specific, and will need reimplementing for BSD or other
> operating systems.  To use the cpuset example, instead of reading and parsing
> the mask files in the cgroup filesystem module to find your task and
> corresponding mask, just call sched_setaffinity with an all f's mask, then call
> sched_getaffinity.  The returned mask will be all the cpus your process is
> allowed to execute on, taking into account every limiting filter the system you
> are running on offers.

Yes, it makes sense on cpu's side.
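Neil's "set all, then read back" suggestion can be sketched in C as below. This is a minimal illustration, not DPDK code: discover_allowed_cpus() is a hypothetical name, and sizing the "all f's" mask with sysconf(_SC_NPROCESSORS_CONF) is an assumption. The kernel restricts a requested affinity mask to the CPUs permitted by the task's cpuset, so the read-back mask reflects cgroup cpuset limits without any parsing of cgroup files.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>

/* Fill 'allowed' with the CPUs this process may run on and return how
 * many there are, or -1 on error. */
static int discover_allowed_cpus(cpu_set_t *allowed)
{
	cpu_set_t all;
	long ncpus = sysconf(_SC_NPROCESSORS_CONF);
	long i;

	if (ncpus < 1 || ncpus > CPU_SETSIZE)
		ncpus = CPU_SETSIZE;

	/* Ask for every configured CPU ("all f's"). The kernel silently
	 * drops CPUs the cpuset does not permit; the call only fails if
	 * no requested CPU is usable at all. */
	CPU_ZERO(&all);
	for (i = 0; i < ncpus; i++)
		CPU_SET(i, &all);
	if (sched_setaffinity(0, sizeof(all), &all) != 0)
		return -1;

	/* Read back the effective mask. */
	CPU_ZERO(allowed);
	if (sched_getaffinity(0, sizeof(*allowed), allowed) != 0)
		return -1;
	return CPU_COUNT(allowed);
}
```

One point worth double-checking before relying on this: an affinity mask inherited from taskset is a default rather than a hard limit, so widening the mask first discards it, whereas calling sched_getaffinity() alone preserves it. Likewise, CPUs isolated with isolcpus may still be accepted when named explicitly in the mask.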

>
> There are similar OS level POSIX apis for most resources out there.  You really
> don't need to dig through cgroups just to learn what some of those resources are.
>
>>> Instead of trying to figure out what cpuset is valid for your process by
>>> interrogating the cgroups hierarchy, you should follow the prescribed
>>> method of calling sched_getaffinity after calling sched_setaffinity.  That will
>>> give you the canonical cpuset that you are executing on, taking all cpuset
>>> filters into account (including cgroups and any other restrictions).  It's far
>>> simpler as well, as it doesn't require a ton of file/string processing.
>> Yes, this way is much better for cpuset discovery. But is there such a
>> syscall for hugepages?
>>
> In what capacity?  Interrogating how many hugepages you have, or to what node
> they are affined to?  Capacity would require reading the requisite proc file, as
> there's no posix api for this resource.  Node affinity can be implied by setting
> the numa policy of the dpdk and then writing to /proc/nr_hugepages, as the
> kernel will attempt to distribute hugepages evenly among the task's numa policy
> configuration.

For memory affinity, I believe the existing approach of reading
/proc/self/pagemap already handles the problem. What I was asking is how
much memory (or hugepages, in Linux's case) can be used. By the way, what
is /proc/nr_hugepages?
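For the capacity question, one proc file that reports hugepage counts is /proc/meminfo, which carries HugePages_Total, HugePages_Free and Hugepagesize lines. A hedged sketch of parsing them follows; the struct and function names are hypothetical, and the file path is a parameter so the parser can be pointed at a test file. Note that these are system-wide counters, so on their own they do not reflect a hugetlb cgroup limit, which is the case Jianfeng is asking about.

```c
#include <stdio.h>

/* Hugepage counters as reported in /proc/meminfo. */
struct hugepage_info_sketch {
	long total;   /* HugePages_Total */
	long free;    /* HugePages_Free */
	long size_kb; /* Hugepagesize, in kB */
};

/* Parse the hugepage lines of a meminfo-formatted file (normally
 * /proc/meminfo). Returns 0 if all three fields were found, else -1. */
static int read_hugepage_info(const char *path, struct hugepage_info_sketch *hi)
{
	char line[256];
	int found = 0;
	FILE *f = fopen(path, "r");

	if (f == NULL)
		return -1;
	while (fgets(line, sizeof(line), f) != NULL) {
		/* sscanf matches the literal key, then the number. */
		if (sscanf(line, "HugePages_Total: %ld", &hi->total) == 1)
			found |= 1;
		else if (sscanf(line, "HugePages_Free: %ld", &hi->free) == 1)
			found |= 2;
		else if (sscanf(line, "Hugepagesize: %ld kB", &hi->size_kb) == 1)
			found |= 4;
	}
	fclose(f);
	return found == 7 ? 0 : -1;
}
```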

>
> That said, I would advise that you strongly consider not exporting hugepages as
> a resource, as:
>
> a) Applications generally don't need to know that they are using hugepages, and
> so they don't need to know where said hugepages live, they just allocate memory
> via your allocation api and you give them something appropriate

But the allocation API provider, the DPDK library, needs to know whether it's
using hugepages or not.

> b) Hugepages are a resource that are very specific to Linux, and to X86 Linux at
> that.  Some OS implement similar resources, but they may have very different
> semantics.  And other Arches may or may not implement various forms of compound
> paging at all.  As the DPDK expands to sup

[dpdk-dev] [PATCH v2 4/4] virtio: check if any kernel driver is manipulating the virtio device

2016-01-27 Thread Thomas Monjalon
2016-01-07 16:17, Panu Matilainen:
> On 01/03/2016 07:56 PM, Huawei Xie wrote:
> > v2 changes:
> >   change LOG level from ERR to INFO
> >
> > virtio PMD could use IO port to configure the virtio device without
> > using uio driver.
> >
> > There are two issues with previous implementation:
> > 1) virtio PMD will take over each virtio device blindly even if some
> > are not intended for DPDK.
> > 2) driver conflict between virtio PMD and virtio-net kernel driver.
> >
> > This patch checks if there is any kernel driver manipulating the virtio
> > device before virtio PMD uses IO port to configure the device.
> >
> > Fixes: da978dfdc43b ("virtio: use port IO to get PCI resource")
> >
> > Signed-off-by: Huawei Xie 
> > ---
> >   drivers/net/virtio/virtio_ethdev.c | 7 +++
> >   1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
> > index e815acd..7a50dac 100644
> > --- a/drivers/net/virtio/virtio_ethdev.c
> > +++ b/drivers/net/virtio/virtio_ethdev.c
> > @@ -1138,6 +1138,13 @@ static int virtio_resource_init_by_ioports(struct rte_pci_device *pci_dev)
> > int found = 0;
> > size_t linesz;
> >
> > +   if (pci_dev->kdrv != RTE_KDRV_NONE) {
> > +   PMD_INIT_LOG(INFO,
> > +   "kernel driver is manipulating this device." \
> > +   " Please unbind the kernel driver.");
> 
> At the very least this message needs to be changed.
> 
> Like said earlier, I think the message could just as well be dropped 
> entirely, but at least it should be something to the tune of "ignoring 
> kernel owned device" instead of asking the user to break their 
> configuration.

Huawei, a v3 is required. Thanks
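Panu's suggested behaviour (skip a kernel-owned device with an informational message rather than telling the user to unbind it) might look roughly like this. The enum is a hypothetical reduction modelled loosely on EAL's kernel driver field, and the function name is illustrative; this is not the v3 patch.

```c
#include <stdio.h>

/* Hypothetical subset of the kernel driver states EAL can report. */
enum kdrv_sketch {
	KDRV_SKETCH_UNKNOWN,
	KDRV_SKETCH_UIO,
	KDRV_SKETCH_VFIO,
	KDRV_SKETCH_NONE,
};

/* Decide whether the virtio PMD may drive the device through I/O ports.
 * A device owned by any kernel driver is skipped with an informational
 * message instead of asking the user to change their configuration.
 * Returns 1 if the device can be used, 0 if it must be ignored. */
static int
virtio_ioport_usable_sketch(enum kdrv_sketch kdrv, const char *pci_addr)
{
	if (kdrv != KDRV_SKETCH_NONE) {
		printf("virtio: ignoring kernel-owned device %s\n", pci_addr);
		return 0;
	}
	return 1;
}
```

A check of this form only works if an unbound device is reported as "none" rather than "unknown", which is the EAL change David Marchand describes in the pci cleanup thread; in this sketch an "unknown" device is skipped as well.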


[dpdk-dev] [PATCH 0/9] pci cleanup and blacklist rework

2016-01-27 Thread David Marchand
On Fri, Jan 22, 2016 at 4:27 PM, David Marchand
 wrote:
> The 4th patch introduces a change in linux eal.
> Before, if a pci device was bound to no kernel driver, eal would set kdrv
> to "unknown". With this change, kdrv is set to "none".
> This might make it possible to avoid the old issue of virtio devices being
> used by dpdk while still bound to kernel driver reported by Franck B..
> I'll let virtio guys look at this.
> At the very least, it makes more sense to me.

Ok, actually, I had forgotten that Huawei had already sent a similar change [1].
So I suppose this patch's commitlog is wrong, but the patch itself is
still worthwhile for the cleanup.

Thomas, I suppose you will integrate Huawei patches first.
Then I will rebase and fix the commitlog.

[1] http://dpdk.org/dev/patchwork/patch/9718/


-- 
David Marchand


[dpdk-dev] [PATCH] vfio/noiommu: Don't use iommu_present() to track fake groups

2016-01-27 Thread Burakov, Anatoly
Hi Alex,

> On 01/23/2016 04:23 AM, Alex Williamson wrote:
> > Using iommu_present() to determine whether an IOMMU group is real or
> > fake has some problems.  First, apparently Power systems don't
> > register an IOMMU on the device bus, so the groups and containers get
> > marked as noiommu and then won't bind to their actual IOMMU driver.
> > Second, I expect we'll run into the same issue as we try to support
> > vGPUs through vfio, since they're likely to emulate this behavior of
> > creating an IOMMU group on a virtual device and then providing a vfio
> > IOMMU backend tailored to the sort of isolation they provide, which
> > won't necessarily be fully compatible with the IOMMU API.
> >
> > The solution here is to use the existing iommudata interface to IOMMU
> > groups, which allows us to easily identify the fake groups we've
> > created for noiommu purposes.  The iommudata we set is purely
> > arbitrary since we're only comparing the address, so we use the
> > address of the noiommu switch itself.
> >
> > Reported-by: Alexey Kardashevskiy 
> > Fixes: 03a76b60f8ba ("vfio: Include No-IOMMU mode")
> > Signed-off-by: Alex Williamson 
> 
> 
> 
> Reviewed-by: Alexey Kardashevskiy 
> Tested-by: Alexey Kardashevskiy 

Tested bringing the NICs up and encountered no issues. Curious whether it also
works for Santosh (CC'd), as he's one of the intended users of the No-IOMMU
functionality, but otherwise it seems to work.

Thanks,
Anatoly
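The idea in Alex's patch (tag the fake groups by storing a known address in iommudata, then identify them by pointer comparison) can be shown in a few lines of userspace C. struct group_sketch and the helper names are hypothetical stand-ins for the kernel's IOMMU group and its iommudata accessors; only the pointer-as-tag technique is taken from the patch description.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an IOMMU group, reduced to the opaque
 * iommudata pointer its creator may attach. */
struct group_sketch {
	void *iommudata;
};

/* The stored value is irrelevant; only its address is used as a tag,
 * as the patch does with the address of the noiommu switch. */
static bool noiommu_switch_sketch;

static void mark_fake_group(struct group_sketch *g)
{
	g->iommudata = &noiommu_switch_sketch;
}

/* No bus-level "is an IOMMU present?" query: a group is fake exactly
 * when it carries our tag. */
static bool is_fake_group(const struct group_sketch *g)
{
	return g->iommudata == &noiommu_switch_sketch;
}
```

Because the decision depends only on who created the group, a platform that registers no IOMMU on the bus (the Power case in the report) no longer has its real groups misclassified.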


[dpdk-dev] [PATCH V1 1/1] jobstats: added function abort for job

2016-01-27 Thread Panu Matilainen
On 01/26/2016 06:15 PM, Marcin Kerlin wrote:
> This patch adds a new function, rte_jobstats_abort. It marks *job* as finished,
> and the time of this work is added to management time instead of execution time.
> This function should be used instead of rte_jobstats_finish if a condition occurs;
> the condition is defined by the application, for example when receiving n>0 packets.
>
> Signed-off-by: Marcin Kerlin 
> ---
>   lib/librte_jobstats/rte_jobstats.c   | 22 ++
>   lib/librte_jobstats/rte_jobstats.h   | 17 +
>   lib/librte_jobstats/rte_jobstats_version.map |  7 +++
>   3 files changed, 46 insertions(+)
>
[...]
> diff --git a/lib/librte_jobstats/rte_jobstats.h b/lib/librte_jobstats/rte_jobstats.h
> index de6a89a..9995319 100644
> --- a/lib/librte_jobstats/rte_jobstats.h
> +++ b/lib/librte_jobstats/rte_jobstats.h
> @@ -90,6 +90,9 @@ struct rte_jobstats {
>   uint64_t exec_cnt;
>   /**< Execute count. */
>
> + uint64_t last_job_time;
> + /**< Last job time */
> +
>   char name[RTE_JOBSTATS_NAMESIZE];
>   /**< Name of this job */
>

AFAICS this is an ABI break and as such, needs to be preannounced, see 
http://dpdk.org/doc/guides/contributing/versioning.html
For 2.3 it'd need to be a CONFIG_RTE_NEXT_ABI feature.

- Panu -
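Why inserting last_job_time counts as an ABI break can be shown with two reduced struct layouts. This is an illustrative sketch: the struct names and the NAMESIZE_SKETCH value are hypothetical, and only the fields around the insertion point are kept. An application compiled against the old header would read name at the old offset, while an updated library writes it 8 bytes further on, and the struct as a whole grows.

```c
#include <stddef.h>
#include <stdint.h>

#define NAMESIZE_SKETCH 32 /* hypothetical; stands in for RTE_JOBSTATS_NAMESIZE */

/* Reduced layout before the patch. */
struct jobstats_old {
	uint64_t exec_cnt;
	char name[NAMESIZE_SKETCH];
};

/* Reduced layout after the patch: a new field before 'name'. */
struct jobstats_new {
	uint64_t exec_cnt;
	uint64_t last_job_time; /* field added by the patch */
	char name[NAMESIZE_SKETCH];
};

/* How far 'name' moved; non-zero means binaries built against the old
 * header disagree with the new library about where the field lives. */
static size_t name_offset_shift(void)
{
	return offsetof(struct jobstats_new, name) -
	       offsetof(struct jobstats_old, name);
}
```

Appending the new field at the end of the struct would keep existing offsets, but the size would still change, which matters whenever applications allocate or embed the struct themselves.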



[dpdk-dev] [PATCH] rte.extvars.mk: allow overriding RTE_SDK_BIN from the environment

2016-01-27 Thread Thomas Monjalon
2016-01-20 21:15, Matthew Hall:
> On 1/20/16 7:27 AM, Thomas Monjalon wrote:
> > Hi Matthew,
> >
> > RTE_SDK_BIN is an internal variable and should not be overridden.
> >
> > Have you installed DPDK somewhere? Example:
> > make install O=mybuild DESTDIR=mylocalinstall
> >
> > Then you should build your app like this:
> > make RTE_SDK=$(readlink -e ../dpdk/mylocalinstall/usr/local/share/dpdk)
> 
> Hello Thomas,
> 
> Is the way the make install target really works documented somewhere?

It is poorly described here:
http://dpdk.org/doc/guides/prog_guide/dev_kit_root_make_help.html#install-targets

> This target did not exist when I first used DPDK in 2011, and since then 
> I saw various documentation on building DPDK in various places, but not 
> that much explanation what make install actually does. I recall various 
> list threads about changing its behavior as well.

Historically, "make install" was a convenient default build (with T= option).
The DESTDIR option was added to make a real install after building.
The standard form (without T=) is now implemented to do a real install.

> For example, if I look at this apparently most official document:
> 
> http://dpdk.org/doc/guides/linux_gsg/build_dpdk.html
> 
> It has build examples such as:
> 
> make install T=x86_64-native-linuxapp-gcc

This command finishes with this message:
Installation cannot run with T defined and DESTDIR undefined

Yes you are right, some docs are neither complete nor up-to-date.
Volunteers are welcome.

> But it does not discuss "O=" or "DESTDIR=" or any other additional 
> options. From some experiments on my machine, it looks like maybe I 
> could do this:
> 
> make install "T=${RTE_TARGET}" "O=build" "DESTDIR=build"
> 
> Is that a valid possibility, to keep it all in one easy directory?

Yes you can install where you want.
Note that this command (with T= and O=) will build in the directory $O/$T
i.e. build/${RTE_TARGET} and install in build/

Please confirm that this patch is not needed. Thanks


[dpdk-dev] [PATCH] vfio/noiommu: Don't use iommu_present() to track fake groups

2016-01-27 Thread Santosh Shukla
On Wed, Jan 27, 2016 at 6:51 PM, Burakov, Anatoly
 wrote:
> Hi Alex,
>
>> On 01/23/2016 04:23 AM, Alex Williamson wrote:
>> > Using iommu_present() to determine whether an IOMMU group is real or
>> > fake has some problems.  First, apparently Power systems don't
>> > register an IOMMU on the device bus, so the groups and containers get
>> > marked as noiommu and then won't bind to their actual IOMMU driver.
>> > Second, I expect we'll run into the same issue as we try to support
>> > vGPUs through vfio, since they're likely to emulate this behavior of
>> > creating an IOMMU group on a virtual device and then providing a vfio
>> > IOMMU backend tailored to the sort of isolation they provide, which
>> > won't necessarily be fully compatible with the IOMMU API.
>> >
>> > The solution here is to use the existing iommudata interface to IOMMU
>> > groups, which allows us to easily identify the fake groups we've
>> > created for noiommu purposes.  The iommudata we set is purely
>> > arbitrary since we're only comparing the address, so we use the
>> > address of the noiommu switch itself.
>> >
>> > Reported-by: Alexey Kardashevskiy 
>> > Fixes: 03a76b60f8ba ("vfio: Include No-IOMMU mode")
>> > Signed-off-by: Alex Williamson 
>>
>>
>>
>> Reviewed-by: Alexey Kardashevskiy 
>> Tested-by: Alexey Kardashevskiy 
>
> Tested bringing the NICs up and encountered no issues. Curious whether it also
> works for Santosh (CC'd), as he's one of the intended users of the No-IOMMU
> functionality, but otherwise it seems to work.
>

Yes, it works for the virtio DPDK case too. Tested-by:

Thanks.
> Thanks,
> Anatoly


[dpdk-dev] [PATCH v6 1/2] mbuf: provide rte_pktmbuf_alloc_bulk API

2016-01-27 Thread Panu Matilainen
On 01/26/2016 07:03 PM, Huawei Xie wrote:
> v6 changes:
>   reflect the changes in release notes and library version map file
>   revise our duff's code style a bit to make it more readable
>
> v5 changes:
>   add comment about duff's device and our variant implementation
>
> v3 changes:
>   move while after case 0
>   add context about duff's device and why we use while loop in the commit
> message
>
> v2 changes:
>   unroll the loop a bit to help the performance
>
> rte_pktmbuf_alloc_bulk allocates a bulk of packet mbufs.
>
> There is related thread about this bulk API.
> http://dpdk.org/dev/patchwork/patch/4718/
> Thanks to Konstantin's loop unrolling.
>
> Attached is the wiki page about Duff's device. It explains the performance
> optimization through loop unwinding, and also the most dramatic use of
> case label fall-through.
> https://en.wikipedia.org/wiki/Duff%27s_device
>
> In our implementation, we use a while() loop rather than a do {} while() loop
> because we cannot assume count is strictly positive. Using a while()
> loop saves an explicit check for count being zero.
>
> Signed-off-by: Gerald Rogers 
> Signed-off-by: Huawei Xie 
> Acked-by: Konstantin Ananyev 
> ---
>   doc/guides/rel_notes/release_2_3.rst |  3 ++
>   lib/librte_mbuf/rte_mbuf.h   | 55
>   lib/librte_mbuf/rte_mbuf_version.map |  7 +
>   3 files changed, 65 insertions(+)
>
> diff --git a/doc/guides/rel_notes/release_2_3.rst b/doc/guides/rel_notes/release_2_3.rst
> index 99de186..a52cba3 100644
> --- a/doc/guides/rel_notes/release_2_3.rst
> +++ b/doc/guides/rel_notes/release_2_3.rst
> @@ -4,6 +4,9 @@ DPDK Release 2.3
>   New Features
>   
>
> +* **Enable bulk allocation of mbufs. **
> +  A new function ``rte_pktmbuf_alloc_bulk()`` has been added to allow the user
> +  to allocate a bulk of mbufs.
>
>   Resolved Issues
>   ---
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index f234ac9..b2ed479 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -1336,6 +1336,61 @@ static inline struct rte_mbuf *rte_pktmbuf_alloc(struct rte_mempool *mp)
>   }
>
>   /**
> + * Allocate a bulk of mbufs, initialize refcnt and reset the fields to default
> + * values.
> + *
> + *  @param pool
> + *The mempool from which mbufs are allocated.
> + *  @param mbufs
> + *Array of pointers to mbufs
> + *  @param count
> + *Array size
> + *  @return
> + *   - 0: Success
> + */
> +static inline int rte_pktmbuf_alloc_bulk(struct rte_mempool *pool,
> +  struct rte_mbuf **mbufs, unsigned count)
> +{
> + unsigned idx = 0;
> + int rc;
> +
> + rc = rte_mempool_get_bulk(pool, (void **)mbufs, count);
> + if (unlikely(rc))
> + return rc;
> +
> + /* To understand duff's device on loop unwinding optimization, see
> +  * https://en.wikipedia.org/wiki/Duff's_device.
> +  * Here while() loop is used rather than do {} while() to avoid extra
> +  * check if count is zero.
> +  */
> + switch (count % 4) {
> + case 0:
> + while (idx != count) {
> + RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
> + rte_mbuf_refcnt_set(mbufs[idx], 1);
> + rte_pktmbuf_reset(mbufs[idx]);
> + idx++;
> + case 3:
> + RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
> + rte_mbuf_refcnt_set(mbufs[idx], 1);
> + rte_pktmbuf_reset(mbufs[idx]);
> + idx++;
> + case 2:
> + RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
> + rte_mbuf_refcnt_set(mbufs[idx], 1);
> + rte_pktmbuf_reset(mbufs[idx]);
> + idx++;
> + case 1:
> + RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
> + rte_mbuf_refcnt_set(mbufs[idx], 1);
> + rte_pktmbuf_reset(mbufs[idx]);
> + idx++;
> + }
> + }
> + return 0;
> +}
> +
> +/**
>* Attach packet mbuf to another packet mbuf.
>*
>* After attachment we refer the mbuf we attached as 'indirect',
> diff --git a/lib/librte_mbuf/rte_mbuf_version.map b/lib/librte_mbuf/rte_mbuf_version.map
> index e10f6bd..257c65a 100644
> --- a/lib/librte_mbuf/rte_mbuf_version.map
> +++ b/lib/librte_mbuf/rte_mbuf_version.map
> @@ -18,3 +18,10 @@ DPDK_2.1 {
>   rte_pktmbuf_pool_create;
>
>   } DPDK_2.0;
> +
> +DPDK_2.3 {
> + global:
> +
> + rte_pktmbuf_alloc_bulk;
> +
> +} DPDK_2.1;
>

Since rte_pktmbuf_alloc_bulk() is an inline function, it is not part of 
the library ABI and should not be listed in the version map.
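The unrolled loop in the quoted rte_pktmbuf_alloc_bulk() patch is easier to follow once the mbuf specifics are removed. The sketch below keeps the same switch/while structure but replaces the refcnt and reset calls with a hypothetical init_one() that just sets an int, so it runs without DPDK. Because the loop is a while() rather than a do {} while(), a count of 0 jumps to case 0 and leaves immediately, which is the point made in the commit message.

```c
/* Stand-in for the per-mbuf work in rte_pktmbuf_alloc_bulk()
 * (refcnt set plus reset); here it just marks the slot. */
static inline void init_one(int *obj)
{
	*obj = 1;
}

/* Duff's-device-style unrolling by 4. The switch jumps into the middle
 * of the loop to handle count % 4 leftover items first; afterwards
 * each pass of the while() handles exactly 4 items. */
static void init_bulk(int *objs, unsigned count)
{
	unsigned idx = 0;

	switch (count % 4) {
	case 0:
		while (idx != count) {
			init_one(&objs[idx++]);
	case 3:
			init_one(&objs[idx++]);
	case 2:
			init_one(&objs[idx++]);
	case 1:
			init_one(&objs[idx++]);
		}
	}
}
```

For count = 7, the switch enters at case 3 and handles items 0 to 2, then one full pass of the loop handles items 3 to 6; compilers may warn about the intentional fall-through.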

I assume it's inline for performance reasons, but then you lose the
benefits of dynamic linking, such as the ability to fix bugs and/or improve
it by just updating the library. Since the point of 

[dpdk-dev] [PATCH v3] vfio: Support for no-IOMMU mode

2016-01-27 Thread Anatoly Burakov
This commit is adding a generic mechanism to support multiple IOMMU
types. For now, it's only type 1 (x86 IOMMU) and no-IOMMU (a special
VFIO mode that doesn't use IOMMU at all), but it's easily extended
by adding necessary definitions into eal_pci_init.h and a DMA
mapping function to eal_pci_vfio_dma.c.

Since type 1 IOMMU module is no longer necessary to have VFIO,
we fix the module check to check for vfio-pci instead. It's not
ideal and triggers VFIO checks more often (and thus produces more
error output, which was the reason behind the module check in the
first place), so we compensate for that by providing more verbose
logging, indicating whether VFIO initialization has succeeded or
failed.

Signed-off-by: Anatoly Burakov 
Tested-by: Santosh Shukla 
---
v3 changes:
  Merging DMA mapping functions back into eal_pci_vfio.c
  Fixing and adding comments

v2 changes:
  Compile fix (hat-tip to Santosh Shukla)
  Tested-by is provisional, since only superficial testing was done

 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 205 +
 lib/librte_eal/linuxapp/eal/eal_vfio.h |   5 +
 2 files changed, 157 insertions(+), 53 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index 74f91ba..fdf334b 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -72,11 +72,74 @@ EAL_REGISTER_TAILQ(rte_vfio_tailq)
 #define VFIO_DIR "/dev/vfio"
 #define VFIO_CONTAINER_PATH "/dev/vfio/vfio"
 #define VFIO_GROUP_FMT "/dev/vfio/%u"
+#define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
 #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)

 /* per-process VFIO config */
 static struct vfio_config vfio_cfg;

+/* DMA mapping function prototype.
+ * Takes VFIO container fd as a parameter.
+ * Returns 0 on success, -1 on error.
+ * */
+typedef  int (*vfio_dma_func_t)(int);
+
+struct vfio_iommu_type {
+   int type_id;
+   const char *name;
+   vfio_dma_func_t dma_map_func;
+};
+
+int vfio_iommu_type1_dma_map(int);
+int vfio_iommu_noiommu_dma_map(int);
+
+/* IOMMU types we support */
+static const struct vfio_iommu_type iommu_types[] = {
+   /* x86 IOMMU, otherwise known as type 1 */
+   { VFIO_TYPE1_IOMMU, "Type 1", &vfio_iommu_type1_dma_map},
+   /* IOMMU-less mode */
+   { VFIO_NOIOMMU_IOMMU, "No-IOMMU", &vfio_iommu_noiommu_dma_map},
+};
+
+int
+vfio_iommu_type1_dma_map(int vfio_container_fd)
+{
+   const struct rte_memseg *ms = rte_eal_get_physmem_layout();
+   int i, ret;
+
+   /* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
+   for (i = 0; i < RTE_MAX_MEMSEG; i++) {
+   struct vfio_iommu_type1_dma_map dma_map;
+
+   if (ms[i].addr == NULL)
+   break;
+
+   memset(&dma_map, 0, sizeof(dma_map));
+   dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
+   dma_map.vaddr = ms[i].addr_64;
+   dma_map.size = ms[i].len;
+   dma_map.iova = ms[i].phys_addr;
+   dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
+
+   ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
+
+   if (ret) {
+   RTE_LOG(ERR, EAL, "  cannot set up DMA remapping, "
+   "error %i (%s)\n", errno, strerror(errno));
+   return -1;
+   }
+   }
+
+   return 0;
+}
+
+int
+vfio_iommu_noiommu_dma_map(int __rte_unused vfio_container_fd)
+{
+   /* No-IOMMU mode does not need DMA mapping */
+   return 0;
+}
+
 int
 pci_vfio_read_config(const struct rte_intr_handle *intr_handle,
void *buf, size_t len, off_t offs)
@@ -208,42 +271,58 @@ pci_vfio_set_bus_master(int dev_fd)
return 0;
 }

-/* set up DMA mappings */
-static int
-pci_vfio_setup_dma_maps(int vfio_container_fd)
-{
-   const struct rte_memseg *ms = rte_eal_get_physmem_layout();
-   int i, ret;
-
-   ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU,
-   VFIO_TYPE1_IOMMU);
-   if (ret) {
-   RTE_LOG(ERR, EAL, "  cannot set IOMMU type, "
-   "error %i (%s)\n", errno, strerror(errno));
-   return -1;
+/* pick IOMMU type. returns a pointer to vfio_iommu_type or NULL for error */
+static const struct vfio_iommu_type *
+pci_vfio_set_iommu_type(int vfio_container_fd) {
+   unsigned idx;
+   for (idx = 0; idx < RTE_DIM(iommu_types); idx++) {
+   const struct vfio_iommu_type *t = &iommu_types[idx];
+
+   int ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU,
+   t->type_id);
+   if (!ret) {
+   RTE_LOG(NOTICE, EAL, "  using IOMMU type %d (%s)\n",
+   t->type_id, t->name);
+   

[dpdk-dev] [PATCH v3] vfio: Support for no-IOMMU mode

2016-01-27 Thread Burakov, Anatoly
Apologies, I lost the signoff from Santosh Shukla, and the commit message
still mentions a file that no longer exists, so I'll submit a v4.

Thanks,
Anatoly
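The table-driven selection in the patch's pci_vfio_set_iommu_type() (try each supported IOMMU type in order and keep the first one the container accepts, returning NULL if none works) can be sketched without /dev/vfio by passing the probe in as a function pointer. The probe functions, the struct name and the numeric type IDs below are placeholders, not the VFIO constants; in the patch the probe is ioctl(fd, VFIO_SET_IOMMU, type_id).

```c
#include <stddef.h>
#include <stdio.h>

/* Probe: returns 0 if the container accepts this IOMMU type. */
typedef int (*iommu_probe_fn)(int container_fd, int type_id);

struct iommu_type_sketch {
	int type_id;      /* placeholder ID, not a VFIO constant */
	const char *name;
};

/* Ordered by preference: real IOMMU first, no-IOMMU as the fallback. */
static const struct iommu_type_sketch iommu_types_sketch[] = {
	{ 1, "Type 1" },
	{ 2, "No-IOMMU" },
};

/* Return the first type the probe accepts, or NULL if none works. */
static const struct iommu_type_sketch *
pick_iommu_type(int container_fd, iommu_probe_fn probe)
{
	size_t n = sizeof(iommu_types_sketch) / sizeof(iommu_types_sketch[0]);
	size_t i;

	for (i = 0; i < n; i++) {
		const struct iommu_type_sketch *t = &iommu_types_sketch[i];

		if (probe(container_fd, t->type_id) == 0) {
			printf("using IOMMU type %d (%s)\n", t->type_id, t->name);
			return t;
		}
	}
	return NULL;
}

/* Fake probes for exercising the selection logic. */
static int probe_any(int fd, int type_id) { (void)fd; (void)type_id; return 0; }
static int probe_only_noiommu(int fd, int type_id) { (void)fd; return type_id == 2 ? 0 : -1; }
static int probe_none(int fd, int type_id) { (void)fd; (void)type_id; return -1; }
```

Keeping the order in the table means adding a new IOMMU type is a one-line change, which matches the commit message's claim that the mechanism is easily extended.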


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Anatoly Burakov
> Sent: Wednesday, January 27, 2016 2:05 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v3] vfio: Support for no-IOMMU mode
> 
> This commit is adding a generic mechanism to support multiple IOMMU
> types. For now, it's only type 1 (x86 IOMMU) and no-IOMMU (a special VFIO
> mode that doesn't use IOMMU at all), but it's easily extended by adding
> necessary definitions into eal_pci_init.h and a DMA mapping function to
> eal_pci_vfio_dma.c.
> 
> Since type 1 IOMMU module is no longer necessary to have VFIO, we fix the
> module check to check for vfio-pci instead. It's not ideal and triggers VFIO
> checks more often (and thus produces more error output, which was the
> reason behind the module check in the first place), so we compensate for
> that by providing more verbose logging, indicating whether VFIO initialization
> has succeeded or failed.
> 
> Signed-off-by: Anatoly Burakov 
> Tested-by: Santosh Shukla 
> ---
> v3 changes:
>   Merging DMA mapping functions back into eal_pci_vfio.c
>   Fixing and adding comments
> 
> v2 changes:
>   Compile fix (hat-tip to Santosh Shukla)
>   Tested-by is provisional, since only superficial testing was done
> 
>  lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 205 +--
> --
>  lib/librte_eal/linuxapp/eal/eal_vfio.h |   5 +
>  2 files changed, 157 insertions(+), 53 deletions(-)
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> index 74f91ba..fdf334b 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> @@ -72,11 +72,74 @@ EAL_REGISTER_TAILQ(rte_vfio_tailq)
>  #define VFIO_DIR "/dev/vfio"
>  #define VFIO_CONTAINER_PATH "/dev/vfio/vfio"
>  #define VFIO_GROUP_FMT "/dev/vfio/%u"
> +#define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
>  #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
> 
>  /* per-process VFIO config */
>  static struct vfio_config vfio_cfg;
> 
> +/* DMA mapping function prototype.
> + * Takes VFIO container fd as a parameter.
> + * Returns 0 on success, -1 on error.
> + * */
> +typedef  int (*vfio_dma_func_t)(int);
> +
> +struct vfio_iommu_type {
> + int type_id;
> + const char *name;
> + vfio_dma_func_t dma_map_func;
> +};
> +
> +int vfio_iommu_type1_dma_map(int);
> +int vfio_iommu_noiommu_dma_map(int);
> +
> +/* IOMMU types we support */
> +static const struct vfio_iommu_type iommu_types[] = {
> + /* x86 IOMMU, otherwise known as type 1 */
> + { VFIO_TYPE1_IOMMU, "Type 1", &vfio_iommu_type1_dma_map},
> + /* IOMMU-less mode */
> + { VFIO_NOIOMMU_IOMMU, "No-IOMMU", &vfio_iommu_noiommu_dma_map},
> +};
> +
> +int
> +vfio_iommu_type1_dma_map(int vfio_container_fd)
> +{
> + const struct rte_memseg *ms = rte_eal_get_physmem_layout();
> + int i, ret;
> +
> + /* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
> + for (i = 0; i < RTE_MAX_MEMSEG; i++) {
> + struct vfio_iommu_type1_dma_map dma_map;
> +
> + if (ms[i].addr == NULL)
> + break;
> +
> + memset(&dma_map, 0, sizeof(dma_map));
> + dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
> + dma_map.vaddr = ms[i].addr_64;
> + dma_map.size = ms[i].len;
> + dma_map.iova = ms[i].phys_addr;
> + dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
> +
> + ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
> +
> + if (ret) {
> + RTE_LOG(ERR, EAL, "  cannot set up DMA remapping, "
> + "error %i (%s)\n", errno, strerror(errno));
> + return -1;
> + }
> + }
> +
> + return 0;
> +}
> +
> +int
> +vfio_iommu_noiommu_dma_map(int __rte_unused vfio_container_fd)
> +{
> + /* No-IOMMU mode does not need DMA mapping */
> + return 0;
> +}
> +
>  int
>  pci_vfio_read_config(const struct rte_intr_handle *intr_handle,
>   void *buf, size_t len, off_t offs)
> @@ -208,42 +271,58 @@ pci_vfio_set_bus_master(int dev_fd)
>   return 0;
>  }
> 
> -/* set up DMA mappings */
> -static int
> -pci_vfio_setup_dma_maps(int vfio_container_fd)
> -{
> - const struct rte_memseg *ms = rte_eal_get_physmem_layout();
> - int i, ret;
> -
> - ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU,
> - VFIO_TYPE1_IOMMU);
> - if (ret) {
> - RTE_LOG(ERR, EAL, "  cannot set IOMMU type, "
> - "error %i (%s)\n", errno, strerror(errno));
> - return -1;
> +/* pick IOMMU type. returns a pointer t

[dpdk-dev] [PATCH v4] vfio: Support for no-IOMMU mode

2016-01-27 Thread Anatoly Burakov
This commit is adding a generic mechanism to support multiple IOMMU
types. For now, it's only type 1 (x86 IOMMU) and no-IOMMU (a special
VFIO mode that doesn't use IOMMU at all), but it's easily extended
by adding necessary definitions into eal_pci_init.h and a DMA
mapping function to eal_pci_vfio.c.

Since type 1 IOMMU module is no longer necessary to have VFIO,
we fix the module check to check for vfio-pci instead. It's not
ideal and triggers VFIO checks more often (and thus produces more
error output, which was the reason behind the module check in the
first place), so we compensate for that by providing more verbose
logging, indicating whether VFIO initialization has succeeded or
failed.

Signed-off-by: Anatoly Burakov 
Signed-off-by: Santosh Shukla 
Tested-by: Santosh Shukla 
---
v4 changes:
  Fixed the commit message and added a missing sign-off

v3 changes:
  Merging DMA mapping functions back into eal_pci_vfio.c
  Fixing and adding comments

v2 changes:
  Compile fix (hat-tip to Santosh Shukla)
  Tested-by is provisional, since only superficial testing was done

 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 205 +
 lib/librte_eal/linuxapp/eal/eal_vfio.h |   5 +
 2 files changed, 157 insertions(+), 53 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index 74f91ba..fdf334b 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -72,11 +72,74 @@ EAL_REGISTER_TAILQ(rte_vfio_tailq)
 #define VFIO_DIR "/dev/vfio"
 #define VFIO_CONTAINER_PATH "/dev/vfio/vfio"
 #define VFIO_GROUP_FMT "/dev/vfio/%u"
+#define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
 #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)

 /* per-process VFIO config */
 static struct vfio_config vfio_cfg;

+/* DMA mapping function prototype.
+ * Takes VFIO container fd as a parameter.
+ * Returns 0 on success, -1 on error.
+ * */
+typedef  int (*vfio_dma_func_t)(int);
+
+struct vfio_iommu_type {
+   int type_id;
+   const char *name;
+   vfio_dma_func_t dma_map_func;
+};
+
+int vfio_iommu_type1_dma_map(int);
+int vfio_iommu_noiommu_dma_map(int);
+
+/* IOMMU types we support */
+static const struct vfio_iommu_type iommu_types[] = {
+   /* x86 IOMMU, otherwise known as type 1 */
+   { VFIO_TYPE1_IOMMU, "Type 1", &vfio_iommu_type1_dma_map},
+   /* IOMMU-less mode */
+   { VFIO_NOIOMMU_IOMMU, "No-IOMMU", &vfio_iommu_noiommu_dma_map},
+};
+
+int
+vfio_iommu_type1_dma_map(int vfio_container_fd)
+{
+   const struct rte_memseg *ms = rte_eal_get_physmem_layout();
+   int i, ret;
+
+   /* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
+   for (i = 0; i < RTE_MAX_MEMSEG; i++) {
+       struct vfio_iommu_type1_dma_map dma_map;
+
+       if (ms[i].addr == NULL)
+           break;
+
+       memset(&dma_map, 0, sizeof(dma_map));
+       dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
+       dma_map.vaddr = ms[i].addr_64;
+       dma_map.size = ms[i].len;
+       dma_map.iova = ms[i].phys_addr;
+       dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
+
+       ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
+
+       if (ret) {
+           RTE_LOG(ERR, EAL, "  cannot set up DMA remapping, "
+                   "error %i (%s)\n", errno, strerror(errno));
+           return -1;
+       }
+   }
+
+   return 0;
+}
+
+int
+vfio_iommu_noiommu_dma_map(int __rte_unused vfio_container_fd)
+{
+   /* No-IOMMU mode does not need DMA mapping */
+   return 0;
+}
+
 int
 pci_vfio_read_config(const struct rte_intr_handle *intr_handle,
void *buf, size_t len, off_t offs)
@@ -208,42 +271,58 @@ pci_vfio_set_bus_master(int dev_fd)
return 0;
 }

-/* set up DMA mappings */
-static int
-pci_vfio_setup_dma_maps(int vfio_container_fd)
-{
-   const struct rte_memseg *ms = rte_eal_get_physmem_layout();
-   int i, ret;
-
-   ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU,
-   VFIO_TYPE1_IOMMU);
-   if (ret) {
-   RTE_LOG(ERR, EAL, "  cannot set IOMMU type, "
-   "error %i (%s)\n", errno, strerror(errno));
-   return -1;
+/* pick IOMMU type. returns a pointer to vfio_iommu_type or NULL for error */
+static const struct vfio_iommu_type *
+pci_vfio_set_iommu_type(int vfio_container_fd) {
+   unsigned idx;
+   for (idx = 0; idx < RTE_DIM(iommu_types); idx++) {
+   const struct vfio_iommu_type *t = &iommu_types[idx];
+
+   int ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU,
+   t->type_id);
+   if (!ret) {
+   RTE_LOG(NOTICE, EAL, "  using IOMMU typ

[dpdk-dev] [PATCH] no need to test for NULL when freeing

2016-01-27 Thread Thomas Monjalon
2016-01-21 12:23, David Marchand:
> free() already handles NULL pointer.
> 
> Signed-off-by: David Marchand 

Applied, thanks


[dpdk-dev] [PATCH v2 0/2] minor cleanup in ethdev hotplug

2016-01-27 Thread Thomas Monjalon
2016-01-22 15:06, David Marchand:
> It was first a preparation step for future patchsets, but I am not sure what
> will become of them, so sending this anyway since it does not hurt to clean
> this now.
> 
> Changes since v1:
> - rebased on HEAD (previous patchset was based on another patch I sent
>   separately)
> - restored EINVAL error code for rte_eth_dev_(at|de)tach (thanks Jan)

Applied, thanks


[dpdk-dev] [PATCH v2 0/5] Optimize memcpy for AVX512 platforms

2016-01-27 Thread Thomas Monjalon
2016-01-17 22:05, Zhihong Wang:
> This patch set optimizes DPDK memcpy for AVX512 platforms, to make full
> utilization of hardware resources and deliver high performance.

On a related note, your expertise would be very valuable to review
these patches please:
(memcpy) http://dpdk.org/dev/patchwork/patch/4396/
(memcmp) http://dpdk.org/dev/patchwork/patch/4788/

Thanks


[dpdk-dev] [PATCH v2 0/5] Optimize memcpy for AVX512 platforms

2016-01-27 Thread Thomas Monjalon
> Zhihong Wang (5):
>   lib/librte_eal: Identify AVX512 CPU flag
>   mk: Predefine AVX512 macro for compiler
>   lib/librte_eal: Optimize memcpy for AVX512 platforms
>   app/test: Adjust alignment unit for memcpy perf test
>   lib/librte_eal: Tune memcpy for prior platforms
> 
>  app/test/test_memcpy_perf.c|   6 +
>  .../common/include/arch/x86/rte_cpuflags.h |   2 +
>  .../common/include/arch/x86/rte_memcpy.h   | 269 -
>  mk/rte.cpuflags.mk |   4 +
>  4 files changed, 268 insertions(+), 13 deletions(-)

The maintainers of arch/x86 are Bruce and Konstantin.
I guess there is no comment and we can apply this cool series?


[dpdk-dev] [PATCH v6 08/11] eal: pci: introduce RTE_KDRV_VFIO_NOIOMMUi driver mode

2016-01-27 Thread Santosh Shukla
On Wed, Jan 27, 2016 at 4:11 PM, Santosh Shukla  wrote:
> On Tue, Jan 26, 2016 at 9:51 PM, Santosh Shukla  wrote:
>> On Tue, Jan 26, 2016 at 7:58 PM, Thomas Monjalon
>>  wrote:
>>> 2016-01-26 19:35, Santosh Shukla:
 On Tue, Jan 26, 2016 at 6:30 PM, Thomas Monjalon
  wrote:
 > 2016-01-26 15:56, Santosh Shukla:
 >> In my observation, virtio currently works for vfio-noiommu; that's why I
 >> said drv->kdrv needs to know the vfio mode.
 >
 > It is your observation. It may change in near future.

 So that means that until then, virtio support for non-x86 archs has to wait?
>>>
>>> No, absolutely not. virtio for non-x86 is welcome.
>>>
 We have a working model with vfio-noiommu; don't you think it makes sense
 to let the vfio_noiommu implementation exist and later, in case
 virtio+iommu gets mainlined, switch to a vfio __mode__ agnostic
 approach? All it takes is to replace the __noiommu suffix with
 default.
>>>
>>> I'm just saying you should not touch the enum rte_kernel_driver.
>>> RTE_KDRV_VFIO is a driver.
>>> RTE_KDRV_VFIO_NOIOMMU is a mode.
>>> As the VFIO API is the same in both modes, there is no reason to
>>> distinguish them at this level.
>>> Your patch adds the NOIOMMU case everywhere:
>>> case RTE_KDRV_VFIO:
>>> +   case RTE_KDRV_VFIO_NOIOMMU:
>>>
>>> I'll stop commenting here to let others give their opinion.
>>>
>>> [...]
 >> with vfio+iommu, binding the virtio pci device to the vfio-pci driver fails,
 >> giving the error below:
 >> [   53.053464] VFIO - User Level meta-driver version: 0.3
 >> [   73.077805] vfio-pci: probe of :00:03.0 failed with error -22
 >> [   73.077852] vfio-pci: probe of :00:03.0 failed with error -22
 >>
 >> vfio_pci_probe() --> vfio_iommu_group_get() --> iommu_group_get()
 >> fails: iommu doesn't have group for virtio pci device.
 >
 > Yes it fails when binding.
 > So the later check in the virtio PMD is useless.

 Which check?
>>>
>>> The check for VFIO noiommu only:
>>> -   if (dev->kdrv == RTE_KDRV_VFIO)
>>> +   if (dev->kdrv == RTE_KDRV_VFIO_NOIOMMU)
>>>
>>> [...]
 > Furthermore restricting virtio to no-iommu mode doesn't bring
 > any improvement.

 We're not __restricting__; as soon as virtio+iommu gets to a working state,
 we'll simply replace __noiommu with default. Then it's up to the user to try
 out virtio with vfio default or vfio_noiommu.
>>>
>>> Yes it's up to user.
>>> So your code should be
>>> if (dev->kdrv == RTE_KDRV_VFIO)
>>>
>>
>> Right,
>>
 > That's why I suggest to keep the initial semantic of kdrv and
 > not pollute it with VFIO modes.

 I am okay to live with default and forget the __noiommu suffix, but there
 are implementation problems which were discussed in another thread:
 - The virtio pmd driver should avoid interface parsing, i.e.
 virtio_resource_init_uio/vfio() etc. For the vfio case, we could easily
 get rid of it by moving /sys parsing to the pci_eal layer, right? If so,
 since virtio currently works with vfio-noiommu, it makes sense to me that the
 pci_eal layer does the parsing for the pmd driver before that pmd driver gets
 initialized.
>>>
>>> Please reword. What is the problem?
>>>
 - Another case could be: iommu-less-pmd-driver. eal layer to do
 parsing before updating drv->kdrv.
>>>
>>> [...]
 >> >> > If a check is needed, I would prefer using your function
 >> >> > pci_vfio_is_noiommu() and remove driver modes from struct 
 >> >> > rte_kernel_driver.
 >> >>
 >> >> I don't think calling pci_vfio_no_iommu() inside
 >> >> virtio_reg_rd/wr_1/2/3() would be a good idea.
 >> >
 >> > Why? The value may be cached in the priv properties.
 >> >
 >> pci_vfio_is_noiommu() parses /sys for
 >> - enable_noiommu param
 >> - attached driver name is vfio-noiommu or not.
 >>
 >> It does file operations for that; I meant to say that calling this API
 >> within the register_rd/wr functions is not correct. It would be better if
 >> those low-level register_rd/wr APIs only checked driver_types.
 >
 > Yes, that's why I said the return of pci_vfio_is_noiommu() may be cached
 > to keep efficiency.

 I am not convinced though; I still find the pmd driver checking driver_types
 using drv->kdrv a better approach than introducing a new global
 variable, which may look something like:
>>>
>>> Not a global variable. A function in EAL layer. A variable in PMD priv.
>>>
>>
>> If we agree to use the condition (drv->kdrv == RTE_KDRV_VFIO),
>> then resource parsing for vfio {including both the vfio and vfio_noiommu
>> cases} is enforced in the virtio pmd driver layer, and that contradicts
>> what we agreed earlier in this[1] thread. Also, we don't need a
>> function in the EAL layer or a variable in PMD priv; perhaps a private
>> function in the virtio pmd which does the parsing for the vfio interface.
>>
>> Thoughts?
>>
>> [1] http://dpdk.org/dev/patchwork/patch/9862/
>>
>

[dpdk-dev] [PATCH v6 08/11] eal: pci: introduce RTE_KDRV_VFIO_NOIOMMUi driver mode

2016-01-27 Thread Thomas Monjalon
2016-01-27 21:02, Santosh Shukla:
> 1. virtio currently works for vfio+noiommu and likely will work for
> vfio+iommu in the near future.
> 2. So remove the __noiommu suffix and always use default.
> 3. Introduce a global vfio resource parsing function. That function is
> supposed to do the parsing for both the default vfio case and the vfio-noiommu case.
> This function will be used by PMD drivers for resource parsing, for
> example virtio.
> 
> Yuan won't be happy with 3) I guess, because he wanted to get rid of
> interface parsing from pmd driver.
> 
> Thomas, if 1/2/3/ addresses your concern then I'll spin the series,

I agree with 1/ and 2/.
Please, could you explain why 3/ is needed?


[dpdk-dev] [PATCH v4] vfio: Support for no-IOMMU mode

2016-01-27 Thread Thomas Monjalon
2016-01-27 14:32, Anatoly Burakov:
> +/* DMA mapping function prototype.
> + * Takes VFIO container fd as a parameter.
> + * Returns 0 on success, -1 on error.
> + * */
> +typedef  int (*vfio_dma_func_t)(int);
> +
> +struct vfio_iommu_type {
> + int type_id;
> + const char *name;
> + vfio_dma_func_t dma_map_func;
> +};
> +
> +int vfio_iommu_type1_dma_map(int);
> +int vfio_iommu_noiommu_dma_map(int);

Is it possible (is it better) to declare these functions
with vfio_dma_func_t?

vfio_iommu_noiommu_dma_map is a weird name.
Why not vfio_noiommu_dma_map or vfio_iommu_none_dma_map?



[dpdk-dev] [PATCH v6 08/11] eal: pci: introduce RTE_KDRV_VFIO_NOIOMMUi driver mode

2016-01-27 Thread Santosh Shukla
On Wed, Jan 27, 2016 at 9:09 PM, Thomas Monjalon
 wrote:
> 2016-01-27 21:02, Santosh Shukla:
>> 1. virtio currently works for vfio+noiommu and likely will work for
>> vfio+iommu in the near future.
>> 2. So remove the __noiommu suffix and always use default.
>> 3. Introduce a global vfio resource parsing function. That function is
>> supposed to do the parsing for both the default vfio case and the vfio-noiommu case.
>> This function will be used by PMD drivers for resource parsing, for
>> example virtio.
>>
>> Yuan won't be happy with 3) I guess, because he wanted to get rid of
>> interface parsing from pmd driver.
>>
>> Thomas, if 1/2/3/ addresses your concern then I'll spin the series,
>
> I agree with 1/ and 2/.
> Please, could you explain why 3/ is needed?

Because someone should do resource parsing/validation before the driver
does resource mapping/initialization. That someone could be either the EAL
layer or the driver itself.

In my case;
- driver is virtio
- resource is vfio interface


[dpdk-dev] [PATCH V1 1/1] jobstats: added function abort for job

2016-01-27 Thread Jastrzebski, MichalX K
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Panu Matilainen
> Sent: Wednesday, January 27, 2016 2:38 PM
> To: Kerlin, MarcinX ; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH V1 1/1] jobstats: added function abort for job
> 
> On 01/26/2016 06:15 PM, Marcin Kerlin wrote:
> > This patch adds a new function, rte_jobstats_abort. It marks *job* as finished,
> > and the time of this work will be added to management time instead of execution time.
> > This function should be used instead of rte_jobstats_finish if a condition occurs;
> > the condition is defined by the application, for example when receiving n>0 packets.
> >
> > Signed-off-by: Marcin Kerlin 
> > ---
> >   lib/librte_jobstats/rte_jobstats.c   | 22 ++
> >   lib/librte_jobstats/rte_jobstats.h   | 17 +
> >   lib/librte_jobstats/rte_jobstats_version.map |  7 +++
> >   3 files changed, 46 insertions(+)
> >
> [...]
> > diff --git a/lib/librte_jobstats/rte_jobstats.h b/lib/librte_jobstats/rte_jobstats.h
> > index de6a89a..9995319 100644
> > --- a/lib/librte_jobstats/rte_jobstats.h
> > +++ b/lib/librte_jobstats/rte_jobstats.h
> > @@ -90,6 +90,9 @@ struct rte_jobstats {
> > uint64_t exec_cnt;
> > /**< Execute count. */
> >
> > +   uint64_t last_job_time;
> > +   /**< Last job time */
> > +
> > char name[RTE_JOBSTATS_NAMESIZE];
> > /**< Name of this job */
> >
> 
> AFAICS this is an ABI break and as such, needs to be preannounced, see
> http://dpdk.org/doc/guides/contributing/versioning.html
> For 2.3 it'd need to be a CONFIG_RTE_NEXT_ABI feature.
> 
>   - Panu -

Hi Panu,
Thanks for Your notice. This last_job_time field is actually not necessary here
and will be removed from this structure.

Best regards
Michal 


[dpdk-dev] [RFC PATCH 5/5] virtio: Extend virtio-net PMD to support container environment

2016-01-27 Thread Xie, Huawei
On 1/21/2016 7:09 PM, Tetsuya Mukawa wrote:
[snip]
> +
> +static int
> +qtest_raw_recv(int fd, char *buf, size_t count)
> +{
> + size_t len = count;
> + size_t total_len = 0;
> + int ret = 0;
> +
> + while (len > 0) {
> + ret = read(fd, buf, len);
> + if (ret == (int)len)
> + break;
> + if (*(buf + ret - 1) == '\n')
> + break;

The above two lines should be put after the below if block.

> + if (ret == -1) {
> + if (errno == EINTR)
> + continue;
> + return ret;
> + }
> + total_len += ret;
> + buf += ret;
> + len -= ret;
> + }
> + return total_len + ret;
> +}
> +

[snip]

> +
> +static void
> +qtest_handle_one_message(struct qtest_session *s, char *buf)
> +{
> + int ret;
> +
> + if (strncmp(buf, interrupt_message, strlen(interrupt_message)) == 0) {
> + if (rte_atomic16_read(&s->enable_intr) == 0)
> + return;
> +
> + /* relay interrupt to pipe */
> + ret = write(s->irqfds.writefd, "1", 1);
> + if (ret < 0)
> + rte_panic("cannot relay interrupt\n");
> + } else {
> + /* relay normal message to pipe */
> + ret = qtest_raw_send(s->msgfds.writefd, buf, strlen(buf));
> + if (ret < 0)
> + rte_panic("cannot relay normal message\n");
> + }
> +}
> +
> +static char *
> +qtest_get_next_message(char *p)
> +{
> + p = strchr(p, '\n');
> + if ((p == NULL) || (*(p + 1) == '\0'))
> + return NULL;
> + return p + 1;
> +}
> +
> +static void
> +qtest_close_one_socket(int *fd)
> +{
> + if (*fd > 0) {
> + close(*fd);
> + *fd = -1;
> + }
> +}
> +
> +static void
> +qtest_close_sockets(struct qtest_session *s)
> +{
> + qtest_close_one_socket(&s->qtest_socket);
> + qtest_close_one_socket(&s->msgfds.readfd);
> + qtest_close_one_socket(&s->msgfds.writefd);
> + qtest_close_one_socket(&s->irqfds.readfd);
> + qtest_close_one_socket(&s->irqfds.writefd);
> + qtest_close_one_socket(&s->ivshmem_socket);
> +}
> +
> +/*
> + * This thread relays QTest response using pipe.
> + * The function is needed because we need to separate IRQ message from others.
> + */
> +static void *
> +qtest_event_handler(void *data) {
> + struct qtest_session *s = (struct qtest_session *)data;
> + char buf[1024];
> + char *p;
> + int ret;
> +
> + for (;;) {
> + memset(buf, 0, sizeof(buf));
> + ret = qtest_raw_recv(s->qtest_socket, buf, sizeof(buf));
> + if (ret < 0) {
> + qtest_close_sockets(s);
> + return NULL;
> + }
> +
> + /* may receive multiple messages at the same time */

From the qtest_raw_recv implementation, if at some point one message is
split across two qtest_raw_recv calls, is that message discarded?
We could save the last incomplete message in the buffer and combine it
with the data received on the next call.

> + p = buf;
> + do {
> + qtest_handle_one_message(s, p);
> + } while ((p = qtest_get_next_message(p)) != NULL);
> + }
> + return NULL;
> +}
> +



[dpdk-dev] bnx2x driver and 57800 versus 57810

2016-01-27 Thread Chas Williams
On Wed, 2016-01-27 at 07:32 +, Harish Patil wrote:
> >
> >I have two practically identical systems, same hypervisor on each (CentOS
> >7.x).  In one, I have a 57800 card which works fine with DPDK with
> >SRIOV.  In the other, I have a 57810 card which doesn't work with SRIOV.
> >
> >For the 57810 I have tracked this down to the status block in the VF
> >failing to be updated.  The Linux driver works fine but it appears to
> >use a slightly different scheme -- writing some sort of fastpath status
> >block generation per interrupt.
> >
> >Does anyone have any suggestions or a programming guide for this device?
> >
> >
>
> What is not working with 57810? Is it link related or traffic? Please
> provide the details.
> Attached is the SW programming guide for 577xx/578xx. I'm not sure if
> it has details pertaining to the specific issue that you have.

The DPDK PMD driver seems to be able to transmit packets on the 57810.
But since the status block isn't getting updated, you can't reclaim the
sent buffers.  I modified the driver to use the marker based receive
detection (similar to the method used in the Linux driver) and I can see
packets getting received (certainly broadcast is received -- possibly
not unicast packets though which seems to indicate that part of the
RX path is possibly still broken).

I have tried a couple of things.  The status page in the DPDK PMD driver
isn't getting page aligned (as well as a bunch of other structures
that should probably be page aligned). The Linux driver happens to do
this as a side effect of the DMA allocator.  Fixing this didn't seem to
improve matters though.  The status block doesn't seem to get updated.
I verified that the correct DMA address is getting passed to the PF.
And since it works on the 57800, I thought perhaps something changed.

Also, the DPDK driver probably gets the RX/TX queue indices wrong during
initial setup.  The final values coming out of the allocation loop
are probably bigger than they should be.  Should they point to the end
of the queue or just past the end?  Also, the tail of the queue needs
to be corrected for the double entry at the end of the pages.  Again,
fixing this didn't seem to help either.

The VF-PF interaction seems to be ok as well.  Other than not supporting
SGE, the DPDK PMD driver seems to send reasonably correct messages to
the PF.

I don't see the DPDK PMD driver doing anything to 'reset' the PCI aspect
of the VF.  If there is any leftover configuration for interrupts
that may not be cleared, like leaving the IGU enabled, I am not sure
what the interaction might be.  I do know the Linux driver does seem
to use MSI-X interrupts.

> Thanks,
> Harish

Thanks for looking at this and thanks for the programming guide.  It
will take me a bit to digest it.


[dpdk-dev] [PATCH v4] vfio: Support for no-IOMMU mode

2016-01-27 Thread Burakov, Anatoly
Hi Thomas,

> > +/* DMA mapping function prototype.
> > + * Takes VFIO container fd as a parameter.
> > + * Returns 0 on success, -1 on error.
> > + * */
> > +typedef  int (*vfio_dma_func_t)(int);
> > +
> > +struct vfio_iommu_type {
> > +   int type_id;
> > +   const char *name;
> > +   vfio_dma_func_t dma_map_func;
> > +};
> > +
> > +int vfio_iommu_type1_dma_map(int);
> > +int vfio_iommu_noiommu_dma_map(int);
> 
> Is it possible (is it better) to declare these functions with vfio_dma_func_t?

Yeah, sure. Or maybe the other way around - maybe we could do away with the 
typedef. I'll go for the former though.

> vfio_iommu_noiommu_dma_map is a weird name.
> Why not vfio_noiommu_dma_map or vfio_iommu_none_dma_map?

Well, the NOIOMMU type is named VFIO_NOIOMMU_IOMMU in the VFIO headers, so the
name is consistent with the IOMMU type name. That said, vfio_noiommu_dma_map seems
reasonable.

Thanks,
Anatoly


[dpdk-dev] DPDK OVS on Ubuntu 14.04# Issue's Resolved# Inter-VM communication & IP allocation through DHCP issue

2016-01-27 Thread Abhijeet Karve
Hi Przemek,

Thanks for the quick response. We are now able to get DHCP IPs for 2
vhostuser instances, and they are able to ping each other. The issue was a
bug in the CirrOS 0.3.0 image we were using in OpenStack; after switching to
the 0.3.1 image as given in the URL
(https://www.redhat.com/archives/rhos-list/2013-August/msg00032.html), we are
able to get IPs in the vhostuser VM instances.

As per our understanding, the packet flow across the DPDK datapath is as
follows: the vhostuser ports are connected to the br-int bridge, which is
patched to the br-dpdk bridge, where our physical network (NIC) is connected
via the dpdk0 port.

So, to test the flow, we have to connect that physical network (NIC) to an
external packet generator (e.g. Ixia, iperf) and run the testpmd
application in the vhostuser VM, right?

Is it required to add any flows or other changes to the bridge configuration
(either br-int or br-dpdk)?


Thanks & Regards
Abhijeet Karve




From:   "Czesnowicz, Przemyslaw" 
To: Abhijeet Karve 
Cc: "dev at dpdk.org" , "discuss at openvswitch.org" 
, "Gray, Mark D" 
Date:   01/27/2016 05:11 PM
Subject:RE: [dpdk-dev] DPDK OVS on Ubuntu 14.04# Issue's Resolved# 
Inter-VM communication & IP allocation through DHCP issue



Hi Abhijeet,


It seems you are almost there! 
When booting the VMs, do you request hugepage memory for them (by setting
hw:mem_page_size=large in the flavor extra_spec)?
If not, then please do; if yes, then please look into the libvirt logfiles for
the VMs (in /var/log/libvirt/qemu/instance-xxx); I think there could be a
clue there.


Regards
Przemek

From: Abhijeet Karve [mailto:abhijeet.ka...@tcs.com] 
Sent: Monday, January 25, 2016 6:13 PM
To: Czesnowicz, Przemyslaw
Cc: dev at dpdk.org; discuss at openvswitch.org; Gray, Mark D
Subject: RE: [dpdk-dev] DPDK OVS on Ubuntu 14.04# Issue's Resolved# 
Inter-VM communication & IP allocation through DHCP issue

Hi Przemek, 

Thank you for your response; it really provided us a breakthrough.

After setting up DPDK on the compute node for stable/kilo, we are trying to
set up an OpenStack stable/liberty all-in-one setup. At present we are not
able to get IP allocation for the vhost type instances through DHCP.
We also tried assigning IPs manually to them, but inter-VM
communication is not happening either.

#neutron agent-list 
root at nfv-dpdk-devstack:/etc/neutron# neutron agent-list 
+--------------------------------------+--------------------+-------------------+-------+----------------+---------------------------+
| id                                   | agent_type         | host              | alive | admin_state_up | binary                    |
+--------------------------------------+--------------------+-------------------+-------+----------------+---------------------------+
| 3b29e93c-3a25-4f7d-bf6c-6bb309db5ec0 | DPDK OVS Agent     | nfv-dpdk-devstack | :-)   | True           | neutron-openvswitch-agent |
| 62593b2c-c10f-4d93-8551-c46ce24895a6 | L3 agent           | nfv-dpdk-devstack | :-)   | True           | neutron-l3-agent          |
| 7cb97af9-cc20-41f8-90fb-aba97d39dfbd | DHCP agent         | nfv-dpdk-devstack | :-)   | True           | neutron-dhcp-agent        |
| b613c654-99b7-437e-9317-20fa651a1310 | Linux bridge agent | nfv-dpdk-devstack | :-)   | True           | neutron-linuxbridge-agent |
| c2dd0384-6517-4b44-9c25-0d2825d23f57 | Metadata agent     | nfv-dpdk-devstack | :-)   | True           | neutron-metadata-agent    |
| f23dde40-7dc0-4f20-8b3e-eb90ddb15e49 | Open vSwitch agent | nfv-dpdk-devstack | xxx   | True           | neutron-openvswitch-agent |
+--------------------------------------+--------------------+-------------------+-------+----------------+---------------------------+



ovs-vsctl show output# 
 
    Bridge br-dpdk
        Port br-dpdk
            Interface br-dpdk
                type: internal
        Port phy-br-dpdk
            Interface phy-br-dpdk
                type: patch
                options: {peer=int-br-dpdk}
    Bridge br-int
        fail_mode: secure
        Port "vhufa41e799-f2"
            tag: 5
            Interface "vhufa41e799-f2"
                type: dpdkvhostuser
        Port int-br-dpdk
            Interface int-br-dpdk
                type: patch
                options: {peer=phy-br-dpdk}
        Port "tap4e19f8e1-59"
            tag: 5
            Interface "tap4e19f8e1-59"
                type: internal
        Port "vhu05734c49-3b"
            tag: 5
            Interface "vhu05734c49-3b"
                type: dpdkvhostuser
        Port "vhu10c06b4d-84"
            tag: 5
            Interface "vhu10c06b4d-84"
                type: dpdkvhostuser
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port "vhue169c581-ef"
            tag: 5
            Interface "vhue169c581-ef"
                type: dpdkvhostuser
        P

[dpdk-dev] [PATCH 0/3] Use common Linux tools to control DPDK ports

2016-01-27 Thread Ferruh Yigit
This work is to make DPDK ports more visible and to enable using common
Linux tools to configure DPDK ports.

The patch is based on KNI but contains only its control functionality;
it also does not include any Linux kernel network driver.

With the help of a kernel module (KCP), a virtual Linux network
interface named "dpdk$" is created per DPDK port; control messages
sent to these virtual interfaces are forwarded to DPDK, and the response
is sent back to the Linux application.

Virtual interfaces are created when the DPDK application starts and are
destroyed automatically when the DPDK application terminates.

Communication between kernel space and DPDK is done using a netlink socket.

The current implementation is not complete; sample support has been added
for the RFC, and more functionality can be added based on community response.

This RFC patch supports getting/setting the MAC address and MTU of DPDK
devices, getting stats from DPDK devices, and a subset of ethtool commands.

In the long term, this patch is intended to replace KNI, and KNI will be
deprecated.

Samples:

$ ifconfig
dpdk0: flags=4099  mtu 1500
ether 90:e2:ba:0e:49:b8  txqueuelen 1000  (Ethernet)
RX packets 33  bytes 2058 (2.0 KiB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 33  bytes 2058 (2.0 KiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

dpdk1: flags=4099  mtu 1500
ether 00:1b:21:76:fa:21  txqueuelen 1000  (Ethernet)
RX packets 0  bytes 0 (0.0 B)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 0  bytes 0 (0.0 B)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

After some traffic on port 0:

$ ifconfig
dpdk0: flags=4099  mtu 1500
ether 90:e2:ba:0e:49:77  txqueuelen 1000  (Ethernet)
RX packets 962  bytes 57798 (56.4 KiB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 962  bytes 57798 (56.4 KiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0


$ ethtool -i dpdk0
driver: rte_ixgbe_pmd
version: RTE 2.3.0-rc0
firmware-version: 
expansion-rom-version: 
bus-info: :08:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no


$ ip l show dpdk0
25: dpdk0:  mtu 1500 qdisc noop state DOWN 
mode DEFAULT group default qlen 1000
link/ether 90:e2:ba:0e:49:b8 brd ff:ff:ff:ff:ff:ff

$ ip l set dpdk0 addr 90:e2:ba:0e:49:77

$ ip l show dpdk0
25: dpdk0:  mtu 1500 qdisc noop state DOWN 
mode DEFAULT group default qlen 1000
link/ether 90:e2:ba:0e:49:77 brd ff:ff:ff:ff:ff:ff

Ferruh Yigit (3):
  kcp: add kernel control path kernel module
  rte_ctrl_if: add control interface library
  examples/ethtool: add control interface support to the application

 config/common_linuxapp |   9 +-
 doc/api/doxy-api-index.md  |   3 +-
 doc/api/doxy-api.conf  |   1 +
 doc/guides/rel_notes/release_2_3.rst   |   9 +
 doc/guides/sample_app_ug/ethtool.rst   |  41 +++
 examples/ethtool/ethtool-app/main.c|  10 +-
 lib/Makefile   |   3 +-
 lib/librte_ctrl_if/Makefile|  58 
 lib/librte_ctrl_if/rte_ctrl_if.c   | 162 ++
 lib/librte_ctrl_if/rte_ctrl_if.h   | 115 +++
 lib/librte_ctrl_if/rte_ctrl_if_version.map |   9 +
 lib/librte_ctrl_if/rte_ethtool.c   | 354 +
 lib/librte_ctrl_if/rte_ethtool.h   |  54 
 lib/librte_ctrl_if/rte_nl.c| 259 +++
 lib/librte_ctrl_if/rte_nl.h|  60 
 lib/librte_eal/common/include/rte_log.h|   3 +-
 lib/librte_eal/linuxapp/Makefile   |   5 +-
 lib/librte_eal/linuxapp/eal/Makefile   |   3 +-
 .../linuxapp/eal/include/exec-env/rte_kcp_common.h |  86 +
 lib/librte_eal/linuxapp/kcp/Makefile   |  58 
 lib/librte_eal/linuxapp/kcp/kcp_dev.h  |  65 
 lib/librte_eal/linuxapp/kcp/kcp_ethtool.c  | 261 +++
 lib/librte_eal/linuxapp/kcp/kcp_misc.c | 282 
 lib/librte_eal/linuxapp/kcp/kcp_net.c  | 209 
 lib/librte_eal/linuxapp/kcp/kcp_nl.c   | 194 +++
 mk/rte.app.mk  |   3 +-
 26 files changed, 2307 insertions(+), 9 deletions(-)
 create mode 100644 lib/librte_ctrl_if/Makefile
 create mode 100644 lib/librte_ctrl_if/rte_ctrl_if.c
 create mode 100644 lib/librte_ctrl_if/rte_ctrl_if.h
 create mode 100644 lib/librte_ctrl_if/rte_ctrl_if_version.map
 create mode 100644 lib/librte_ctrl_if/rte_ethtool.c
 create mode 100644 lib/librte_ctrl_if/rte_ethtool.h
 create mode 100644 lib/librte_ctrl_if/rte_nl.c
 create mode 100644 lib/librte_ctrl_if/rte_nl.h
 create mode 100644 lib/librte_eal/linuxapp/eal

[dpdk-dev] [PATCH 1/3] kcp: add kernel control path kernel module

2016-01-27 Thread Ferruh Yigit
This kernel module is based on the KNI module, but it is a stripped-down
version of it that handles only control messages; no data transfer
functionality is provided.

This Linux kernel module helps a userspace application create virtual
interfaces. When a control command is issued on one of these virtual
interfaces, the module pushes the command to userspace and returns the
response to the calling application.

Linux tools such as ethtool, ifconfig and ip can be used on the virtual
interfaces, but data-related tools such as tcpdump cannot.

In the long term this patch intends to replace KNI, and KNI will be
deprecated.
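To make the command/response round trip concrete, here is a minimal sketch of the userspace half: a request arrives carrying a command ID and a port ID, the handler fills in a result and an error code, and the module would then hand that back to the caller (for example ethtool). Every name here (struct ctrl_msg, CTRL_GET_MTU, ctrl_handle()) is hypothetical; the real message layout is defined in rte_kcp_common.h, which is truncated in this archive.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define NB_PORTS 2 /* hypothetical number of DPDK ports */

/* Hypothetical command IDs, not the ones used by KCP. */
enum ctrl_cmd { CTRL_GET_MTU = 1, CTRL_SET_MTU };

/* Hypothetical message: the kernel side fills cmd_id, port_id and
 * input; the userspace side fills output and err. */
struct ctrl_msg {
	uint32_t cmd_id;
	uint16_t port_id;
	uint8_t input[32];
	uint8_t output[32];
	int32_t err;
};

/* Stand-in for per-port state that would normally be queried
 * through librte_ether. */
static uint16_t port_mtu[NB_PORTS] = { 1500, 1500 };

/* Handle one request pushed up by the kernel module and write the
 * response back into the same message. */
static void ctrl_handle(struct ctrl_msg *msg)
{
	uint16_t mtu;

	msg->err = 0;
	if (msg->port_id >= NB_PORTS) {
		msg->err = -ENODEV;
		return;
	}
	switch (msg->cmd_id) {
	case CTRL_GET_MTU:
		memcpy(msg->output, &port_mtu[msg->port_id], sizeof(mtu));
		break;
	case CTRL_SET_MTU:
		memcpy(&mtu, msg->input, sizeof(mtu));
		if (mtu < 68) { /* below the IPv4 minimum MTU */
			msg->err = -EINVAL;
			break;
		}
		port_mtu[msg->port_id] = mtu;
		break;
	default:
		msg->err = -EOPNOTSUPP;
	}
}
```

Returning a negative errno value in the message keeps the kernel side simple: it can pass err straight back to the ioctl or ethtool caller.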

Signed-off-by: Ferruh Yigit 
---
 config/common_linuxapp |   6 +
 lib/librte_eal/linuxapp/Makefile   |   5 +-
 lib/librte_eal/linuxapp/eal/Makefile   |   3 +-
 .../linuxapp/eal/include/exec-env/rte_kcp_common.h |  86 +++
 lib/librte_eal/linuxapp/kcp/Makefile   |  58 +
 lib/librte_eal/linuxapp/kcp/kcp_dev.h  |  65 +
 lib/librte_eal/linuxapp/kcp/kcp_ethtool.c  | 261 +++
 lib/librte_eal/linuxapp/kcp/kcp_misc.c | 282 +
 lib/librte_eal/linuxapp/kcp/kcp_net.c  | 209 +++
 lib/librte_eal/linuxapp/kcp/kcp_nl.c   | 194 ++
 10 files changed, 1167 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_eal/linuxapp/eal/include/exec-env/rte_kcp_common.h
 create mode 100644 lib/librte_eal/linuxapp/kcp/Makefile
 create mode 100644 lib/librte_eal/linuxapp/kcp/kcp_dev.h
 create mode 100644 lib/librte_eal/linuxapp/kcp/kcp_ethtool.c
 create mode 100644 lib/librte_eal/linuxapp/kcp/kcp_misc.c
 create mode 100644 lib/librte_eal/linuxapp/kcp/kcp_net.c
 create mode 100644 lib/librte_eal/linuxapp/kcp/kcp_nl.c

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 74bc515..5d5e3e4 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -503,6 +503,12 @@ CONFIG_RTE_KNI_VHOST_DEBUG_RX=n
 CONFIG_RTE_KNI_VHOST_DEBUG_TX=n

 #
+# Compile librte_ctrl_if
+#
+CONFIG_RTE_KCP_KMOD=y
+CONFIG_RTE_KCP_KO_DEBUG=n
+
+#
 # Compile vhost library
 # fuse-devel is needed to run vhost-cuse.
 # fuse-devel enables user space char driver development
diff --git a/lib/librte_eal/linuxapp/Makefile b/lib/librte_eal/linuxapp/Makefile
index d9c5233..d1fa3a3 100644
--- a/lib/librte_eal/linuxapp/Makefile
+++ b/lib/librte_eal/linuxapp/Makefile
@@ -1,6 +1,6 @@
 #   BSD LICENSE
 #
-#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+#   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
 #   All rights reserved.
 #
 #   Redistribution and use in source and binary forms, with or without
@@ -38,6 +38,9 @@ DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal
 ifeq ($(CONFIG_RTE_KNI_KMOD),y)
 DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += kni
 endif
+ifeq ($(CONFIG_RTE_KCP_KMOD),y)
+DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += kcp
+endif
 ifeq ($(CONFIG_RTE_LIBRTE_XEN_DOM0),y)
 DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += xen_dom0
 endif
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 26eced5..dded8cb 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -1,6 +1,6 @@
 #   BSD LICENSE
 #
-#   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+#   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
 #   All rights reserved.
 #
 #   Redistribution and use in source and binary forms, with or without
@@ -116,6 +116,7 @@ CFLAGS_eal_thread.o += -Wno-return-type
 endif

 INC := rte_interrupts.h rte_kni_common.h rte_dom0_common.h
+INC += rte_kcp_common.h

 SYMLINK-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP)-include/exec-env := \
$(addprefix include/exec-env/,$(INC))
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kcp_common.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kcp_common.h
new file mode 100644
index 000..b3a6ee3
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kcp_common.h
@@ -0,0 +1,86 @@
+/*-
+ *   This file is provided under a dual BSD/LGPLv2 license.  When using or
+ *   redistributing this file, you may do so under either license.
+ *
+ *   GNU LESSER GENERAL PUBLIC LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *
+ *   This program is free software; you can redistribute it and/or modify
+ *   it under the terms of version 2.1 of the GNU Lesser General Public License
+ *   as published by the Free Software Foundation.
+ *
+ *   This program is distributed in the hope that it will be useful, but
+ *   WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ *   Lesser General Public License for more details.
+ *
+ *   You should have received a copy of the GNU Lesser General Public License
+ *   along with this program;
+ *
+ *   Contact Information:
+ *   Intel Corporation
+ *
+ *
+ *  

[dpdk-dev] [PATCH 2/3] rte_ctrl_if: add control interface library

2016-01-27 Thread Ferruh Yigit
This library receives control messages from kernel space, forwards them to
librte_ether and returns the response back to kernel space.

The library:
1) Triggers Linux virtual interface creation
2) Initializes the netlink socket communication
3) Provides a process() API that the application calls to process the
received messages

This library requires the corresponding kernel module to be inserted.
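Step 2 relies on netlink, where every message starts with a struct nlmsghdr. The sketch below shows only the framing and parsing side, using the standard macros from linux/netlink.h (NLMSG_LENGTH, NLMSG_SPACE, NLMSG_DATA, NLMSG_OK). The payload struct and message type are hypothetical, socket setup (socket(AF_NETLINK, SOCK_RAW, ...) plus bind) is omitted, and the netlink protocol number KCP actually uses is not visible in this archive.

```c
#include <linux/netlink.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message type; values below NLMSG_MIN_TYPE are reserved
 * for netlink control messages. */
#define CTRL_NL_MSG_TYPE NLMSG_MIN_TYPE

/* Hypothetical payload carried after the netlink header. */
struct ctrl_payload {
	uint32_t cmd_id;
	uint32_t port_id;
};

/* Frame a payload into buf; returns the aligned message size,
 * or 0 if buf is too small. */
static size_t nl_frame(void *buf, size_t buflen, uint16_t type,
		       uint32_t seq, const struct ctrl_payload *p)
{
	struct nlmsghdr *nlh = buf;
	size_t len = NLMSG_SPACE(sizeof(*p));

	if (buflen < len)
		return 0;
	memset(buf, 0, len);
	nlh->nlmsg_len = NLMSG_LENGTH(sizeof(*p));
	nlh->nlmsg_type = type;
	nlh->nlmsg_flags = NLM_F_REQUEST;
	nlh->nlmsg_seq = seq;
	nlh->nlmsg_pid = 0; /* 0 addresses the kernel */
	memcpy(NLMSG_DATA(nlh), p, sizeof(*p));
	return len;
}

/* Validate a received buffer and copy the payload out.
 * Returns 0 on success, -1 on a short or malformed message. */
static int nl_parse(const void *buf, size_t len, struct ctrl_payload *p)
{
	const struct nlmsghdr *nlh = buf;
	int rem = (int)len;

	if (!NLMSG_OK(nlh, rem) ||
	    nlh->nlmsg_len < NLMSG_LENGTH(sizeof(*p)))
		return -1;
	memcpy(p, NLMSG_DATA(nlh), sizeof(*p));
	return 0;
}
```

The sequence number lets the userspace side match a response to the request that triggered it, which matters once several commands are in flight.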

Signed-off-by: Ferruh Yigit 
---
 config/common_linuxapp |   3 +-
 doc/api/doxy-api-index.md  |   3 +-
 doc/api/doxy-api.conf  |   1 +
 doc/guides/rel_notes/release_2_3.rst   |   9 +
 lib/Makefile   |   3 +-
 lib/librte_ctrl_if/Makefile|  58 +
 lib/librte_ctrl_if/rte_ctrl_if.c   | 162 +
 lib/librte_ctrl_if/rte_ctrl_if.h   | 115 ++
 lib/librte_ctrl_if/rte_ctrl_if_version.map |   9 +
 lib/librte_ctrl_if/rte_ethtool.c   | 354 +
 lib/librte_ctrl_if/rte_ethtool.h   |  54 +
 lib/librte_ctrl_if/rte_nl.c| 259 +
 lib/librte_ctrl_if/rte_nl.h|  60 +
 lib/librte_eal/common/include/rte_log.h|   3 +-
 mk/rte.app.mk  |   3 +-
 15 files changed, 1091 insertions(+), 5 deletions(-)
 create mode 100644 lib/librte_ctrl_if/Makefile
 create mode 100644 lib/librte_ctrl_if/rte_ctrl_if.c
 create mode 100644 lib/librte_ctrl_if/rte_ctrl_if.h
 create mode 100644 lib/librte_ctrl_if/rte_ctrl_if_version.map
 create mode 100644 lib/librte_ctrl_if/rte_ethtool.c
 create mode 100644 lib/librte_ctrl_if/rte_ethtool.h
 create mode 100644 lib/librte_ctrl_if/rte_nl.c
 create mode 100644 lib/librte_ctrl_if/rte_nl.h

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 5d5e3e4..f72ba0e 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -1,6 +1,6 @@
 #   BSD LICENSE
 #
-#   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+#   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
 #   All rights reserved.
 #
 #   Redistribution and use in source and binary forms, with or without
@@ -507,6 +507,7 @@ CONFIG_RTE_KNI_VHOST_DEBUG_TX=n
 #
 CONFIG_RTE_KCP_KMOD=y
 CONFIG_RTE_KCP_KO_DEBUG=n
+CONFIG_RTE_LIBRTE_CTRL_IF=y

 #
 # Compile vhost library
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 7a91001..214d16e 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -149,4 +149,5 @@ There are many libraries, so their headers may be grouped by topics:
   [common] (@ref rte_common.h),
   [ABI compat] (@ref rte_compat.h),
   [keepalive]  (@ref rte_keepalive.h),
-  [version](@ref rte_version.h)
+  [version](@ref rte_version.h),
+  [control interface]  (@ref rte_ctrl_if.h)
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index 57e8b5d..fd69bf1 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -39,6 +39,7 @@ INPUT   = doc/api/doxy-api-index.md \
   lib/librte_cmdline \
   lib/librte_compat \
   lib/librte_cryptodev \
+  lib/librte_ctrl_if \
   lib/librte_distributor \
   lib/librte_ether \
   lib/librte_hash \
diff --git a/doc/guides/rel_notes/release_2_3.rst b/doc/guides/rel_notes/release_2_3.rst
index 99de186..39725e4 100644
--- a/doc/guides/rel_notes/release_2_3.rst
+++ b/doc/guides/rel_notes/release_2_3.rst
@@ -4,6 +4,14 @@ DPDK Release 2.3
 New Features
 

+* **Control interface support added.**
+
+  To enable controlling DPDK ports by common Linux tools.
+  Following modules added to DPDK:
+
+  * librte_ctrl_if library
+  * librte_eal/linuxapp/kcp kernel module
+

 Resolved Issues
 ---
@@ -51,6 +59,7 @@ The libraries prepended with a plus sign were incremented in this version.
  librte_acl.so.2
  librte_cfgfile.so.2
  librte_cmdline.so.1
+   + librte_ctrl_if.so.1
  librte_distributor.so.1
  librte_eal.so.2
  librte_hash.so.2
diff --git a/lib/Makefile b/lib/Makefile
index ef172ea..a50bc1e 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -1,6 +1,6 @@
 #   BSD LICENSE
 #
-#   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+#   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
 #   All rights reserved.
 #
 #   Redistribution and use in source and binary forms, with or without
@@ -58,6 +58,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_PORT) += librte_port
 DIRS-$(CONFIG_RTE_LIBRTE_TABLE) += librte_table
 DIRS-$(CONFIG_RTE_LIBRTE_PIPELINE) += librte_pipeline
 DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
+DIRS-$(CONFIG_RTE_LIBRTE_CTRL_IF) += librte_ctrl_if

 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_ctrl_if/Makefile

[dpdk-dev] [PATCH 3/3] examples/ethtool: add control interface support to the application

2016-01-27 Thread Ferruh Yigit
Control interface APIs are added to the sample application.

To enable this support, the corresponding kernel module (KCP) needs to be
inserted. If the kernel module is not present, the application runs as
before, without kernel control path support.

When the KCP module is inserted, the running application creates one virtual
Linux network interface (dpdk$) per DPDK port. This interface can be used
by traditional Linux tools.

Signed-off-by: Ferruh Yigit 
---
 doc/guides/sample_app_ug/ethtool.rst | 41 
 examples/ethtool/ethtool-app/main.c  | 10 +++--
 2 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/doc/guides/sample_app_ug/ethtool.rst b/doc/guides/sample_app_ug/ethtool.rst
index 4d1697e..2174288 100644
--- a/doc/guides/sample_app_ug/ethtool.rst
+++ b/doc/guides/sample_app_ug/ethtool.rst
@@ -131,6 +131,47 @@ application`_. Individual call-back functions handle the detail
 associated with each command, which make use of the functions
 defined in the `Ethtool interface`_ to the DPDK functions.

+Control Interface
+~
+
+If Kernel Control Path (KCP) kernel module (rte_kcp.ko) inserted,
+virtual interfaces created for each DPDK port for control purposes.
+
+Created interfaces are named as dpdk#, like:
+
+.. code-block:: console
+
+# ifconfig dpdk0; ifconfig dpdk1
+dpdk0: flags=4099  mtu 1500
+ether 90:e2:ba:0e:49:b9  txqueuelen 1000  (Ethernet)
+RX packets 0  bytes 0 (0.0 B)
+RX errors 0  dropped 0  overruns 0  frame 0
+TX packets 0  bytes 0 (0.0 B)
+TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
+
+dpdk1: flags=4099  mtu 1500
+ether 00:1b:21:76:fa:21  txqueuelen 1000  (Ethernet)
+RX packets 0  bytes 0 (0.0 B)
+RX errors 0  dropped 0  overruns 0  frame 0
+TX packets 0  bytes 0 (0.0 B)
+TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
+
+Regular Linux commands can be issued on interfaces:
+
+.. code-block:: console
+
+# ethtool -i dpdk0
+driver: rte_ixgbe_pmd
+version: RTE 2.3.0-rc0
+firmware-version:
+expansion-rom-version:
+bus-info: :08:00.1
+supports-statistics: yes
+supports-test: no
+supports-eeprom-access: yes
+supports-register-dump: yes
+supports-priv-flags: no
+
 Ethtool interface
 -

diff --git a/examples/ethtool/ethtool-app/main.c b/examples/ethtool/ethtool-app/main.c
index e21abcd..68b13ad 100644
--- a/examples/ethtool/ethtool-app/main.c
+++ b/examples/ethtool/ethtool-app/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -44,6 +44,7 @@
 #include 
 #include 
 #include 
+#include 

 #include "ethapp.h"

@@ -54,7 +55,6 @@
 #define PKTPOOL_EXTRA_SIZE 512
 #define PKTPOOL_CACHE 32

-
 struct txq_port {
uint16_t cnt_unsent;
struct rte_mbuf *buf_frames[MAX_BURST_LENGTH];
@@ -254,6 +254,8 @@ static int slave_main(__attribute__((unused)) void *ptr_data)
}
rte_spinlock_unlock(&ptr_port->lock);
} /* end for( idx_port ) */
+   rte_eth_control_interface_process_msg(
+   RTE_ETHTOOL_CTRL_IF_PROCESS_MSG, 0);
} /* end for(;;) */

return 0;
@@ -293,6 +295,8 @@ int main(int argc, char **argv)
id_core = rte_get_next_lcore(id_core, 1, 1);
rte_eal_remote_launch(slave_main, NULL, id_core);

+   rte_eth_control_interface_create();
+
ethapp_main();

app_cfg.exit_now = 1;
@@ -301,5 +305,7 @@ int main(int argc, char **argv)
return -1;
}

+   rte_eth_control_interface_destroy();
+
return 0;
 }
-- 
2.5.0



[dpdk-dev] [PATCH v4] vfio: Support for no-IOMMU mode

2016-01-27 Thread Burakov, Anatoly
Hi Thomas,

> > Is it possible (is it better) to declare these functions with 
> > vfio_dma_func_t?
> 
> Yeah, sure. Or maybe the other way around - maybe we could do away with
> the typedef. I'll go for the former though.

No, we can't declare the functions with a function pointer. At least I don't
see any obvious way to do that without incurring a multiple-declaration compile
error. So I'll leave them as forward declarations. Of course, the other
alternative is to put the array below the functions and make them static, to
avoid forward declarations, but I think it's much clearer the way it is now.

Thanks,
Anatoly


[dpdk-dev] [PATCH 0/2] slow data path communication between DPDK port and Linux

2016-01-27 Thread Ferruh Yigit
This is a slow data path communication implementation based on the existing KNI.

The difference is that librte_kni is converted into a PMD, and the kdp kernel
module is almost the same as KNI except that all control path functionality is
removed and some simplifications are made.

The motivation is to simplify slow path data communication: any application can
now use this new PMD to send data to and receive data from the Linux kernel.

The PMD supports two communication methods:

1) KDP kernel module
The PMD initialization functions handle creating the virtual interfaces (with
the help of the kdp kernel module) and the FIFO. The FIFO is used to share data
between userspace and kernel space. This is the default method.

2) tun/tap module
When the KDP module is not inserted, the PMD creates a tap interface and
transfers packets through it.
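The FIFO in method 1 is a shared ring in which the producer and the consumer each own one index. A minimal single-producer/single-consumer sketch follows, assuming a fixed power-of-two size and pointer elements. struct fifo, fifo_put() and fifo_get() are hypothetical and only loosely modeled on KNI's FIFO (rte_kni_fifo.h), not taken from this patch; a version actually shared between kernel and user space would also need memory barriers and a memory layout both sides agree on.

```c
#include <stdbool.h>
#include <stddef.h>

#define FIFO_SIZE 8u /* must be a power of two */
_Static_assert((FIFO_SIZE & (FIFO_SIZE - 1)) == 0, "power of two");

/* Hypothetical SPSC ring: only the producer advances 'write', only the
 * consumer advances 'read'. One slot stays empty so that full and empty
 * can be told apart. 'volatile' only stops the compiler caching the
 * indices; it is not a memory barrier. */
struct fifo {
	volatile unsigned int write;
	volatile unsigned int read;
	void *buffer[FIFO_SIZE];
};

/* Producer side: returns false if the ring is full. */
static bool fifo_put(struct fifo *f, void *obj)
{
	unsigned int next = (f->write + 1) & (FIFO_SIZE - 1);

	if (next == f->read)
		return false;
	f->buffer[f->write] = obj;
	f->write = next;
	return true;
}

/* Consumer side: returns NULL if the ring is empty. */
static void *fifo_get(struct fifo *f)
{
	void *obj;

	if (f->read == f->write)
		return NULL;
	obj = f->buffer[f->read];
	f->read = (f->read + 1) & (FIFO_SIZE - 1);
	return obj;
}
```

With FIFO_SIZE at 8 the ring holds at most 7 entries; the power-of-two size lets the wrap-around be a mask instead of a modulo.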

In the long term this patch intends to replace KNI, and KNI will be
deprecated.

Sample usage:
1) Transfer any packet received from a NIC bound to DPDK to the Linux kernel

a) insert the kdp kernel module
insmod build/kmod/rte_kdp.ko

b) bind the NIC to DPDK using dpdk_nic_bind.py

c) ./testpmd --vdev eth_kdp0

c1) testpmd shows two ports, one physical and one virtual
...
Configuring Port 0 (socket 0)
Port 0: 00:00:00:00:00:00
Configuring Port 1 (socket 0)
...
Checking link statuses...
Port 0 Link Up - speed 1 Mbps - full-duplex
Port 1 Link Up - speed 1 Mbps - full-duplex
Done

c2) This will create "kdp0" Linux interface
$ ip l show kdp0
21: kdp0:  mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff

d) The Linux interface can be used for data

d1)
$ ifconfig kdp0 1.0.0.2
$ ping 1.0.0.1
PING 1.0.0.1 (1.0.0.1) 56(84) bytes of data.
64 bytes from 1.0.0.1: icmp_seq=1 ttl=64 time=0.789 ms
64 bytes from 1.0.0.1: icmp_seq=2 ttl=64 time=0.881 ms

d2)
$ tcpdump -nn -i kdp0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on kdp0, link-type EN10MB (Ethernet), capture size 262144 bytes
15:01:22.407506 IP 1.0.0.1 > 1.0.0.2: ICMP echo request, id 40016, seq 18, length 64
15:01:22.408521 IP 1.0.0.2 > 1.0.0.1: ICMP echo reply, id 40016, seq 18, length 64



2) Data travelling between virtual Linux interfaces passes through the DPDK
application, which can alter it

a) insert the kdp kernel module
insmod build/kmod/rte_kdp.ko

b) No physical NIC is involved

c) ./testpmd --vdev eth_kdp0 --vdev eth_kdp1

c1) testpmd shows two ports, both virtual
...
Configuring Port 0 (socket 0)
Port 0: 00:00:00:00:00:00
Configuring Port 1 (socket 0)
Port 1: 00:00:00:00:00:00
Checking link statuses...
Port 0 Link Up - speed 1 Mbps - full-duplex
Port 1 Link Up - speed 1 Mbps - full-duplex
Done

c2) This will create "kdp0"  and "kdp1" Linux interfaces
$ ip l show kdp0; ip l show kdp1
22: kdp0:  mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
23: kdp1:  mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff

d) Data travelling between the virtual ports passes through the DPDK application
$ ifconfig kdp0 1.0.0.1
$ ifconfig kdp1 1.0.0.2

d1)
$ ping 1.0.0.1
PING 1.0.0.1 (1.0.0.1) 56(84) bytes of data.
64 bytes from 1.0.0.1: icmp_seq=1 ttl=64 time=3.57 ms
64 bytes from 1.0.0.1: icmp_seq=2 ttl=64 time=1.85 ms
64 bytes from 1.0.0.1: icmp_seq=3 ttl=64 time=1.89 ms

d2)
$ tcpdump -nn -i kdp0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on kdp0, link-type EN10MB (Ethernet), capture size 262144 bytes
15:20:51.908543 IP 1.0.0.2 > 1.0.0.1: ICMP echo request, id 41234, seq 1, length 64
15:20:51.909570 IP 1.0.0.1 > 1.0.0.2: ICMP echo reply, id 41234, seq 1, length 64
15:20:52.909551 IP 1.0.0.2 > 1.0.0.1: ICMP echo request, id 41234, seq 2, length 64
15:20:52.910577 IP 1.0.0.1 > 1.0.0.2: ICMP echo reply, id 41234, seq 2, length 64



3) tun/tap interface usage

a) No external module is required, but the kernel must have tun/tap support

b) ./testpmd --vdev eth_kdp0 --vdev eth_kdp1

b1) This will create "tap_kdp0"  and "tap_kdp1" Linux interfaces
$ ip l show tap_kdp0; ip l show tap_kdp1
25: tap_kdp0:  mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 500
link/ether 56:47:97:9c:03:8e brd ff:ff:ff:ff:ff:ff
26: tap_kdp1:  mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 500
link/ether 5e:15:22:b0:52:42 brd ff:ff:ff:ff:ff:ff
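The tap fallback uses the standard Linux tun/tap driver. The sketch below shows the generic way a process creates a tap interface such as tap_kdp0: open /dev/net/tun and issue the TUNSETIFF ioctl with IFF_TAP | IFF_NO_PI. This is the documented kernel interface, not necessarily how the PMD in this patch is written; tap_prepare() and tap_open() are hypothetical names, and the ioctl requires CAP_NET_ADMIN.

```c
#include <fcntl.h>
#include <linux/if.h>
#include <linux/if_tun.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: fill an ifreq requesting a tap device (raw
 * Ethernet frames, no extra packet-info header) with the given name.
 * Returns -1 if the name does not fit in IFNAMSIZ. */
static int tap_prepare(struct ifreq *ifr, const char *name)
{
	size_t len = strlen(name);

	if (len >= IFNAMSIZ)
		return -1;
	memset(ifr, 0, sizeof(*ifr));
	ifr->ifr_flags = IFF_TAP | IFF_NO_PI;
	memcpy(ifr->ifr_name, name, len + 1);
	return 0;
}

/* Hypothetical helper: create (or attach to) the tap interface.
 * Returns a file descriptor for reading/writing frames, or -1. */
static int tap_open(const char *name)
{
	struct ifreq ifr;
	int fd;

	if (tap_prepare(&ifr, name) < 0)
		return -1;
	fd = open("/dev/net/tun", O_RDWR);
	if (fd < 0)
		return -1;
	if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

Each read() on the returned descriptor then yields one Ethernet frame sent to the interface by the kernel, and each write() injects one frame, which is what lets the PMD move packets without an extra kernel module.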

Ferruh Yigit (2):
  kdp: add kernel data path kernel module
  kdp: add virtual PMD for kernel slow data path communication

 config/common_linuxapp |   9 +-
 doc/guides/nics/pcap_ring.rst  | 125 -
 doc/guides/rel_notes/release_2_3.rst   |   6 +
 drivers/net/Makefile   |   3 +-
 drivers/net/kdp/Makefile   |  61 +++
 drivers/net/kdp/rte_eth_kdp.c  | 481 +
 drivers/net/kdp/rte_kdp.c  | 365 +
 drivers/net/kd
