date:20150127

[dpdk-dev] [PATCH] testpmd: check return value of rte_eth_dev_vlan_filter()

2015-01-27 Thread Thomas Monjalon

2015-01-27 15:58, Jastrzebski, MichalX K:
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > And more importantly, you make it clear that sometimes we cannot enable
> > all vlans and return no error.
> 
> Should I return this error somewhere? Isn't just printing the error the best 
> option here?

Yes, printing a warning before "break" seems like a good idea.

> > So I wonder how is it documented in the testpmd help?
> 
> I can add a note in the testpmd_funcs.rst file, or I can place some info in 
> .help_str?
> What do you mean by "testpmd help"?

I mean both :)
But maybe it's not appropriate in .help_str, I'm not sure.

-- 
Thomas
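
As an illustration of the suggestion above (print a warning before the
"break"), here is a minimal, self-contained sketch. It does not use DPDK:
mock_vlan_filter() is a hypothetical stand-in for rte_eth_dev_vlan_filter(),
and the capacity argument is an assumption standing in for the NIC's filter
table limit.

```c
#include <stdint.h>
#include <stdio.h>

#define MAX_VLAN_ID 4096

/* Hypothetical stand-in for rte_eth_dev_vlan_filter(): accepts only the
 * first `capacity` VLAN ids, then fails like a NIC with a full table. */
static int mock_vlan_filter(uint16_t vlan_id, int capacity)
{
    return (vlan_id < capacity) ? 0 : -1;
}

/* Enable VLAN ids in order; warn and stop at the first failure.
 * Returns the number of VLAN ids that were actually enabled. */
static int enable_all_vlans(int capacity)
{
    uint16_t vlan_id;

    for (vlan_id = 0; vlan_id < MAX_VLAN_ID; vlan_id++) {
        if (mock_vlan_filter(vlan_id, capacity) != 0) {
            printf("Warning: only %u of %d VLAN ids could be enabled\n",
                   (unsigned)vlan_id, MAX_VLAN_ID);
            break;
        }
    }
    return vlan_id;
}
```

With this shape the user sees why the loop stopped, even though the command
itself still returns nothing to the caller.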

[dpdk-dev] DPDK - scatter gather send to nic

2015-01-27 Thread Jog Lie

Hello,

I am currently using raw sockets with PF_PACKET, and I would like to be able 
to send scattered data to the NIC from different mmap'ed files.
I use writev() right now, but what does DPDK suggest as an alternative?

Thanks
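
For readers unfamiliar with the approach described above, this is a minimal
sketch of a gathered write with writev(). It only illustrates the existing
socket-based technique, not a DPDK replacement; send_gathered() is a
hypothetical helper name, and the two-buffer layout (header plus payload) is
an assumption.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Send two separate buffers with a single system call. The kernel gathers
 * them in order, so no intermediate copy into one buffer is needed. */
static ssize_t send_gathered(int fd, const void *hdr, size_t hdr_len,
                             const void *payload, size_t payload_len)
{
    struct iovec iov[2];

    iov[0].iov_base = (void *)hdr;       /* writev() does not modify it */
    iov[0].iov_len = hdr_len;
    iov[1].iov_base = (void *)payload;
    iov[1].iov_len = payload_len;

    return writev(fd, iov, 2);
}
```

The same call works on any file descriptor, so it can be tried on a pipe
before pointing it at a packet socket.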

[dpdk-dev] [PATCH v2] librte_pmd_ixgbe: Add queue start failure check

2015-01-27 Thread Michael Qiu

For ixgbe, when a queue fails to start (for example, because mbuf
allocation fails), the device start still reports success, which
could be an issue.

Check the return status of queue start to avoid this issue.

Signed-off-by: Michael Qiu 
---
v2 --> v1
. remove duplicated error message in ixgbe_dev_rxtx_start()
. remove '\n' in PMD_INIT_LOG()

 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |  6 +-
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h |  2 +-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 20 +++-
 3 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index b58ec45..ede8706 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -1495,7 +1495,11 @@ ixgbe_dev_start(struct rte_eth_dev *dev)
        goto error;
    }

-   ixgbe_dev_rxtx_start(dev);
+   err = ixgbe_dev_rxtx_start(dev);
+   if (err < 0) {
+       PMD_INIT_LOG(ERR, "Unable to start rxtx queues");
+       goto error;
+   }

    if (ixgbe_is_sfp(hw) && hw->phy.multispeed_fiber) {
        err = hw->mac.ops.setup_sfp(hw);
diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
index 677c257..1383194 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
@@ -265,7 +265,7 @@ int ixgbe_dev_rx_init(struct rte_eth_dev *dev);

 void ixgbe_dev_tx_init(struct rte_eth_dev *dev);

-void ixgbe_dev_rxtx_start(struct rte_eth_dev *dev);
+int ixgbe_dev_rxtx_start(struct rte_eth_dev *dev);

 int ixgbe_dev_rx_queue_start(struct rte_eth_dev *dev, uint16_t rx_queue_id);

diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index 840bc07..0224ed0 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -3806,7 +3806,7 @@ ixgbe_setup_loopback_link_82599(struct ixgbe_hw *hw)
 /*
  * Start Transmit and Receive Units.
  */
-void
+int
 ixgbe_dev_rxtx_start(struct rte_eth_dev *dev)
 {
    struct ixgbe_hw *hw;
@@ -3816,6 +3816,7 @@ ixgbe_dev_rxtx_start(struct rte_eth_dev *dev)
    uint32_t dmatxctl;
    uint32_t rxctrl;
    uint16_t i;
+   int ret = 0;

    PMD_INIT_FUNC_TRACE();
    hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
@@ -3838,14 +3839,22 @@ ixgbe_dev_rxtx_start(struct rte_eth_dev *dev)

    for (i = 0; i < dev->data->nb_tx_queues; i++) {
        txq = dev->data->tx_queues[i];
-       if (!txq->tx_deferred_start)
-           ixgbe_dev_tx_queue_start(dev, i);
+       if (!txq->tx_deferred_start) {
+           ret = ixgbe_dev_tx_queue_start(dev, i);
+           if (ret < 0) {
+               return ret;
+           }
+       }
    }

    for (i = 0; i < dev->data->nb_rx_queues; i++) {
        rxq = dev->data->rx_queues[i];
-       if (!rxq->rx_deferred_start)
-           ixgbe_dev_rx_queue_start(dev, i);
+       if (!rxq->rx_deferred_start) {
+           ret = ixgbe_dev_rx_queue_start(dev, i);
+           if (ret < 0) {
+               return ret;
+           }
+       }
    }

    /* Enable Receive engine */
@@ -3860,6 +3869,7 @@ ixgbe_dev_rxtx_start(struct rte_eth_dev *dev)
            dev->data->dev_conf.lpbk_mode == IXGBE_LPBK_82599_TX_RX)
        ixgbe_setup_loopback_link_82599(hw);

+   return 0;
 }

 /*
-- 
1.9.3
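
The error-propagation pattern in the patch above can be sketched in isolation
as follows. This is not ixgbe code: mock_queue_start(), start_all_queues() and
dev_start() are hypothetical names, and the fail_at argument is an assumption
used only to simulate a queue that cannot start.

```c
#include <stdio.h>

#define NB_QUEUES 4

/* Hypothetical per-queue start: fails only for queue index `fail_at`. */
static int mock_queue_start(int queue_id, int fail_at)
{
    return (queue_id == fail_at) ? -1 : 0;
}

/* Start every non-deferred queue; return the first error encountered. */
static int start_all_queues(const int *deferred, int fail_at)
{
    int i, ret;

    for (i = 0; i < NB_QUEUES; i++) {
        if (deferred[i])
            continue;               /* deferred queues are started later */
        ret = mock_queue_start(i, fail_at);
        if (ret < 0)
            return ret;
    }
    return 0;
}

/* Device start now fails when any queue fails, instead of ignoring it. */
static int dev_start(const int *deferred, int fail_at)
{
    int err = start_all_queues(deferred, fail_at);

    if (err < 0) {
        fprintf(stderr, "Unable to start rxtx queues\n");
        return err;
    }
    return 0;
}
```

Note that a failing queue marked as deferred does not fail the device start
in this sketch, which mirrors the deferred-start checks in the diff.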

[dpdk-dev] Pktgen-DPDK rate and traffic inconsistency problem

2015-01-27 Thread Alexandre Frigon

Hi all, 

I'm using DPDK 1.8 and pktgen-dpdk 2.8 to generate traffic on a back-to-back 
setup, with both ends equipped with an 82599EB 10-Gigabit NIC. 
The problem is that when I start it, pktgen indicates 10000Mbits/s Tx with a 
64B packet size, but I'm receiving only about 15% of it on the other end. 
This percentage seems to be proportional to the packet size.

e.g. 1
Using nload to read Rx traffic
Pktgen: Tx: 10000Mbits/s  ==>  Other end: Rx 1660 Mbits/s
Rate: 100%
Pkt size: 64B

e.g. 2
Pktgen: Tx: 10000Mbits/s  ==>  Other end: Rx 9385 Mbits/s
Rate: 100%
Pkt size: 1518B


Pktgen is started with this command on a Xeon(R) CPU E31270 @ 3.40GHz
./app/pktgen -c 1f -n 3 --proc-type auto --socket-mem 1024 --file-prefix pg -- 
-p 0x3 -P  -N -m "[1:3].0, [2:4].1"

Is there something I'm not configuring correctly, or something I have missed?

Also, the % rate is acting strangely: anything above 50% doesn't change 
the Tx rate, while anything below it does.
e.g.  Tx:     10000Mbits/s   5000Mbits/s
      %Rate:  >=50%          25%


Thanks
Alexandre F.

[dpdk-dev] [PATCH v2] testpmd check return value of rte_eth_dev_vlan_filter()

2015-01-27 Thread Michal Jastrzebski

This patch modifies testpmd behavior when setting:
rx_vlan add all vf_port (enabling all vlanids
to be passed thru rx filter on VF).
The rx_vlan_all_filter_set() function now
checks whether the next vlanid can be enabled by the driver.
The number of vlanids is limited by the NIC, so the NIC
does not allow enabling more vlanids than it can allocate
in the VFTA table.

v2 - fix formatting errors

Signed-off-by: Michal Jastrzebski 
---
 app/test-pmd/config.c |   15 +--
 app/test-pmd/testpmd.h|2 +-
 lib/librte_ether/rte_ethdev.c |4 ++--
 3 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index c40f819..eda737e 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1643,21 +1643,22 @@ rx_vlan_filter_set(portid_t port_id, int on)
   "diag=%d\n", port_id, on, diag);
 }

-void
+int
 rx_vft_set(portid_t port_id, uint16_t vlan_id, int on)
 {
    int diag;

    if (port_id_is_invalid(port_id))
-       return;
+       return 1;
    if (vlan_id_is_invalid(vlan_id))
-       return;
+       return 1;
    diag = rte_eth_dev_vlan_filter(port_id, vlan_id, on);
    if (diag == 0)
-       return;
+       return 0;
    printf("rte_eth_dev_vlan_filter(port_pi=%d, vlan_id=%d, on=%d) failed "
           "diag=%d\n",
           port_id, vlan_id, on, diag);
+   return -1;
 }

 void
@@ -1667,8 +1668,10 @@ rx_vlan_all_filter_set(portid_t port_id, int on)

    if (port_id_is_invalid(port_id))
        return;
-   for (vlan_id = 0; vlan_id < 4096; vlan_id++)
-       rx_vft_set(port_id, vlan_id, on);
+   for (vlan_id = 0; vlan_id < 4096; vlan_id++){
+       if (rx_vft_set(port_id, vlan_id, on))
+           break;
+   }
 }

 void
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 8f5e6c7..e0186b9 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -492,7 +492,7 @@ void rx_vlan_strip_set_on_queue(portid_t port_id, uint16_t queue_id, int on);

 void rx_vlan_filter_set(portid_t port_id, int on);
 void rx_vlan_all_filter_set(portid_t port_id, int on);
-void rx_vft_set(portid_t port_id, uint16_t vlan_id, int on);
+int rx_vft_set(portid_t port_id, uint16_t vlan_id, int on);
 void vlan_extend_set(portid_t port_id, int on);
 void vlan_tpid_set(portid_t port_id, uint16_t tp_id);
 void tx_vlan_set(portid_t port_id, uint16_t vlan_id);
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index ea3a1fb..064b5d6 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1519,8 +1519,8 @@ rte_eth_dev_vlan_filter(uint8_t port_id, uint16_t vlan_id, int on)
        return (-EINVAL);
    }
    FUNC_PTR_OR_ERR_RET(*dev->dev_ops->vlan_filter_set, -ENOTSUP);
-   (*dev->dev_ops->vlan_filter_set)(dev, vlan_id, on);
-   return (0);
+
+   return (*dev->dev_ops->vlan_filter_set)(dev, vlan_id, on);
 }

 int
-- 
1.7.9.5

[dpdk-dev] vhost user examples

2015-01-27 Thread Benoît Canet

The Tuesday 27 Jan 2015 à 17:10:29 (+), Xie, Huawei wrote :
> 
> 
> > -----Original Message-----
> > From: Benoît Canet [mailto:benoit.canet at irqsave.net]
> > Sent: Wednesday, January 28, 2015 1:00 AM
> > To: Xie, Huawei
> > Cc: Benoît Canet; dev at dpdk.org
> > Subject: Re: vhost user examples
> > 
> > The Tuesday 27 Jan 2015 à 16:33:19 (+), Xie, Huawei wrote :
> > >
> > > > -----Original Message-----
> > > > From: Benoît Canet [mailto:benoit.canet at irqsave.net]
> > > > Sent: Tuesday, January 27, 2015 10:11 PM
> > > > To: Xie, Huawei; dev at dpdk.org
> > > > Subject: vhost user examples
> > > >
> > > >
> > > > Hi Xie,
> > > >
> > > > I would be interested in alpha testing the vhost user patchset.
> > > >
> > > > Is there an up to date example of how to use it ?
> > > >
> > >
> > > vHost has a parameter called base_name. Temporarily you could specify --base-name as
> > > the unix domain socket path, and set it in the qemu command line as socket path.
> > > recommended qemu command line for memory and vhost configuration:
> > > -m $MEM -object memory-backend-file,id=mem,size=${MEM}M,mem-path=/mnt/huge,share=on -numa node,memdev=mem  -mem-prealloc
> > > -chardev socket,id=char0,path=$sock_path  -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
> > > -device virtio-net-pci,mac=52:54:00:12:34:11,netdev=mynet1
> > 
> > Thanks,
> > 
> > Is the upstream vhost example in master already patched for this ?
> > Or does it still require some patches ?
> Yes, the vhost example already supports the base_name parameter.
> Previously it was used as the name of the vhost character device file.
> You can use it as the socket path name.

Thanks a lot

I am looking forward to receiving my Intel network card and testing
it.

Best regards

Benoît
> 
> > 
> > Best regards
> > 
> > Benoît
> > 
> > >
> > > > Best regards
> > > >
> > > > Benoît

[dpdk-dev] vhost user examples

2015-01-27 Thread Xie, Huawei



> -----Original Message-----
> From: Benoît Canet [mailto:benoit.canet at irqsave.net]
> Sent: Wednesday, January 28, 2015 1:00 AM
> To: Xie, Huawei
> Cc: Benoît Canet; dev at dpdk.org
> Subject: Re: vhost user examples
> 
> The Tuesday 27 Jan 2015 à 16:33:19 (+), Xie, Huawei wrote :
> >
> > > -----Original Message-----
> > > From: Benoît Canet [mailto:benoit.canet at irqsave.net]
> > > Sent: Tuesday, January 27, 2015 10:11 PM
> > > To: Xie, Huawei; dev at dpdk.org
> > > Subject: vhost user examples
> > >
> > >
> > > Hi Xie,
> > >
> > > I would be interested in alpha testing the vhost user patchset.
> > >
> > > Is there an up to date example of how to use it ?
> > >
> >
> > vHost has a parameter called base_name. Temporarily you could specify --base-name as
> > the unix domain socket path, and set it in the qemu command line as socket path.
> > recommended qemu command line for memory and vhost configuration:
> > -m $MEM -object memory-backend-file,id=mem,size=${MEM}M,mem-path=/mnt/huge,share=on -numa node,memdev=mem  -mem-prealloc
> > -chardev socket,id=char0,path=$sock_path  -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
> > -device virtio-net-pci,mac=52:54:00:12:34:11,netdev=mynet1
> 
> Thanks,
> 
> Is the upstream vhost example in master already patched for this ?
> Or does it still require some patches ?
Yes, the vhost example already supports the base_name parameter.
Previously it was used as the name of the vhost character device file.
You can use it as the socket path name.

> 
> Best regards
> 
> Benoît
> 
> >
> > > Best regards
> > >
> > > Benoît

[dpdk-dev] [PATCH v3] test: fix missing NULL pointer checks

2015-01-27 Thread Daniel Mrzyglod

In test_sched, we are missing NULL pointer checks after create_mempool()
and rte_pktmbuf_alloc(). Add in these checks using TEST_ASSERT_NOT_NULL macros.

The VERIFY macro was removed and replaced by the standard test asserts from
the "test.h" header.
This provides additional information to track when the failure occurred.

v3 changes:
- remove VERIFY macro
- fix spelling error.
- change improper comment

v2 changes:
- Replace all VERIFY macros instances by proper TEST_ASSERT* macros.
- fix description

v1 changes:
- first iteration of patch using VERIFY macro.

Signed-off-by: Daniel Mrzyglod 
---
 app/test/test_sched.c | 39 ++-
 1 file changed, 18 insertions(+), 21 deletions(-)

diff --git a/app/test/test_sched.c b/app/test/test_sched.c
index c957d80..60c62de 100644
--- a/app/test/test_sched.c
+++ b/app/test/test_sched.c
@@ -46,13 +46,6 @@
 #include 


-#define VERIFY(exp,fmt,args...)    \
-   if (!(exp)) {                   \
-       printf(fmt, ##args);        \
-       return -1;                  \
-   }
-
-
 #define SUBPORT 0
 #define PIPE    1
 #define TC      2
@@ -166,48 +159,49 @@ test_sched(void)
int err;

mp = create_mempool();
+   TEST_ASSERT_NOT_NULL(mp, "Error creating mempool\n");

port_param.socket = 0;
port_param.rate = (uint64_t) 1 * 1000 * 1000 / 8;

port = rte_sched_port_config(&port_param);
-   VERIFY(port != NULL, "Error config sched port\n");
-
+   TEST_ASSERT_NOT_NULL(port, "Error config sched port\n");

err = rte_sched_subport_config(port, SUBPORT, subport_param);
-   VERIFY(err == 0, "Error config sched, err=%d\n", err);
+   TEST_ASSERT_SUCCESS(err, "Error config sched, err=%d\n", err);

for (pipe = 0; pipe < port_param.n_pipes_per_subport; pipe ++) {
err = rte_sched_pipe_config(port, SUBPORT, pipe, 0);
-   VERIFY(err == 0, "Error config sched pipe %u, err=%d\n", pipe, err);
+   TEST_ASSERT_SUCCESS(err, "Error config sched pipe %u, err=%d\n", pipe, err);
}

for (i = 0; i < 10; i++) {
in_mbufs[i] = rte_pktmbuf_alloc(mp);
+   TEST_ASSERT_NOT_NULL(in_mbufs[i], "Packet allocation failed\n");
prepare_pkt(in_mbufs[i]);
}


err = rte_sched_port_enqueue(port, in_mbufs, 10);
-   VERIFY(err == 10, "Wrong enqueue, err=%d\n", err);
+   TEST_ASSERT_EQUAL(err, 10, "Wrong enqueue, err=%d\n", err);

err = rte_sched_port_dequeue(port, out_mbufs, 10);
-   VERIFY(err == 10, "Wrong dequeue, err=%d\n", err);
+   TEST_ASSERT_EQUAL(err, 10, "Wrong dequeue, err=%d\n", err);

for (i = 0; i < 10; i++) {
enum rte_meter_color color;
uint32_t subport, traffic_class, queue;

color = rte_sched_port_pkt_read_color(out_mbufs[i]);
-   VERIFY(color == e_RTE_METER_YELLOW, "Wrong color\n");
+   TEST_ASSERT_EQUAL(color, e_RTE_METER_YELLOW, "Wrong color\n");

rte_sched_port_pkt_read_tree_path(out_mbufs[i],
&subport, &pipe, &traffic_class, &queue);

-   VERIFY(subport == SUBPORT, "Wrong subport\n");
-   VERIFY(pipe == PIPE, "Wrong pipe\n");
-   VERIFY(traffic_class == TC, "Wrong traffic_class\n");
-   VERIFY(queue == QUEUE, "Wrong queue\n");
+   TEST_ASSERT_EQUAL(subport, SUBPORT, "Wrong subport\n");
+   TEST_ASSERT_EQUAL(pipe, PIPE, "Wrong pipe\n");
+   TEST_ASSERT_EQUAL(traffic_class, TC, "Wrong traffic_class\n");
+   TEST_ASSERT_EQUAL(queue, QUEUE, "Wrong queue\n");

}

@@ -215,12 +209,15 @@ test_sched(void)
struct rte_sched_subport_stats subport_stats;
uint32_t tc_ov;
rte_sched_subport_read_stats(port, SUBPORT, &subport_stats, &tc_ov);
-   //VERIFY(subport_stats.n_pkts_tc[TC-1] == 10, "Wrong subport stats\n");
-
+#if 0
+   TEST_ASSERT_EQUAL(subport_stats.n_pkts_tc[TC-1], 10, "Wrong subport stats\n");
+#endif
struct rte_sched_queue_stats queue_stats;
uint16_t qlen;
rte_sched_queue_read_stats(port, QUEUE, &queue_stats, &qlen);
-   //VERIFY(queue_stats.n_pkts == 10, "Wrong queue stats\n");
+#if 0
+   TEST_ASSERT_EQUAL(queue_stats.n_pkts, 10, "Wrong queue stats\n");
+#endif

rte_sched_port_free(port);

-- 
2.1.0
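
To show why assert-style macros give more information than the removed
VERIFY macro, here is a simplified, hypothetical sketch. The SKETCH_ASSERT*
macros below are illustrations only and are not the definitions from DPDK's
"test.h"; they rely on the GNU `##__VA_ARGS__` extension (as the removed
macro relied on `##args`), which is an assumption about the compiler.

```c
#include <stdio.h>
#include <stdlib.h>

/* On failure, report the function and line, then make the caller return -1.
 * The do/while(0) wrapper keeps the macro safe inside if/else bodies. */
#define SKETCH_ASSERT(cond, msg, ...) do {                          \
        if (!(cond)) {                                              \
            printf("%s() line %d failed: " msg "\n",                \
                   __func__, __LINE__, ##__VA_ARGS__);              \
            return -1;                                              \
        }                                                           \
    } while (0)

#define SKETCH_ASSERT_NOT_NULL(ptr, msg, ...) \
    SKETCH_ASSERT((ptr) != NULL, msg, ##__VA_ARGS__)

#define SKETCH_ASSERT_EQUAL(a, b, msg, ...) \
    SKETCH_ASSERT((a) == (b), msg, ##__VA_ARGS__)

/* Example test body: fails cleanly instead of dereferencing NULL. */
static int check_alloc(size_t size, int simulate_failure)
{
    void *buf = simulate_failure ? NULL : malloc(size);

    SKETCH_ASSERT_NOT_NULL(buf, "allocation of %zu bytes failed", size);
    free(buf);
    return 0;
}

/* Example test body comparing a count against an expected value. */
static int check_count(int got, int expected)
{
    SKETCH_ASSERT_EQUAL(got, expected, "Wrong count, got=%d", got);
    return 0;
}
```

The failure message now carries the function name and line number, which is
the "additional information" the commit message refers to.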

[dpdk-dev] vhost user examples

2015-01-27 Thread Xie, Huawei


> -----Original Message-----
> From: Benoît Canet [mailto:benoit.canet at irqsave.net]
> Sent: Tuesday, January 27, 2015 10:11 PM
> To: Xie, Huawei; dev at dpdk.org
> Subject: vhost user examples
> 
> 
> Hi Xie,
> 
> I would be interested in alpha testing the vhost user patchset.
> 
> Is there an up to date example of how to use it ?
> 

vHost has a parameter called base_name. Temporarily you could specify --base-name as
the unix domain socket path, and set it in the qemu command line as socket path.
recommended qemu command line for memory and vhost configuration:
-m $MEM -object memory-backend-file,id=mem,size=${MEM}M,mem-path=/mnt/huge,share=on -numa node,memdev=mem  -mem-prealloc
-chardev socket,id=char0,path=$sock_path  -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
-device virtio-net-pci,mac=52:54:00:12:34:11,netdev=mynet1

> Best regards
> 
> Benoît

[dpdk-dev] [PATCH v4 00/11] Port Hotplug Framework

2015-01-27 Thread Tetsuya Mukawa

On 2015/01/27 14:50, Qiu, Michael wrote:
> On 1/27/2015 1:02 PM, Tetsuya Mukawa wrote:
>> On 2015/01/27 12:00, Qiu, Michael wrote:
>>> On 1/19/2015 6:42 PM, Tetsuya Mukawa wrote:
>>>> This patch series adds a dynamic port hotplug framework to DPDK.
>>>> With the patches, DPDK apps can attach or detach ports at runtime.
>>>>
>>>> The basic concept of the port hotplug is like followings.
>>>> - DPDK apps must have responsibility to manage ports.
>>>>   DPDK apps only know which ports are attached or detached at the moment.
>>>>   The port hotplug framework is implemented to allow DPDK apps to manage ports.
>>>>   For example, when DPDK apps call port attach function, attached port number
>>>>   will be returned. Also DPDK apps can detach port by port number.
>>>> - Kernel support is needed for attaching or detaching physical device ports.
>>>>   To attach new device, the device will be recognized by kernel at first and
>>>>   controlled by kernel driver. Then user can bind the device to igb_uio
>>> Here does it really need native kernel driver here? As it will be
>>> controlled by igb_uio.
>>> I think even if the device has no kernel driver is also OK.
>> Thanks for correcting. Yes, it should be.
>> How about following.
>>
>> - Kernel support is needed for attaching or detaching physical device ports.
>>   To attach a new device, the device will be recognized by kernel PCI
>> hotplug feature at first.
> No, here should not explain as "kernel PCI hotplug feature" which is
> stand for removing or adding a PCI device from system level, it is
> devices related not driver.
>
> What about:
>
> - Kernel support is needed for attaching or detaching physical device
> ports. To attach a new physical device port, the device will be
> recognized by userspace directly I/O framework in kernel at first.

Thanks. I will replace like above.

Thanks,
Tetsuya

> Thanks,
> Michael
>>   Then user can bind the device to igb_uio.
>>
>>> Also I have finished initial patch of passthrough driver flag in
>>> "struct rte_pci_device"
>>>
>>> I will send to you after I do some basic test on that, then I will send to
>>> you, and you can give some comments on that.
>> I appreciate for your implementing.
>>
>> Thanks,
>> Tetsuya
>>
>>
>>> Thanks,
>>> Michael
>>>
>>>>   by 'dpdk_nic_bind.py'. Finally, DPDK apps can call the port hotplug
>>>>   functions to attach ports.
>>>>   For detaching, steps are vice versa.
>>>> - Before detaching ports, ports must be stopped and closed.
>>>>   DPDK application must call rte_eth_dev_stop() and rte_eth_dev_close() before
>>>>   detaching ports. These functions will call finalization codes of PMDs.
>>>>   But so far, no PMD frees all resources allocated by initialization.
>>>>   It means PMDs need to be fixed to support the port hotplug.
>>>>   'RTE_PCI_DRV_DETACHABLE' is a new flag indicating a PMD supports detaching.
>>>>   Without this flag, detaching will fail.
>>>> - Mustn't affect legacy DPDK apps.
>>>>   No DPDK EAL behavior is changed, if the port hotplug functions aren't called.
>>>>   So all legacy DPDK apps can still work without modifications.
>>>>
>>>> And a few limitations.
>>>> - The port hotplug functions are not thread safe.
>>>>   DPDK apps should handle it.
>>>> - Only support Linux and igb_uio so far.
>>>>   BSD and VFIO are not supported. I will send VFIO patches at least, but I don't
>>>>   have a plan to submit BSD patch so far.
>>>>
>>>>
>>>> Here are the port hotplug APIs.
>>>> ---
>>>> /**
>>>>  * Attach a new device.
>>>>  *
>>>>  * @param devargs
>>>>  *   A pointer to a strings array describing the new device
>>>>  *   to be attached. The strings should be a pci address like
>>>>  *   ':01:00.0' or virtual device name like 'eth_pcap0'.
>>>>  * @param port_id
>>>>  *  A pointer to a port identifier actually attached.
>>>>  * @return
>>>>  *  0 on success and port_id is filled, negative on error
>>>>  */
>>>> int rte_eal_dev_attach(const char *devargs, uint8_t *port_id);
>>>>
>>>> /**
>>>>  * Detach a device.
>>>>  *
>>>>  * @param port_id
>>>>  *   The port identifier of the device to detach.
>>>>  * @param addr
>>>>  *  A pointer to a device name actually detached.
>>>>  * @return
>>>>  *  0 on success and devname is filled, negative on error
>>>>  */
>>>> int rte_eal_dev_detach(uint8_t port_id, char *devname);
>>>> ---
>>>>
>>>> This patch series are for DPDK EAL. To use port hotplug function by DPDK apps,
>>>> each PMD should be fixed to support 'RTE_PCI_DRV_DETACHABLE' flag. Please check
>>>> a patch for pcap PMD.
>>>>
>>>> Also please check testpmd patch. It will show you how to fix your legacy
>>>> applications to support port hotplug feature.
>>>>
>>>>
>>>> PATCH v4 changes
>>>>  - Merge patches to review easier.
>>>>  - Fix indent of 'if' statement.
  - Fix indent of 'if' statement.

[dpdk-dev] DPDK testpmd forwarding performace degradation

2015-01-27 Thread De Lara Guarch, Pablo



> On Tue, Jan 27, 2015 at 10:51 AM, Alexander Belyakov
> <abelyako at gmail.com> wrote:
>
> Hi Pablo,
>
> On Mon, Jan 26, 2015 at 5:22 PM, De Lara Guarch, Pablo
> <pablo.de.lara.guarch at intel.com> wrote:
> Hi Alexander,
>
> > -----Original Message-----
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Alexander Belyakov
> > Sent: Monday, January 26, 2015 10:18 AM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] DPDK testpmd forwarding performace degradation
> >
> > Hello,
> >
> > recently I have found a case of significant performance degradation for our
> > application (built on top of DPDK, of course). Surprisingly, similar issue
> > is easily reproduced with default testpmd.
> >
> > To show the case we need simple IPv4 UDP flood with variable UDP payload
> > size. Saying "packet length" below I mean: Eth header length (14 bytes) +
> > IPv4 header length (20 bytes) + UDP header length (8 bytes) + UDP payload
> > length (variable) + CRC (4 bytes). Source IP addresses and ports are selected
> > randomly for each packet.
> >
> > I have used DPDK with revisions 1.6.0r2 and 1.7.1. Both show the same issue.
> >
> > Follow "Quick start" guide (http://dpdk.org/doc/quick-start) to build and
> > run testpmd. Enable testpmd forwarding ("start" command).
> >
> > Table below shows measured forwarding performance depending on packet
> > length:
> >
> > No. -- UDP payload length (bytes) -- Packet length (bytes) -- Forwarding
> > performance (Mpps) -- Expected theoretical performance (Mpps)
> >
> > 1. 0 -- 64 -- 14.8 -- 14.88
> > 2. 34 -- 80 -- 12.4 -- 12.5
> > 3. 35 -- 81 -- 6.2 -- 12.38 (!)
> > 4. 40 -- 86 -- 6.6 -- 11.79
> > 5. 49 -- 95 -- 7.6 -- 10.87
> > 6. 50 -- 96 -- 10.7 -- 10.78 (!)
> > 7. 60 -- 106 -- 9.4 -- 9.92
> >
> > At line number 3 we have added 1 byte of UDP payload (comparing to previous
> > line) and got forwarding performance halved! 6.2 Mpps against 12.38 Mpps of
> > expected theoretical maximum for this packet size.
> >
> > That is the issue.
> >
> > Significant performance degradation exists up to 50 bytes of UDP payload
> > (96 bytes packet length), where it jumps back to theoretical maximum.
> >
> > What is happening between 80 and 96 bytes packet length?
> >
> > This issue is stable and 100% reproducible. At this point I am not sure if
> > it is DPDK or NIC issue. These tests have been performed on Intel(R) Eth
> > Svr Bypass Adapter X520-LR2 (X520LR2BP).
> >
> > Is anyone aware of such strange behavior?
> I cannot reproduce the issue using two ports on two different 82599EB NICs,
> using 1.7.1 and 1.8.0.
> I always get either same or better linerate as I increase the packet size.
>
> Thank you for trying to reproduce the issue.
>
> Actually, have you tried using 1.8.0?
>
> I feel 1.8.0 is little bit immature and might require some post-release
> patching. Even testpmd from this release is not forwarding packets properly
> on my setup. It is up and running without visible errors/warnings, TX/RX
> counters are ticking but I can not see any packets at the output.

This is strange. Without changing anything, forwarding works perfectly for me
(so, RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC is enabled).

> Please note, both 1.6.0r2 and 1.7.1 releases work (on the same setup)
> out-of-the-box just fine with only exception of this mysterious performance drop.
> So it will take some time to figure out what is wrong with dpdk-1.8.0.
> Meanwhile we could focus on stable dpdk-1.7.1.
>
> Managed to get testpmd from dpdk-1.8.0 to work on my setup.
> Unfortunately I had to disable RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC,
> it is new comparing to 1.7.1 and somehow breaks testpmd forwarding. By the
> way, simply disabling RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC in
> common_linuxapp config file breaks the build - had to make quick'n'dirty fix
> in struct igb_rx_queue as well.
>
> Anyway, issue is still here.
>
> Forwarding 80 bytes packets at 12.4 Mpps.
> Forwarding 81 bytes packets at 7.2 Mpps.
>
> Any ideas?
> As for X520-LR2 NIC - it is dual port bypass adapter with device id 155d. I
> believe it should be treated as 82599EB except bypass feature. I put bypass
> mode to "normal" in those tests.

I have used a 82599EB first, and now a X520-SR2. Same results.
I assume that X520-SR2 and X520-LR2 should give similar results
(only thing that is changed is the wavelength, but the controller is the same).

Pablo

> Alexander
>
>
> Pablo
> >
> > Regards,
> > Alexander Belyakov
>
>
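
The "expected theoretical performance" column quoted above is consistent
with 10 Gbit/s Ethernet line rate once the 20 bytes of per-frame wire
overhead (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte
inter-frame gap) are added to each frame. A small sketch of that arithmetic
follows; the function names are hypothetical, and the 64-byte minimum frame
padding is assumed from the first table row (0 bytes of payload, 64-byte
packet).

```c
#define LINK_BPS       10000000000.0  /* 10 Gbit/s link */
#define WIRE_OVERHEAD  20   /* preamble (7) + SFD (1) + inter-frame gap (12) */
#define MIN_FRAME_LEN  64   /* minimum Ethernet frame, including CRC */
#define HDRS_AND_CRC   46   /* Eth (14) + IPv4 (20) + UDP (8) + CRC (4) */

/* "Packet length" as defined in the thread, padded to the minimum frame. */
static unsigned frame_len(unsigned udp_payload)
{
    unsigned len = HDRS_AND_CRC + udp_payload;

    return len < MIN_FRAME_LEN ? MIN_FRAME_LEN : len;
}

/* Maximum packet rate, in Mpps, for a given frame length on the wire. */
static double theoretical_mpps(unsigned frame_len_bytes)
{
    double bits_on_wire = (frame_len_bytes + WIRE_OVERHEAD) * 8.0;

    return LINK_BPS / bits_on_wire / 1e6;
}
```

For example, an 81-byte frame occupies (81 + 20) * 8 = 808 bit times, and
10e9 / 808 is about 12.38 Mpps, matching row 3 of the table.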

[dpdk-dev] [PATCH v2 02/24] virtio: Use weaker barriers

2015-01-27 Thread Xie, Huawei



> -Original Message-
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Tuesday, January 27, 2015 5:59 PM
> To: Xie, Huawei
> Cc: Ouyang, Changchun; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2 02/24] virtio: Use weaker barriers
> 
> 
> > I recall our original code is virtio_wmb().
> > Use store fence to ensure all updates to entries before updating the index.
> > Why do we need virtio_rmb() here and add virtio_wmb after
> vq_update_avail_idx()?
> 
> Store fence is unnecessary, Intel CPUs are cache coherent, please read
> the virtio Linux ring header file for explanation. A full fence WMB
> is more expensive and causes CPU stall
> 


I mean virtio_wmb rather than virtio_rmb should be used here, 
and both of them are defined as compiler barrier.

The following code is from the Linux virtio driver, where it adds a buffer to the vring:
/* Put entry in available array (but don't update avail->idx until they
 * do sync). */
avail = (vq->vring.avail->idx & (vq->vring.num-1));
vq->vring.avail->ring[avail] = head;

/* Descriptors and available array need to be set before we expose the
 * new available array entries. */
virtio_wmb(vq->weak_barriers);
vq->vring.avail->idx++;

> > >   vq->vq_ring.avail->idx = vq->vq_avail_idx;
> > >  }
> > >
> > > @@ -255,7 +264,7 @@ static inline void
> > >  virtqueue_notify(struct virtqueue *vq)
> > >  {
> > >   /*
> > > -  * Ensure updated avail->idx is visible to host. mb() necessary?
> > > +  * Ensure updated avail->idx is visible to host.
> > >* For virtio on IA, the notificaiton is through io port operation
> > >* which is a serialization instruction itself.
> > >*/
> > > --
> > > 1.8.4.2
> >
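
The ordering requirement discussed above (the ring entry must be written
before the new avail index becomes visible) can be sketched in portable C11.
Note the swap: this sketch uses a C11 release store instead of the
virtio_wmb() compiler barrier from the thread, and struct avail_ring and
publish_avail() are hypothetical names, not the DPDK or Linux definitions.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 256   /* must be a power of two */

struct avail_ring {
    _Atomic uint16_t idx;         /* free-running producer index */
    uint16_t ring[RING_SIZE];     /* descriptor heads */
};

/* Publish one descriptor head. The release store on idx guarantees that a
 * consumer which observes the new idx (with an acquire load) also observes
 * the ring entry written just before it. */
static void publish_avail(struct avail_ring *avail, uint16_t head)
{
    uint16_t idx = atomic_load_explicit(&avail->idx, memory_order_relaxed);

    avail->ring[idx & (RING_SIZE - 1)] = head;
    atomic_store_explicit(&avail->idx, (uint16_t)(idx + 1),
                          memory_order_release);
}
```

On x86 a release store needs no fence instruction, which is in line with
the argument in the thread that a compiler barrier is sufficient there.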

[dpdk-dev] [PATCH] testpmd: check return value of rte_eth_dev_vlan_filter()

2015-01-27 Thread Jastrzebski, MichalX K

> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Tuesday, January 27, 2015 11:33 AM
> To: Jastrzebski, MichalX K
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] testpmd: check return value of
> rte_eth_dev_vlan_filter()
> 
> Hi Michal,
> 
> 2015-01-23 11:43, Michal Jastrzebski:
> > This patch modifies testpmd behavior when setting:
> > rx_vlan add all vf_port (enabling all vlanids
> > to be passed thru rx filter on VF).
> > The rx_vlan_all_filter_set() function now
> > checks whether the next vlanid can be enabled by the driver.
> > The number of vlanids is limited by the NIC, so the NIC
> > does not allow enabling more vlanids than it can allocate
> > in the VFTA table.
> >
> > Signed-off by: Michal Jastrzebski 
> 
> checkpatch is not happy because you forgot a hyphen.
> 
> > @@ -1667,8 +1668,9 @@ rx_vlan_all_filter_set(portid_t port_id, int on)
> >
> > if (port_id_is_invalid(port_id))
> > return;
> > -   for (vlan_id = 0; vlan_id < 4096; vlan_id++)
> > -   rx_vft_set(port_id, vlan_id, on);
> > +   for (vlan_id = 0; vlan_id < 4096; vlan_id++){
> > +   if ( rx_vft_set(port_id, vlan_id, on) ) break;
> 
> Again, checkpatch does not like this line.
> 
Hi Thomas,
Thanks for pointing it out. I have already fixed all checkpatch.pl errors.
I will send a v2 patch for this.
> And more importantly, you make it clear that sometimes we cannot enable
> all
> vlans and return no error.
Should I return this error somewhere? Isn't just printing the error the best 
option here?
> So I wonder how is it documented in the testpmd help?
I can add a note in the testpmd_funcs.rst file, or I can place some info in 
.help_str?
What do you mean by "testpmd help"?
> 
> Thanks
> --
> Thomas

[dpdk-dev] vhost: virtio-net rx-ring stop work after work many hours, bug?

2015-01-27 Thread Linhaifeng

Hi,all

I use vhost-user to send data to a VM. At first it works well, but after many 
hours the VM can no longer receive data, although it can still send data.

(gdb)p avail_idx
$4 = 2668
(gdb)p free_entries
$5 = 0
(gdb)l
    /* check that we have enough buffers */
    if (unlikely(count > free_entries))
        count = free_entries;

    if (count == 0){
        int b=0;
        if(b) { // when b is set to 1 to notify the guest, rx_ring starts working again
            if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)) {

                eventfd_write(vq->callfd, 1);
            }
        }
        return 0;
    }

Some info I printed in the guest:

net eth3:vi->num=199
net eth3:rvq info: num_free=57, used->idx=2668, avail->idx=2668
net eth3:svq info: num_free=254, used->idx=1644, avail->idx=1644

net eth3:vi->num=199
net eth3:rvq info: num_free=57, used->idx=2668, avail->idx=2668
net eth3:svq info: num_free=254, used->idx=1645, avail->idx=1645

net eth3:vi->num=199
net eth3:rvq info: num_free=57, used->idx=2668, avail->idx=2668
net eth3:svq info: num_free=254, used->idx=1646, avail->idx=1646

# free
             total       used       free     shared    buffers     cached
Mem:       3924100     337252    3586848          0      95984     138060
-/+ buffers/cache:     103208    3820892
Swap:       970748          0     970748

I have two questions:
1. Should we notify the guest when there is no buffer in vq->avail?
2. Why does virtio_net stop filling avail?






-- 
Regards,
Haifeng
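
For context on the gdb output above (avail_idx = 2668, free_entries = 0,
while the guest reports used->idx = avail->idx = 2668): the number of
buffers the host can still fill is the distance between the guest's avail
index and the host's own consumption index. A minimal sketch of that
computation follows; the name last_used_idx is an assumption for the host
side counter, not taken from the vhost source.

```c
#include <stdint.h>

/* Buffers made available by the guest that the host has not consumed yet.
 * Both indices are free-running 16-bit counters, so the subtraction is done
 * modulo 65536 and stays correct when avail_idx wraps around. */
static uint16_t free_entries(uint16_t avail_idx, uint16_t last_used_idx)
{
    return (uint16_t)(avail_idx - last_used_idx);
}
```

With both indices at 2668 the result is 0, meaning the host has used every
buffer the guest offered and must wait for the guest to refill the ring.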

[dpdk-dev] [PATCH v2] test: fix missing NULL pointer checks

2015-01-27 Thread Thomas Monjalon

2015-01-27 15:17, Daniel Mrzyglod:
> In test_sched, we are missing NULL pointer checks after calls to create the
> mempool and to allocate an mbuf. Add in these checks using 
> TEST_ASSERT_NOT_NULL macros.
> 
> Signed-off-by: Daniel Mrzyglod 
[...]
> + TEST_ASSERT_NOT_NULL(mp, "Error create mempool\n");

Asked previously (http://dpdk.org/ml/archives/dev/2014-December/010392.html),
it seems to be a typo: create -> creating
Sorry to insist, but I'm not a native English speaker and it seems weird to me.

-- 
Thomas

[dpdk-dev] [PATCH v3 0/3] enhance TX checksum command and csum forwarding engine

2015-01-27 Thread Ananyev, Konstantin

Hi Olivier,

> -Original Message-
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Tuesday, January 27, 2015 8:34 AM
> To: Ananyev, Konstantin; Liu, Jijiang; Zhang, Helin
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 0/3] enhance TX checksum command and csum 
> forwarding engine
> 
> Hi Konstantin,
> 
> On 01/26/2015 03:15 PM, Ananyev, Konstantin wrote:
> >>>> Another thing - IPIP seems to work ok by HW.
> >>>> There is something wrong on our (PMD/test-pmd) side.
> >>>> I think at least we have to remove the following check:
> >>>> if (!l2_len) {
> >>>>         PMD_DRV_LOG(DEBUG, "L2 length set to 0");
> >>>>         return;
> >>>> }
> >>>> in i40e_txd_enable_checksum().
> >>>
> >>> Yes, for IPIP, the check should be removed.
> >>
> >> Yes, I think these lines should be removed for 2 reasons:
> >> - it may be the cause of ipip tunnel not working
> >> - we shouldn't do these kind of tests in dataplane. I think we have to
> >>suppose that the data passed to the PMD is valid.
> >>
> >> I'll redo the test with ipip tomorrow with this fix and let you
> >> know the result. If it works, I'll add this in the next version
> >> of the patch.
> >
> > While you are on this, can I suggest you'll add debug logging for TCD and 
> > TDD we are writing to the TX ring?
> > Something like that:
> >
> > +   PMD_TX_LOG(DEBUG, "mbuf: %p, TCD[%u]:\n"
> > +   "tunneling_params: %#x;\n"
> > +   "l2tag2: %#hx;\n"
> > +   "rsvd: %#hx;\n"
> > +   "type_cmd_tso_mss: %#lx;\n",
> > +   tx_pkt, tx_id,
> > +   ctx_txd->tunneling_params,
> > +   ctx_txd->l2tag2,
> > +   ctx_txd->rsvd,
> > +   ctx_txd->type_cmd_tso_mss);
> >
> > And same for TDD.
> > It  helped me a lot to figure out what is going on, when I did my tests.
> > Probably would be useful for other people too.
> 
> Sure, I'll add this.
> 
> Also, just to let you know that I tested the ipip case without the
> "if (l2_len) return" and "if (l3_len) return", and it is working.

That's great.
Thanks
Konstantin

> 
> Regards,
> Olivier

[dpdk-dev] [PATCH v2] test: fix missing NULL pointer checks

2015-01-27 Thread Thomas Monjalon

2015-01-27 15:17, Daniel Mrzyglod:
> In test_sched, we are missing NULL pointer checks after calls to create the
> mempool and to allocate an mbuf. Add in these checks using 
> TEST_ASSERT_NOT_NULL macros.

You are adding checks *and* replacing VERIFY by TEST_ASSERT_NOT_NULL.
It would be better to explain why you do these replacements.

Please don't forget to answer the "why" question in the commit logs.

Thanks
-- 
Thomas

[dpdk-dev] [PATCH v2] test: fix missing NULL pointer checks

2015-01-27 Thread Daniel Mrzyglod

In test_sched, we are missing NULL pointer checks after calls to create the
mempool and to allocate an mbuf. Add in these checks using TEST_ASSERT_NOT_NULL 
macros.

Signed-off-by: Daniel Mrzyglod 
---
 app/test/test_sched.c | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/app/test/test_sched.c b/app/test/test_sched.c
index c957d80..83dccd2 100644
--- a/app/test/test_sched.c
+++ b/app/test/test_sched.c
@@ -166,48 +166,49 @@ test_sched(void)
int err;

mp = create_mempool();
+   TEST_ASSERT_NOT_NULL(mp, "Error create mempool\n");

port_param.socket = 0;
port_param.rate = (uint64_t) 1 * 1000 * 1000 / 8;

port = rte_sched_port_config(&port_param);
-   VERIFY(port != NULL, "Error config sched port\n");
-
+   TEST_ASSERT_NOT_NULL(port, "Error config sched port\n");

err = rte_sched_subport_config(port, SUBPORT, subport_param);
-   VERIFY(err == 0, "Error config sched, err=%d\n", err);
+   TEST_ASSERT_SUCCESS(err, "Error config sched, err=%d\n", err);

for (pipe = 0; pipe < port_param.n_pipes_per_subport; pipe ++) {
err = rte_sched_pipe_config(port, SUBPORT, pipe, 0);
-   VERIFY(err == 0, "Error config sched pipe %u, err=%d\n", pipe, err);
+   TEST_ASSERT_SUCCESS(err, "Error config sched pipe %u, err=%d\n", pipe, err);
}

for (i = 0; i < 10; i++) {
in_mbufs[i] = rte_pktmbuf_alloc(mp);
+   TEST_ASSERT_NOT_NULL(in_mbufs[i], "Packet allocation failed\n");
prepare_pkt(in_mbufs[i]);
}


err = rte_sched_port_enqueue(port, in_mbufs, 10);
-   VERIFY(err == 10, "Wrong enqueue, err=%d\n", err);
+   TEST_ASSERT_EQUAL(err, 10, "Wrong enqueue, err=%d\n", err);

err = rte_sched_port_dequeue(port, out_mbufs, 10);
-   VERIFY(err == 10, "Wrong dequeue, err=%d\n", err);
+   TEST_ASSERT_EQUAL(err, 10, "Wrong dequeue, err=%d\n", err);

for (i = 0; i < 10; i++) {
enum rte_meter_color color;
uint32_t subport, traffic_class, queue;

color = rte_sched_port_pkt_read_color(out_mbufs[i]);
-   VERIFY(color == e_RTE_METER_YELLOW, "Wrong color\n");
+   TEST_ASSERT_EQUAL(color, e_RTE_METER_YELLOW, "Wrong color\n");

rte_sched_port_pkt_read_tree_path(out_mbufs[i],
&subport, &pipe, &traffic_class, &queue);

-   VERIFY(subport == SUBPORT, "Wrong subport\n");
-   VERIFY(pipe == PIPE, "Wrong pipe\n");
-   VERIFY(traffic_class == TC, "Wrong traffic_class\n");
-   VERIFY(queue == QUEUE, "Wrong queue\n");
+   TEST_ASSERT_EQUAL(subport, SUBPORT, "Wrong subport\n");
+   TEST_ASSERT_EQUAL(pipe, PIPE, "Wrong pipe\n");
+   TEST_ASSERT_EQUAL(traffic_class, TC, "Wrong traffic_class\n");
+   TEST_ASSERT_EQUAL(queue, QUEUE, "Wrong queue\n");

}

-- 
2.1.0

[dpdk-dev] [PATCH] librte_pmd_ixgbe: Add queue start failure check

2015-01-27 Thread Thomas Monjalon

2015-01-27 12:00, Qiu, Michael:
> On 1/27/2015 6:02 PM, Thomas Monjalon wrote:
> > Hi Michael,
> >
> > I'm clearly not the maintainer of ixgbe, so I'd prefer someone else
> > reviewing this patch. However I have few comments.
> 
> Thanks Thomas,
> 
> I will send v2 with your comments.
> 
> But who maintains ixgbe? I would like to add him (or her) to the cc list.

I don't know for sure. I hope we'll collect maintainers' names in the coming days.
I was able to do the review for this simple patch.

-- 
Thomas

[dpdk-dev] [PATCH v2] librte_pmd_ixgbe: Add queue start failure check

2015-01-27 Thread Thomas Monjalon

2015-01-27 20:16, Michael Qiu:
> For ixgbe, when queue start failure, for example, mbuf allocate
> failure, the device will still start success, which could be
> an issue.
> 
> Add return status check of queue start to avoid this issue.
> 
> Signed-off-by: Michael Qiu 
> ---
> v2 --> v1
>   . remove duplicated error message in ixgbe_dev_rxtx_start()
>   . remove '\n' in PMD_INIT_LOG()

So the braces are not needed anymore (reported by checkpatch):
> + if (ret < 0) {
> + return ret;
> + }

Acked-by: Thomas Monjalon 
Applied with removed braces

Thanks
-- 
Thomas
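
The pattern this patch introduces can be sketched in isolation: a start routine returns the first negative status from a per-queue start, and the caller propagates it instead of reporting success. Everything below (fake_dev, queue_start, rxtx_start, dev_start) is a hypothetical stand-in, not the ixgbe code; logging is assumed to happen in the lower layer, in line with the review comment about duplicated messages.

```c
#include <errno.h>

/* Hypothetical device model: not the ixgbe structures. */
struct fake_dev {
	int nb_queues;
	int failing_queue;	/* index of the queue that fails, or -1 for none */
	int started;		/* 1 only when every queue started */
};

/* Stand-in for a per-queue start that may fail (e.g. mbuf allocation). */
static int
queue_start(const struct fake_dev *dev, int q)
{
	return (q == dev->failing_queue) ? -ENOMEM : 0;
}

/* Returns 0 on success, or the first negative error from a queue start. */
static int
rxtx_start(const struct fake_dev *dev)
{
	int q, ret;

	for (q = 0; q < dev->nb_queues; q++) {
		ret = queue_start(dev, q);
		if (ret < 0)
			return ret;	/* single statement, so no braces */
	}
	return 0;
}

/* The caller checks the status instead of ignoring it. */
static int
dev_start(struct fake_dev *dev)
{
	int err = rxtx_start(dev);

	if (err < 0) {
		dev->started = 0;
		return err;
	}
	dev->started = 1;
	return 0;
}
```

With this shape, a device whose third queue cannot be started reports -ENOMEM from dev_start() rather than appearing to start successfully.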

[dpdk-dev] vhost user examples

2015-01-27 Thread Benoît Canet


Hi Xie,

I would be interested in alpha testing the vhost user patchset.

Is there an up to date example of how to use it ?

Best regards

Benoît

[dpdk-dev] [snabb-devel] RE: [PATCH 0/4] DPDK memcpy optimization

2015-01-27 Thread Luke Gorrie

Hi again John,

Thank you for the patient answers :-)

Thank you for pointing this out: I was mistakenly testing your Sandy Bridge
code on Haswell (lacking -DRTE_MACHINE_CPUFLAG_AVX2).

Correcting that, your code is both the fastest and the smallest in my
humble micro benchmarking tests.

Looks like you have done great work! You probably knew that already :-) but
thank you for walking me through it.

The code compiles to 745 bytes of object code (smaller than glibc 2.20
memcpy) and cachebenches like this:

Memory Copy Library Cache Test

C Size      Nanosec     MB/sec        % Chnge
-------     -------     ---------     -------
256         0.01         97587.60     1.00
384         0.01         97628.83     1.00
512         0.01         97613.95     1.00
768         0.01        147811.44     0.66
1024        0.01        158938.68     0.93
1536        0.01        168487.49     0.94
2048        0.01        174278.83     0.97
3072        0.01        156922.58     1.11
4096        0.01        145811.59     1.08
6144        0.01        157388.27     0.93
8192        0.01        149616.95     1.05
12288       0.01        149064.26     1.00
16384       0.01        107895.06     1.38

the key difference from my perspective is that glibc 2.20 memcpy
performance goes way down for >= 2048 bytes when they switch from vector
moves to string moves, while your code stays consistent.

I will take it for a spin in a real application.

Cheers,
-Luke

[dpdk-dev] [PATCH v4 00/11] Port Hotplug Framework

2015-01-27 Thread Tetsuya Mukawa

On 2015/01/27 12:00, Qiu, Michael wrote:
> On 1/19/2015 6:42 PM, Tetsuya Mukawa wrote:
>> This patch series adds a dynamic port hotplug framework to DPDK.
>> With the patches, DPDK apps can attach or detach ports at runtime.
>>
>> The basic concept of the port hotplug is like followings.
>> - DPDK apps must have responsibility to manage ports.
>>   DPDK apps only know which ports are attached or detached at the moment.
>>   The port hotplug framework is implemented to allow DPDK apps to manage 
>> ports.
>>   For example, when DPDK apps call port attach function, attached port number
>>   will be returned. Also DPDK apps can detach port by port number.
>> - Kernel support is needed for attaching or detaching physical device ports.
>>   To attach new device, the device will be recognized by kernel at first and
>>   controlled by kernel driver. Then user can bind the device to igb_uio
> Here does it really need native kernel driver here? As it will be
> controlled by igb_uio.
> I think even if the device has no kernel driver is also OK.

Thanks for correcting. Yes, it should be.
How about following.

- Kernel support is needed for attaching or detaching physical device ports.
  To attach a new device, the device will be recognized by kernel PCI
hotplug feature at first.
  Then user can bind the device to igb_uio.

>
> Also I have finished initial patch of passthrough driver flag in
> "struct rte_pci_device"
>
> I will send to you after I do some basic test on that, then I will send to
> you, and you can give some comments on that.

I appreciate your implementing this.

Thanks,
Tetsuya


>
> Thanks,
> Michael
>
>>   by 'dpdk_nic_bind.py'. Finally, DPDK apps can call the port hotplug
>>   functions to attach ports.
>>   For detaching, steps are vice versa.
>> - Before detach ports, ports must be stopped and closed.
>>   DPDK application must call rte_eth_dev_stop() and rte_eth_dev_close() 
>> before
>>   detaching ports. These function will call finalization codes of PMDs.
>>   But so far, no PMD frees all resources allocated by initialization.
>>   It means PMDs are needed to be fixed to support the port hotplug.
>>   'RTE_PCI_DRV_DETACHABLE' is a new flag indicating a PMD supports detaching.
>>   Without this flag, detaching will be failed.
>> - Mustn't affect legacy DPDK apps.
>>   No DPDK EAL behavior is changed if the port hotplug functions aren't
>>   called.
>>   So all legacy DPDK apps can still work without modifications.
>>
>> And a few limitations.
>> - The port hotplug functions are not thread safe.
>>   DPDK apps should handle it.
>> - Only support Linux and igb_uio so far.
>>   BSD and VFIO is not supported. I will send VFIO patches at least, but I 
>> don't
>>   have a plan to submit BSD patch so far.
>>
>>
>> Here is port hotplug APIs.
>> ---
>> /**
>>  * Attach a new device.
>>  *
>>  * @param devargs
>>  *   A pointer to a strings array describing the new device
>>  *   to be attached. The strings should be a pci address like
>>  *   ':01:00.0' or virtual device name like 'eth_pcap0'.
>>  * @param port_id
>>  *  A pointer to a port identifier actually attached.
>>  * @return
>>  *  0 on success and port_id is filled, negative on error
>>  */
>> int rte_eal_dev_attach(const char *devargs, uint8_t *port_id);
>>
>> /**
>>  * Detach a device.
>>  *
>>  * @param port_id
>>  *   The port identifier of the device to detach.
>>  * @param devname
>>  *  A pointer to a device name actually detached.
>>  * @return
>>  *  0 on success and devname is filled, negative on error
>>  */
>> int rte_eal_dev_detach(uint8_t port_id, char *devname);
>> ---
>>
>> This patch series are for DPDK EAL. To use port hotplug function by DPDK 
>> apps,
>> each PMD should be fixed to support 'RTE_PCI_DRV_DETACHABLE' flag. Please 
>> check
>> a patch for pcap PMD.
>>
>> Also please check testpmd patch. It will show you how to fix your legacy
>> applications to support port hotplug feature.
>>
>>
>> PATCH v4 changes
>>  - Merge patches to review easier.
>>  - Fix indent of 'if' statement.
>>  - Fix calculation method of eal_compare_pci_addr().
>>  - Fix header file declaration.
>>  - Add header file to determine if hotplug can be enabled.
>>(Thanks to Qiu, Michael)
>>  - Use braces with 'for' loop.
>>  - Add parameter checking.
>>  - Fix sanity check code
>>  - Fix comments of rte_eth_dev_type.
>>  - Change function names.
>>(Thanks to Iremonger, Bernard)
>>
>> PATCH v3 changes:
>>  - Fix enum definition used in rte_ethdev.c.
>>(Thanks to Zhang, Helin)
>>
>> PATCH v2 changes:
>>  - Replace rte_eal_dev_attach_pdev(), rte_eal_dev_detach_pdev,
>>rte_eal_dev_attach_vdev() and rte_eal_dev_detach_vdev() to
>>rte_eal_dev_attach() and rte_eal_dev_detach().
>>  - Add parameter values checking.
>>  - Refashion a few functions.
>>(Thanks to Iremonger, Bernard)
>>
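
The attach/detach sequence described in the cover letter can be sketched as follows. This is a self-contained mock, not DPDK code: mock_dev_attach() and mock_dev_detach() only mirror the two signatures quoted above, mock_dev_stop_and_close() stands in for the rte_eth_dev_stop() and rte_eth_dev_close() calls the cover letter requires before detaching, and the port table and name length are assumptions made for illustration.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Mock state; none of this exists in DPDK. */
enum mock_state { MOCK_FREE, MOCK_RUNNING, MOCK_CLOSED };
#define MOCK_MAX_PORTS 4
#define MOCK_NAME_LEN 32

static enum mock_state mock_ports[MOCK_MAX_PORTS];
static char mock_names[MOCK_MAX_PORTS][MOCK_NAME_LEN];

/* Mirrors: int rte_eal_dev_attach(const char *devargs, uint8_t *port_id) */
static int
mock_dev_attach(const char *devargs, uint8_t *port_id)
{
	uint8_t i;

	if (devargs == NULL || port_id == NULL)
		return -1;
	for (i = 0; i < MOCK_MAX_PORTS; i++) {
		if (mock_ports[i] == MOCK_FREE) {
			mock_ports[i] = MOCK_RUNNING;
			snprintf(mock_names[i], MOCK_NAME_LEN, "%s", devargs);
			*port_id = i;	/* filled only on success */
			return 0;
		}
	}
	return -1;	/* no free port slot */
}

/* Stand-in for stopping and closing the port before detach. */
static void
mock_dev_stop_and_close(uint8_t port_id)
{
	if (port_id < MOCK_MAX_PORTS && mock_ports[port_id] == MOCK_RUNNING)
		mock_ports[port_id] = MOCK_CLOSED;
}

/* Mirrors: int rte_eal_dev_detach(uint8_t port_id, char *devname).
 * devname must point to at least MOCK_NAME_LEN bytes. */
static int
mock_dev_detach(uint8_t port_id, char *devname)
{
	if (port_id >= MOCK_MAX_PORTS || devname == NULL)
		return -1;
	if (mock_ports[port_id] != MOCK_CLOSED)
		return -1;	/* port must be stopped and closed first */
	snprintf(devname, MOCK_NAME_LEN, "%s", mock_names[port_id]);
	mock_ports[port_id] = MOCK_FREE;
	return 0;
}

/* attach -> use -> stop/close -> detach */
static int
hotplug_cycle(const char *devargs, char *devname)
{
	uint8_t port_id;

	if (mock_dev_attach(devargs, &port_id) < 0)
		return -1;
	/* ... configure and use the port here ... */
	mock_dev_stop_and_close(port_id);
	return mock_dev_detach(port_id, devname);
}
```

In the mock, detaching a port that is still running fails, which reflects the ordering rule stated in the cover letter; since the framework is described as not thread safe, an application would also need its own locking around these calls.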

[dpdk-dev] [PATCH] Added missing extern 'C' decls in mode4 header files

2015-01-27 Thread Pawel Wodkowski

Signed-off-by: Pawel Wodkowski 
---
 lib/librte_pmd_bond/rte_eth_bond_8023ad.h |8 
 lib/librte_pmd_bond/rte_eth_bond_8023ad_private.h |8 
 2 files changed, 16 insertions(+)

diff --git a/lib/librte_pmd_bond/rte_eth_bond_8023ad.h 
b/lib/librte_pmd_bond/rte_eth_bond_8023ad.h
index 9adc6aa..ebd0e93 100644
--- a/lib/librte_pmd_bond/rte_eth_bond_8023ad.h
+++ b/lib/librte_pmd_bond/rte_eth_bond_8023ad.h
@@ -36,6 +36,10 @@

 #include 

+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Actor/partner states
  */
@@ -211,4 +215,8 @@ int
 rte_eth_bond_8023ad_slave_info(uint8_t port_id, uint8_t slave_id,
struct rte_eth_bond_8023ad_slave_info *conf);

+#ifdef __cplusplus
+}
+#endif
+
 #endif /* RTE_ETH_BOND_8023AD_H_ */
diff --git a/lib/librte_pmd_bond/rte_eth_bond_8023ad_private.h 
b/lib/librte_pmd_bond/rte_eth_bond_8023ad_private.h
index 8adee70..7930345 100644
--- a/lib/librte_pmd_bond/rte_eth_bond_8023ad_private.h
+++ b/lib/librte_pmd_bond/rte_eth_bond_8023ad_private.h
@@ -42,6 +42,10 @@

 #include "rte_eth_bond_8023ad.h"

+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define BOND_MODE_8023AX_UPDATE_TIMEOUT_MS  100
 /** Maximum number of packets to one slave queued in TX ring. */
 #define BOND_MODE_8023AX_SLAVE_RX_PKTS3
@@ -305,4 +309,8 @@ bond_mode_8023ad_deactivate_slave(struct rte_eth_dev *dev, 
uint8_t slave_pos);
 void
 bond_mode_8023ad_mac_address_update(struct rte_eth_dev *bond_dev);

+#ifdef __cplusplus
+}
+#endif
+
 #endif /* RTE_ETH_BOND_8023AD_H_ */
-- 
1.7.9.5
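
For context, a minimal sketch of the guard this patch adds. A C++ compiler mangles function names unless declarations are wrapped in extern "C", so a C++ application including an unguarded C header typically fails at link time. The header name and the function example_slave_count() below are hypothetical, chosen only to show where the guard sits relative to the include guard.

```c
#ifndef EXAMPLE_API_H_
#define EXAMPLE_API_H_

#include <stdint.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical declaration; C++ callers see it with C linkage. */
int example_slave_count(uint8_t port_id);

#ifdef __cplusplus
}
#endif

#endif /* EXAMPLE_API_H_ */

/* Normally in a separate .c file: the definition matching the header. */
int
example_slave_count(uint8_t port_id)
{
	return port_id == 0 ? 2 : 0;
}
```

The opening guard goes after the includes and the closing guard just before the final #endif, which matches the placement in the diff above.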

[dpdk-dev] DPDK testpmd forwarding performace degradation

2015-01-27 Thread Alexander Belyakov

On Tue, Jan 27, 2015 at 10:51 AM, Alexander Belyakov 
wrote:

>
> Hi Pablo,
>
> On Mon, Jan 26, 2015 at 5:22 PM, De Lara Guarch, Pablo <
> pablo.de.lara.guarch at intel.com> wrote:
>
>> Hi Alexander,
>>
>> > -Original Message-
>> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Alexander Belyakov
>> > Sent: Monday, January 26, 2015 10:18 AM
>> > To: dev at dpdk.org
>> > Subject: [dpdk-dev] DPDK testpmd forwarding performace degradation
>> >
>> > Hello,
>> >
>> > recently I have found a case of significant performance degradation for
>> our
>> > application (built on top of DPDK, of course). Surprisingly, similar
>> issue
>> > is easily reproduced with default testpmd.
>> >
>> > To show the case we need simple IPv4 UDP flood with variable UDP payload
>> > size. Saying "packet length" below I mean: Eth header length (14 bytes)
>> +
>> > IPv4 header length (20 bytes) + UDP header length (8 bytes) + UDP
>> payload
>> > length (variable) + CRC (4 bytes). Source IP addresses and ports are
>> selected
>> > randomly for each packet.
>> >
>> > I have used DPDK with revisions 1.6.0r2 and 1.7.1. Both show the same
>> issue.
>> >
>> > Follow "Quick start" guide (http://dpdk.org/doc/quick-start) to build
>> and
>> > run testpmd. Enable testpmd forwarding ("start" command).
>> >
>> > Table below shows measured forwarding performance depending on packet
>> > length:
>> >
>> > No. -- UDP payload length (bytes) -- Packet length (bytes) -- Forwarding
>> > performance (Mpps) -- Expected theoretical performance (Mpps)
>> >
>> > 1. 0 -- 64 -- 14.8 -- 14.88
>> > 2. 34 -- 80 -- 12.4 -- 12.5
>> > 3. 35 -- 81 -- 6.2 -- 12.38 (!)
>> > 4. 40 -- 86 -- 6.6 -- 11.79
>> > 5. 49 -- 95 -- 7.6 -- 10.87
>> > 6. 50 -- 96 -- 10.7 -- 10.78 (!)
>> > 7. 60 -- 106 -- 9.4 -- 9.92
>> >
>> > At line number 3 we have added 1 byte of UDP payload (comparing to
>> > previous
>> > line) and got forwarding performance halved! 6.2 Mpps against 12.38 Mpps
>> > of
>> > expected theoretical maximum for this packet size.
>> >
>> > That is the issue.
>> >
>> > Significant performance degradation exists up to 50 bytes of UDP payload
>> > (96 bytes packet length), where it jumps back to theoretical maximum.
>> >
>> > What is happening between 80 and 96 bytes packet length?
>> >
>> > This issue is stable and 100% reproducible. At this point I am not sure
>> if
>> > it is DPDK or NIC issue. These tests have been performed on Intel(R) Eth
>> > Svr Bypass Adapter X520-LR2 (X520LR2BP).
>> >
>> > Is anyone aware of such strange behavior?
>>
>> I cannot reproduce the issue using two ports on two different 82599EB
>> NICs, using 1.7.1 and 1.8.0.
>> I always get either same or better linerate as I increase the packet size.
>>
>
> Thank you for trying to reproduce the issue.
>
>
>> Actually, have you tried using 1.8.0?
>>
>
> I feel 1.8.0 is little bit immature and might require some post-release
> patching. Even testpmd from this release is not forwarding packets properly
> on my setup. It is up and running without visible errors/warnings, TX/RX
> counters are ticking but I can not see any packets at the output. Please
> note, both 1.6.0r2 and 1.7.1 releases work (on the same setup)
> out-of-the-box just fine with only exception of this mysterious performance
> drop.
>
> So it will take some time to figure out what is wrong with dpdk-1.8.0.
> Meanwhile we could focus on stable dpdk-1.7.1.
>
>
Managed to get testpmd from dpdk-1.8.0 to work on my setup. Unfortunately I
had to disable RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC, it is new comparing to
1.7.1 and somehow breaks testpmd forwarding. By the way, simply disabling
RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC in common_linuxapp config file breaks
the build - had to make quick'n'dirty fix in struct igb_rx_queue as well.

Anyway, issue is still here.

Forwarding 80 bytes packets at 12.4 Mpps.
Forwarding 81 bytes packets at 7.2 Mpps.

Any ideas?

> As for X520-LR2 NIC - it is dual port bypass adapter with device id 155d. I
> believe it should be treated as 82599EB except bypass feature. I put bypass
> mode to "normal" in those tests.
>
> Alexander
>
>
>>
>> Pablo
>> >
>> > Regards,
>> > Alexander Belyakov
>>
>
>
>
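
The "expected theoretical performance" column in the table above can be reproduced with a short calculation. The sketch below assumes a 10 Gbit/s link (the X520 is a 10GbE adapter) and the standard 20 bytes of per-frame wire overhead; the function names are illustrative only.

```c
/* Per-frame overhead on the wire that is not counted in the frame length:
 * 7 bytes preamble + 1 byte start-of-frame delimiter + 12 bytes
 * inter-frame gap. */
#define WIRE_OVERHEAD_BYTES 20
#define MIN_FRAME_LEN 64	/* minimum Ethernet frame, CRC included */

/* Packet length as defined in the post: Eth (14) + IPv4 (20) + UDP (8)
 * + payload + CRC (4), padded up to the Ethernet minimum. */
static unsigned
frame_len_from_udp_payload(unsigned payload)
{
	unsigned len = 14 + 20 + 8 + payload + 4;

	return len < MIN_FRAME_LEN ? MIN_FRAME_LEN : len;
}

/* Theoretical maximum packets per second for a link speed in bit/s. */
static double
max_pps(double link_bps, unsigned frame_len)
{
	return link_bps / ((frame_len + WIRE_OVERHEAD_BYTES) * 8.0);
}
```

For example, max_pps(10e9, 64) is about 14.88 Mpps and max_pps(10e9, 81) is about 12.38 Mpps, matching rows 1 and 3 of the table. The measured 6.2 Mpps at 81 bytes is therefore roughly half of what the link allows.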

[dpdk-dev] [PATCH] ixgbe: initialize link status on initialization

2015-01-27 Thread Thomas Monjalon

> The link_status variable is not set when device is initialized.
> This can lead to problems with link never being reported as up
> if using some SFP modules where the link is instantly on.
> 
> Signed-off-by: Stephen Hemminger 

Acked-by: Thomas Monjalon 

Applied

Thanks
-- 
Thomas

[dpdk-dev] [PATCH v3] test: fix missing NULL pointer checks

2015-01-27 Thread Neil Horman

On Tue, Jan 27, 2015 at 04:44:53PM +0100, Daniel Mrzyglod wrote:
> In test_sched, we are missing NULL pointer checks after create_mempool()
> and rte_pktmbuf_alloc(). Add in these checks using TEST_ASSERT_NOT_NULL 
> macros.
> 
> VERIFY macro was removed and replaced by standard test ASSERTS from "test.h" 
> header.
> This provides additional information to track when the failure occurred.
> 
> v3 changes:
> - remove VERIFY macro
> - fix spelling error.
> - change improper comment
> 
> v2 changes:
> - Replace all VERIFY macros instances by proper TEST_ASSERT* macros.
> - fix description
> 
> v1 changes:
> - first iteration of patch using VERIFY macro.
> 
> Signed-off-by: Daniel Mrzyglod 
> ---
>  app/test/test_sched.c | 39 ++-
>  1 file changed, 18 insertions(+), 21 deletions(-)
> 
> diff --git a/app/test/test_sched.c b/app/test/test_sched.c
> index c957d80..60c62de 100644
> --- a/app/test/test_sched.c
> +++ b/app/test/test_sched.c
> @@ -46,13 +46,6 @@
>  #include 
>  
>  
> -#define VERIFY(exp,fmt,args...)  \
> - if (!(exp)) {   \
> - printf(fmt, ##args);    \
> - return -1;      \
> - }
> -
> -
>  #define SUBPORT  0
>  #define PIPE 1
>  #define TC   2
> @@ -166,48 +159,49 @@ test_sched(void)
>   int err;
>  
>   mp = create_mempool();
> + TEST_ASSERT_NOT_NULL(mp, "Error creating mempool\n");
>  
>   port_param.socket = 0;
>   port_param.rate = (uint64_t) 1 * 1000 * 1000 / 8;
>  
>   port = rte_sched_port_config(&port_param);
> - VERIFY(port != NULL, "Error config sched port\n");
> -
> + TEST_ASSERT_NOT_NULL(port, "Error config sched port\n");
>  
>   err = rte_sched_subport_config(port, SUBPORT, subport_param);
> - VERIFY(err == 0, "Error config sched, err=%d\n", err);
> + TEST_ASSERT_SUCCESS(err, "Error config sched, err=%d\n", err);
>  
>   for (pipe = 0; pipe < port_param.n_pipes_per_subport; pipe ++) {
>   err = rte_sched_pipe_config(port, SUBPORT, pipe, 0);
> - VERIFY(err == 0, "Error config sched pipe %u, err=%d\n", pipe, err);
> + TEST_ASSERT_SUCCESS(err, "Error config sched pipe %u, err=%d\n", pipe, err);
>   }
>  
>   for (i = 0; i < 10; i++) {
>   in_mbufs[i] = rte_pktmbuf_alloc(mp);
> + TEST_ASSERT_NOT_NULL(in_mbufs[i], "Packet allocation failed\n");
>   prepare_pkt(in_mbufs[i]);
>   }
>  
>  
>   err = rte_sched_port_enqueue(port, in_mbufs, 10);
> - VERIFY(err == 10, "Wrong enqueue, err=%d\n", err);
> + TEST_ASSERT_EQUAL(err, 10, "Wrong enqueue, err=%d\n", err);
>  
>   err = rte_sched_port_dequeue(port, out_mbufs, 10);
> - VERIFY(err == 10, "Wrong dequeue, err=%d\n", err);
> + TEST_ASSERT_EQUAL(err, 10, "Wrong dequeue, err=%d\n", err);
>  
>   for (i = 0; i < 10; i++) {
>   enum rte_meter_color color;
>   uint32_t subport, traffic_class, queue;
>  
>   color = rte_sched_port_pkt_read_color(out_mbufs[i]);
> - VERIFY(color == e_RTE_METER_YELLOW, "Wrong color\n");
> + TEST_ASSERT_EQUAL(color, e_RTE_METER_YELLOW, "Wrong color\n");
>  
>   rte_sched_port_pkt_read_tree_path(out_mbufs[i],
>   &subport, &pipe, &traffic_class, &queue);
>  
> - VERIFY(subport == SUBPORT, "Wrong subport\n");
> - VERIFY(pipe == PIPE, "Wrong pipe\n");
> - VERIFY(traffic_class == TC, "Wrong traffic_class\n");
> - VERIFY(queue == QUEUE, "Wrong queue\n");
> + TEST_ASSERT_EQUAL(subport, SUBPORT, "Wrong subport\n");
> + TEST_ASSERT_EQUAL(pipe, PIPE, "Wrong pipe\n");
> + TEST_ASSERT_EQUAL(traffic_class, TC, "Wrong traffic_class\n");
> + TEST_ASSERT_EQUAL(queue, QUEUE, "Wrong queue\n");
>  
>   }
>  
> @@ -215,12 +209,15 @@ test_sched(void)
>   struct rte_sched_subport_stats subport_stats;
>   uint32_t tc_ov;
>   rte_sched_subport_read_stats(port, SUBPORT, &subport_stats, &tc_ov);
> - //VERIFY(subport_stats.n_pkts_tc[TC-1] == 10, "Wrong subport stats\n");
> -
> +#if 0
> + TEST_ASSERT_EQUAL(subport_stats.n_pkts_tc[TC-1], 10, "Wrong subport stats\n");
> +#endif
>   struct rte_sched_queue_stats queue_stats;
>   uint16_t qlen;
>   rte_sched_queue_read_stats(port, QUEUE, &queue_stats, &qlen);
> - //VERIFY(queue_stats.n_pkts == 10, "Wrong queue stats\n");
> +#if 0
> + TEST_ASSERT_EQUAL(queue_stats.n_pkts, 10, "Wrong queue stats\n");
> +#endif
>  
>   rte_sched_port_free(port);
>  
> -- 
> 2.1.0
> 
> 
These TEST_ASSERT macros are no better than the VERIFY macro; they contain
exactly the same return issue that I outlined in my first post on the subject.
Neil
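
The earlier post is not quoted in this message, so the following is only an assumption about the concern: a check macro that returns from the middle of a test skips any cleanup that follows it. The sketch below contrasts a return-on-failure macro, shaped like the removed VERIFY, with a variant that jumps to a cleanup label. All names (CHECK_RETURN, CHECK_GOTO, demo_alloc, demo_free, live_allocs) are hypothetical and unrelated to the macros in test.h.

```c
#include <stdio.h>
#include <stdlib.h>

static int live_allocs;	/* outstanding allocations, for the demo only */

static void *
demo_alloc(void)
{
	void *p = malloc(16);

	if (p != NULL)
		live_allocs++;
	return p;
}

static void
demo_free(void *p)
{
	if (p != NULL) {
		live_allocs--;
		free(p);
	}
}

/* Same shape as the removed VERIFY macro: print and return on failure. */
#define CHECK_RETURN(exp, msg) do {		\
	if (!(exp)) {				\
		printf("%s\n", msg);		\
		return -1;			\
	}					\
} while (0)

/* Variant that jumps to a cleanup label; expects a local 'int ret'. */
#define CHECK_GOTO(exp, msg, label) do {	\
	if (!(exp)) {				\
		printf("%s\n", msg);		\
		ret = -1;			\
		goto label;			\
	}					\
} while (0)

/* Leaks 'a' when the second check fails. */
static int
test_with_return(int fail)
{
	void *a = demo_alloc();

	CHECK_RETURN(a != NULL, "alloc failed");
	CHECK_RETURN(!fail, "forced failure");	/* 'a' is never freed here */
	demo_free(a);
	return 0;
}

/* Releases 'a' on every path. */
static int
test_with_cleanup(int fail)
{
	int ret = 0;
	void *a = demo_alloc();

	CHECK_GOTO(a != NULL, "alloc failed", out);
	CHECK_GOTO(!fail, "forced failure", out);
out:
	demo_free(a);
	return ret;
}
```

In test_sched the resources at stake would be the mempool, the scheduler port and the allocated mbufs; whether that is the exact issue raised in the first post cannot be confirmed from this message.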

[dpdk-dev] [PATCH] stats: remove useless memset's

2015-01-27 Thread Thomas Monjalon

2015-01-21 14:08, David Marchand:
> Hello Stephen,
> 
> > From: Stephen Hemminger 
> >
> > The rte_eth_stats_get is the only API that should call the device
> > statistics function directly, and it already does a memset of the
> > resulting structure. Therefore doing memset() in the driver is
> > redundant and should be removed.
> >
> > Signed-off-by: Stephen Hemminger 
> > ---
> >  lib/librte_pmd_af_packet/rte_eth_af_packet.c | 2 --
> >  lib/librte_pmd_bond/rte_eth_bond_pmd.c   | 4 
> >  lib/librte_pmd_enic/enic_main.c  | 1 -
> >  lib/librte_pmd_i40e/i40e_ethdev_vf.c | 1 -
> >  lib/librte_pmd_ixgbe/ixgbe_ethdev.c  | 1 -
> >  lib/librte_pmd_ring/rte_eth_ring.c   | 1 -
> >  6 files changed, 10 deletions(-)
> >
> 
> I think you missed some :
> - lib/librte_pmd_e1000/igb_ethdev.c function eth_igbvf_stats_get()
> - lib/librte_pmd_pcap/rte_eth_pcap.c function eth_stats_get()
> 
> With these fixed :
> Acked-By: David Marchand 

Applied with above fixes.

Thanks
-- 
Thomas

[dpdk-dev] [PATCH] mk: allow application to override clean

2015-01-27 Thread Thomas Monjalon

> In some cases application may want to have additional rules
> for clean. This can be handled by allowing the double colon
> form of rule.
> 
>  https://www.gnu.org/software/make/manual/html_node/Double_002dColon.html
> 
> Single colon and double colon rules for same target causes
> an error.
> 
> Signed-off-by: Stephen Hemminger 

I think this need could also be solved by having a pkgconfig-like file
and not using this rte.app.mk.

Acked-by: Thomas Monjalon 

Applied, despite wrong formatting of the patch

Thanks
-- 
Thomas

[dpdk-dev] [PATCH] Added missing extern 'C' decls in rte_ip_frag.h

2015-01-27 Thread Thomas Monjalon

> Signed-off-by: Marc Sune 

> --- a/lib/librte_ip_frag/rte_ip_frag.h
> +++ b/lib/librte_ip_frag/rte_ip_frag.h
> +#ifdef __cplusplus
> +extern "C" {
> +#endif

Fixes: 601e279df074 ("move fragmentation/reassembly headers into a library")
Acked-by: Thomas Monjalon 

Applied

It seems that the same kind of fix is needed for
lib/librte_pmd_bond/rte_eth_bond_8023ad.h

Thanks
-- 
Thomas

[dpdk-dev] [PATCH] power: added missing extern keyword in rte_power.h

2015-01-27 Thread Thomas Monjalon

> rte_power_freq_min function did not include "extern" keyword,
> causing linking errors.
> 
> Signed-off-by: Pablo de Lara 
> Reported-by: Ildar Mustafin 

Fixes: 445c6528b55f ("power: common interface for guest and host")
Acked-by: Thomas Monjalon 

Applied

Thanks
-- 
Thomas

[dpdk-dev] [PATCH] pcap: Fix ethernet device's name for pcap port

2015-01-27 Thread Thomas Monjalon

> Ethernet device's data should contain the virtual device name for pcap port.
> This name is correctly set by rte_eth_dev_allocate() at initialization time,
> but it is directly lost.
> 
> Signed-off-by: Remi Pommarel 

Fixes: 83b41136934d ("ethdev: add unique name to devices")
Acked-by: Thomas Monjalon 

Applied

Thanks
-- 
Thomas

[dpdk-dev] [PATCH] eal/common: Fix enabled core number with core list argument

2015-01-27 Thread Thomas Monjalon

> When using core list argument to define which core to enable (ie -l) the
> core_num field of the rte configuration is not updated the same way as using
> coremask. This causes rte_lcore_num() to yield different value from the one
> using coremask.
> 
> Signed-off-by: Remi Pommarel 

Good catch, it was forgotten when adding this option.

Fixes: d888cb8b9613 ("add core list input format")
Acked-by: Thomas Monjalon 

Applied

Thanks
-- 
Thomas
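
The invariant behind this fix is that a core list and the equivalent coremask must yield the same core count. A minimal sketch of that check follows; parse_corelist() and count_mask() are hypothetical helpers written for illustration and are not the EAL parser, and the 64-core limit is an assumption that keeps the mask in one integer.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_CORES 64	/* assumption: cores fit in a 64-bit mask */

/* Count the bits set in a coremask. */
static unsigned
count_mask(uint64_t mask)
{
	unsigned n = 0;

	for (; mask != 0; mask &= mask - 1)
		n++;
	return n;
}

/* Parse a core list such as "0-3,5" into a bitmask.
 * Returns 0 on success, -1 on malformed input. */
static int
parse_corelist(const char *s, uint64_t *mask)
{
	char *end;
	long lo, hi;

	*mask = 0;
	while (*s != '\0') {
		if (!isdigit((unsigned char)*s))
			return -1;
		lo = hi = strtol(s, &end, 10);
		if (*end == '-') {
			s = end + 1;
			if (!isdigit((unsigned char)*s))
				return -1;
			hi = strtol(s, &end, 10);
		}
		if (lo > hi || hi >= MAX_CORES)
			return -1;
		for (; lo <= hi; lo++)
			*mask |= UINT64_C(1) << lo;
		if (*end == ',') {
			end++;
			if (*end == '\0')
				return -1;	/* trailing comma */
		} else if (*end != '\0') {
			return -1;
		}
		s = end;
	}
	return 0;
}
```

For the list "0-3,5" the resulting mask is 0x2f, and count_mask() gives 5 for both forms, which is the agreement the patch restores for the -l option.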

[dpdk-dev] [PATCH 0/4] DPDK memcpy optimization

2015-01-27 Thread Ananyev, Konstantin



> -Original Message-
> From: Ananyev, Konstantin
> Sent: Tuesday, January 27, 2015 11:30 AM
> To: Wang, Zhihong; Richardson, Bruce; Marc Sune
> Cc: dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> 
> 
> 
> > -Original Message-
> > From: Wang, Zhihong
> > Sent: Tuesday, January 27, 2015 1:42 AM
> > To: Ananyev, Konstantin; Richardson, Bruce; Marc Sune
> > Cc: dev at dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> >
> >
> >
> > > -Original Message-
> > > From: Ananyev, Konstantin
> > > Sent: Tuesday, January 27, 2015 2:29 AM
> > > To: Wang, Zhihong; Richardson, Bruce; Marc Sune
> > > Cc: dev at dpdk.org
> > > Subject: RE: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> > >
> > > Hi Zhihong,
> > >
> > > > -Original Message-
> > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wang, Zhihong
> > > > Sent: Friday, January 23, 2015 6:52 AM
> > > > To: Richardson, Bruce; Marc Sune
> > > > Cc: dev at dpdk.org
> > > > Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> > > >
> > > >
> > > >
> > > > > -Original Message-
> > > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce
> > > > > Richardson
> > > > > Sent: Wednesday, January 21, 2015 9:26 PM
> > > > > To: Marc Sune
> > > > > Cc: dev at dpdk.org
> > > > > Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> > > > >
> > > > > On Wed, Jan 21, 2015 at 02:21:25PM +0100, Marc Sune wrote:
> > > > > >
> > > > > > On 21/01/15 14:02, Bruce Richardson wrote:
> > > > > > >On Wed, Jan 21, 2015 at 01:36:41PM +0100, Marc Sune wrote:
> > > > > > >>On 21/01/15 04:44, Wang, Zhihong wrote:
> > > > > > -Original Message-
> > > > > > From: Richardson, Bruce
> > > > > > Sent: Wednesday, January 21, 2015 12:15 AM
> > > > > > To: Neil Horman
> > > > > > Cc: Wang, Zhihong; dev at dpdk.org
> > > > > > Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> > > > > > 
> > > > > > On Tue, Jan 20, 2015 at 10:11:18AM -0500, Neil Horman wrote:
> > > > > > >On Tue, Jan 20, 2015 at 03:01:44AM +, Wang, Zhihong wrote:
> > > > > > >>>-Original Message-
> > > > > > >>>From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > > > > > >>>Sent: Monday, January 19, 2015 9:02 PM
> > > > > > >>>To: Wang, Zhihong
> > > > > > >>>Cc: dev at dpdk.org
> > > > > > >>>Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy
> > > > > > >>>optimization
> > > > > > >>>
> > > > > > >>>On Mon, Jan 19, 2015 at 09:53:30AM +0800,
> > > > > > >>>zhihong.wang at intel.com
> > > > > > wrote:
> > > > > > This patch set optimizes memcpy for DPDK for both SSE and
> > > > > > AVX
> > > > > > platforms.
> > > > > > It also extends memcpy test coverage with unaligned cases
> > > > > > and more test
> > > > > > >>>points.
> > > > > > Optimization techniques are summarized below:
> > > > > > 
> > > > > > 1. Utilize full cache bandwidth
> > > > > > 
> > > > > > 2. Enforce aligned stores
> > > > > > 
> > > > > > 3. Apply load address alignment based on architecture
> > > > > > features
> > > > > > 
> > > > > > 4. Make load/store address available as early as possible
> > > > > > 
> > > > > > 5. General optimization techniques like inlining, branch
> > > > > > reducing, prefetch pattern access
> > > > > > 
> > > > > > Zhihong Wang (4):
> > > > > >    Disabled VTA for memcpy test in app/test/Makefile
> > > > > >    Removed unnecessary test cases in test_memcpy.c
> > > > > >    Extended test coverage in test_memcpy_perf.c
> > > > > >    Optimized memcpy in arch/x86/rte_memcpy.h for both SSE
> > > > > and AVX
> > > > > >  platforms
> > > > > > 
> > > > > >   app/test/Makefile  |   6 +
> > > > > >   app/test/test_memcpy.c |  52 
> > > > > >  +-
> > > > > >   app/test/test_memcpy_perf.c| 238 
> > > > > >  +---
> > > > > >   .../common/include/arch/x86/rte_memcpy.h   | 664
> > > > > > >>>+++--
> > > > > >   4 files changed, 656 insertions(+), 304 deletions(-)
> > > > > > 
> > > > > > --
> > > > > > 1.9.3
> > > > > > 
> > > > > > 
> > > > > > >>>Are you able to compile this with gcc 4.9.2?  The
> > > > > > >>>compilation of test_memcpy_perf is taking forever for me.  It
> > > appears hung.
> > > > > > >>>Neil
> > > > > > >>Neil,
> > > > > > >>
> > > > > > >>Thanks for reporting this!
> > > > > > >>It should compile but will take quite some time if the CPU
> > > > > > >>doesn't support
> > > > > > AVX2, the reason is that:
> > > > > > >>1.

[dpdk-dev] [PATCH] mk: add support for ICC 15 compiler

2015-01-27 Thread Thomas Monjalon

> > This patch add Support for ICC 15.
> > 
> > ICC 15 changed inline-max-size and inline-max-total-size default values, so
> > for ICC 15 flags -no-inline-max-size -no-inline-max-total-size must be 
> > added.
> > 
> > additionally disable compile error for:
> > 13368 - loop was not vectorized with "vector always assert"
> > 15527 - loop was not vectorized: function call to fprintf cannot be 
> > vectorize
> > 
> > Signed-off-by: Daniel Mrzyglod 
> 
> Acked-by: Sergio Gonzalez Monroy 

Applied

Thanks
-- 
Thomas

[dpdk-dev] [PATCH] librte_pmd_ixgbe: Add queue start failure check

2015-01-27 Thread Qiu, Michael

On 1/27/2015 6:02 PM, Thomas Monjalon wrote:
> Hi Michael,
>
> I'm clearly not the maintainer of ixgbe, so I'd prefer someone else
> reviewing this patch. However I have few comments.

Thanks Thomas,

I will send v2 with your comments.

But who maintains ixgbe? I would like to add him (or her) to the cc list.

> 2015-01-15 22:45, Michael Qiu:
>> -ixgbe_dev_rxtx_start(dev);
>> +err = ixgbe_dev_rxtx_start(dev);
>> +if (err < 0) {
>> +PMD_INIT_LOG(ERR, "Unable to start rxtx queues\n");
> \n is not needed in PMD_INIT_LOG.
>
> Is it useful to print a log here, given that there are already
> some logs in ixgbe_dev_rxtx_start?

You are right. My intention was to show more details about the error,
but it seems duplicated.

I will remove it.

Thanks,
Michael
>
>> +PMD_INIT_LOG(ERR, "Start tx queue failed\n");
> [...]
>> +PMD_INIT_LOG(ERR, "Start rx queue failed\n");
> Please remove \n.
>
> Except these minor comments, it looks good.
> Thanks
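
The review above is about checking the status returned by the queue start routine instead of ignoring it. A minimal sketch of that pattern follows. The functions `start_queues()` and `dev_start()` are hypothetical stand-ins, not the real ixgbe routines, and the failure is simulated with a flag:

```c
#include <stdio.h>

/* Hypothetical stand-in for a driver routine that starts all Rx/Tx queues.
 * Returns 0 on success or a negative value on failure. */
static int start_queues(int simulate_failure)
{
	if (simulate_failure) {
		fprintf(stderr, "Start rx queue failed\n");
		return -1;
	}
	return 0;
}

/* Hypothetical device start: propagate the queue-start status so the caller
 * never sees a "started" device whose queues are not running. */
static int dev_start(int simulate_failure)
{
	int err = start_queues(simulate_failure);

	if (err < 0)
		return err;	/* the callee already logged the details */
	return 0;
}
```

Because the callee logs the specific failure, the caller only forwards the error code, which is the point made in the review about the duplicated message.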

[dpdk-dev] [PATCH] ixgbe: do not include CRC in Tx byte count

2015-01-27 Thread Stephen Hemminger

On Tue, 27 Jan 2015 11:11:39 +0100
Thomas Monjalon  wrote:

> Hi Stephen,
> 
> 2015-01-22 22:23, stephen at networkplumber.org:
> > From: Stephen Hemminger 
> > 
> > The ixgbe driver was including CRC in the transmit packet byte
> > count, but not for packets received. This was noticed when forwarding and
> > the number of bytes received was greater than the number of bytes transmitted
> 
> Tx includes CRC and Rx count is greater, really?
> 
> > for the same number of packets. Make the driver behave like other
> > virtual devices and not include CRC in byte count. Use the same queue
> > counters already computed and used for Rx.
> 
> Please could you describe the difference between gptc/gotc and qptc/qbtc?
> 
> Thank you

The global byte-count registers include the CRC in the byte count.
This has been observed by experimentation, validated by QA, and documented
in the Intel HW specs.

The original code used queue counts for Rx because the global counters
include missed packets (which must not be included in the ipackets count).
I suspect that is why the original developer used the queue counts; as
a good side effect, the Rx byte count was correct (no CRC). I just extended
this to Tx.
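
The relationship between a byte counter that includes the CRC and one that does not is plain arithmetic: every packet contributes 4 extra bytes. The sketch below only illustrates that relationship. It is not what the patch does (the patch switches to the per-queue counters), and the function and parameter names are illustrative rather than register names:

```c
#include <stdint.h>

#define CRC_LEN 4 /* Ethernet frame check sequence, in bytes */

/* Convert a byte counter that includes the CRC of every packet into one
 * that does not. */
static uint64_t bytes_without_crc(uint64_t bytes_with_crc, uint64_t packets)
{
	return bytes_with_crc - packets * CRC_LEN;
}
```

For example, 100 packets of 64 bytes counted with CRC (6400 bytes) correspond to 6000 bytes without it.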

[dpdk-dev] vhost: virtio-net rx-ring stop work after work many hours, bug?

2015-01-27 Thread Michael S. Tsirkin

On Tue, Jan 27, 2015 at 03:57:13PM +0800, Linhaifeng wrote:
> Hi,all
> 
> I use vhost-user to send data to a VM. At first it works well, but after many
> hours the VM cannot receive data, although it can still send data.
> 
> (gdb)p avail_idx
> $4 = 2668
> (gdb)p free_entries
> $5 = 0
> (gdb)l
> /* check that we have enough buffers */
> if (unlikely(count > free_entries))
>         count = free_entries;
> 
> if (count == 0) {
>         int b = 0;
>         if (b) { // when b is set to 1 to notify the guest, rx_ring starts working again
>                 if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)) {
>                         eventfd_write(vq->callfd, 1);
>                 }
>         }
>         return 0;
> }
> 
> some info i print in guest:
> 
> net eth3:vi->num=199
> net eth3:rvq info: num_free=57, used->idx=2668, avail->idx=2668
> net eth3:svq info: num_free=254, used->idx=1644, avail->idx=1644
> 
> net eth3:vi->num=199
> net eth3:rvq info: num_free=57, used->idx=2668, avail->idx=2668
> net eth3:svq info: num_free=254, used->idx=1645, avail->idx=1645
> 
> net eth3:vi->num=199
> net eth3:rvq info: num_free=57, used->idx=2668, avail->idx=2668
> net eth3:svq info: num_free=254, used->idx=1646, avail->idx=1646
> 
> # free
>              total       used       free     shared    buffers     cached
> Mem:       3924100     337252    3586848          0      95984     138060
> -/+ buffers/cache:     103208    3820892
> Swap:       970748          0     970748
> 
> I have two questions:
> 1. Do we need to notify the guest when there is no buffer in vq->avail?

No unless NOTIFY_ON_EMPTY is set (most guests don't set it).

> 2. Why does virtio_net stop filling avail?

Most likely, it didn't get an interrupt.

If so, it would be a dpdk vhost user bug.
Which code are you using in dpdk?

> 
> 
> 
> 
> 
> -- 
> Regards,
> Haifeng
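
The gdb output above shows `free_entries` equal to 0 while the guest reports matching `used->idx` and `avail->idx`. As a hedged sketch of how such a value is typically derived, the function below subtracts two free-running 16-bit ring indices. The name `last_used_idx` is an assumption made for illustration and is not taken from the code quoted in the thread:

```c
#include <stdint.h>

/* Number of buffers the guest has made available and the host has not yet
 * consumed. Both indices are free-running 16-bit counters, so the
 * subtraction is truncated to uint16_t to stay correct across wraparound. */
static uint16_t vring_free_entries(uint16_t avail_idx, uint16_t last_used_idx)
{
	return (uint16_t)(avail_idx - last_used_idx);
}
```

With both indices at 2668 the result is 0, which is the state reported in the thread: the host has consumed everything the guest offered and must wait for the guest to refill the ring.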

[dpdk-dev] [PATCH] testpmd: check return value of rte_eth_dev_vlan_filter()

2015-01-27 Thread Thomas Monjalon

Hi Michal,

2015-01-23 11:43, Michal Jastrzebski:
> This patch modifies testpmd behavior when setting:
> rx_vlan add all vf_port (enabling all vlanids
> to be passed thru rx filter on VF).
> Rx_vlan_all_filter_set() function,
> checks if the next vlanid can be enabled by the driver.
> The number of vlanids is limited by the NIC, and thus the NIC
> does not allow enabling more vlanids than it can allocate
> in the VFTA table.
> 
> Signed-off by: Michal Jastrzebski 

checkpatch is not happy because you forgot a hyphen.

> @@ -1667,8 +1668,9 @@ rx_vlan_all_filter_set(portid_t port_id, int on)
>  
>   if (port_id_is_invalid(port_id))
>   return;
> - for (vlan_id = 0; vlan_id < 4096; vlan_id++)
> - rx_vft_set(port_id, vlan_id, on);
> + for (vlan_id = 0; vlan_id < 4096; vlan_id++){
> + if ( rx_vft_set(port_id, vlan_id, on) ) break;

Again, checkpatch does not like this line.

And more importantly, you make it clear that sometimes we cannot enable all
vlans and return no error.
So I wonder, how is it documented in the testpmd help?

Thanks
-- 
Thomas
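
The concern raised in this review is that the loop can stop early without the user noticing. A minimal sketch of a loop that stops at the first failure and reports how far it got is shown below. `set_vlan_filter()` is a hypothetical stand-in for the per-VLAN filter call, with the table limit simulated by a `capacity` parameter:

```c
#include <stdio.h>

#define VLAN_ID_COUNT 4096

/* Hypothetical stand-in for the per-VLAN filter call: succeeds for the first
 * `capacity` ids and fails afterwards, like a NIC with a limited table. */
static int set_vlan_filter(int vlan_id, int capacity)
{
	return vlan_id < capacity ? 0 : -1;
}

/* Enable filters for all VLAN ids, stopping at the first failure.
 * Returns the number of ids that were actually enabled. */
static int enable_all_vlans(int capacity)
{
	int vlan_id;

	for (vlan_id = 0; vlan_id < VLAN_ID_COUNT; vlan_id++) {
		if (set_vlan_filter(vlan_id, capacity) != 0) {
			printf("Warning: only %d of %d VLAN ids enabled\n",
			       vlan_id, VLAN_ID_COUNT);
			break;
		}
	}
	return vlan_id;
}
```

Returning the count lets a caller decide whether a partial result is acceptable, while the warning makes the partial result visible.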

[dpdk-dev] [PATCH 0/4] DPDK memcpy optimization

2015-01-27 Thread Ananyev, Konstantin



> -Original Message-
> From: Wang, Zhihong
> Sent: Tuesday, January 27, 2015 1:42 AM
> To: Ananyev, Konstantin; Richardson, Bruce; Marc Sune
> Cc: dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> 
> 
> 
> > -Original Message-
> > From: Ananyev, Konstantin
> > Sent: Tuesday, January 27, 2015 2:29 AM
> > To: Wang, Zhihong; Richardson, Bruce; Marc Sune
> > Cc: dev at dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> >
> > Hi Zhihong,
> >
> > > -Original Message-
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wang, Zhihong
> > > Sent: Friday, January 23, 2015 6:52 AM
> > > To: Richardson, Bruce; Marc Sune
> > > Cc: dev at dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> > >
> > >
> > >
> > > > -Original Message-
> > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce
> > > > Richardson
> > > > Sent: Wednesday, January 21, 2015 9:26 PM
> > > > To: Marc Sune
> > > > Cc: dev at dpdk.org
> > > > Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> > > >
> > > > On Wed, Jan 21, 2015 at 02:21:25PM +0100, Marc Sune wrote:
> > > > >
> > > > > On 21/01/15 14:02, Bruce Richardson wrote:
> > > > > >On Wed, Jan 21, 2015 at 01:36:41PM +0100, Marc Sune wrote:
> > > > > >>On 21/01/15 04:44, Wang, Zhihong wrote:
> > > > > -Original Message-
> > > > > From: Richardson, Bruce
> > > > > Sent: Wednesday, January 21, 2015 12:15 AM
> > > > > To: Neil Horman
> > > > > Cc: Wang, Zhihong; dev at dpdk.org
> > > > > Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> > > > > 
> > > > > On Tue, Jan 20, 2015 at 10:11:18AM -0500, Neil Horman wrote:
> > > > > >On Tue, Jan 20, 2015 at 03:01:44AM +, Wang, Zhihong wrote:
> > > > > >>>-Original Message-
> > > > > >>>From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > > > > >>>Sent: Monday, January 19, 2015 9:02 PM
> > > > > >>>To: Wang, Zhihong
> > > > > >>>Cc: dev at dpdk.org
> > > > > >>>Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy
> > > > > >>>optimization
> > > > > >>>
> > > > > >>>On Mon, Jan 19, 2015 at 09:53:30AM +0800,
> > > > > >>>zhihong.wang at intel.com
> > > > > wrote:
> > > > > > This patch set optimizes memcpy for DPDK for both SSE and AVX
> > > > > > platforms.
> > > > > > It also extends memcpy test coverage with unaligned cases and
> > > > > > more test points.
> > > > > > Optimization techniques are summarized below:
> > > > > > 
> > > > > > 1. Utilize full cache bandwidth
> > > > > > 
> > > > > > 2. Enforce aligned stores
> > > > > > 
> > > > > > 3. Apply load address alignment based on architecture features
> > > > > > 
> > > > > > 4. Make load/store address available as early as possible
> > > > > > 
> > > > > > 5. General optimization techniques like inlining, branch
> > > > > > reducing, prefetch pattern access
> > > > > > 
> > > > > > Zhihong Wang (4):
> > > > > >    Disabled VTA for memcpy test in app/test/Makefile
> > > > > >    Removed unnecessary test cases in test_memcpy.c
> > > > > >    Extended test coverage in test_memcpy_perf.c
> > > > > >    Optimized memcpy in arch/x86/rte_memcpy.h for both SSE and AVX
> > > > > >      platforms
> > > > > > 
> > > > > >   app/test/Makefile                          |   6 +
> > > > > >   app/test/test_memcpy.c                     |  52 +-
> > > > > >   app/test/test_memcpy_perf.c                | 238 +---
> > > > > >   .../common/include/arch/x86/rte_memcpy.h   | 664 +++--
> > > > > >   4 files changed, 656 insertions(+), 304 deletions(-)
> > > > > 
> > > > > --
> > > > > 1.9.3
> > > > > 
> > > > > 
> > > > > > >>>Are you able to compile this with gcc 4.9.2?  The
> > > > > > >>>compilation of test_memcpy_perf is taking forever for me.
> > > > > > >>>It appears hung.
> > > > > > >>>Neil
> > > > > > >>Neil,
> > > > > > >>
> > > > > > >>Thanks for reporting this!
> > > > > > >>It should compile but will take quite some time if the CPU
> > > > > > >>doesn't support AVX2, the reason is that:
> > > > > > >>1. The SSE & AVX memcpy implementation is more complicated
> > > > > > >>than AVX2 version thus the compiler takes more time to
> > > > > > >>compile and optimize
> > > > > > >>2. The new test_memcpy_perf.c contains 126 constants memcpy
> > > > > > >>calls for better test case coverage, that's quite a lot
> > > > > > >>
> > > > > > >>I've just tested this patch on an Ivy Bridge machine with
> > > > > > >>GCC 4.9.2:
> > > > > > >>1. The whole compile

[dpdk-dev] [PATCH] ixgbe: do not include CRC in Tx byte count

2015-01-27 Thread Thomas Monjalon

Hi Stephen,

2015-01-22 22:23, stephen at networkplumber.org:
> From: Stephen Hemminger 
> 
> The ixgbe driver was including CRC in the transmit packet byte
> count, but not for packets received. This was noticed when forwarding and
> the number of bytes received was greater than the number of bytes transmitted

Tx includes CRC and Rx count is greater, really?

> for the same number of packets. Make the driver behave like other
> virtual devices and not include CRC in byte count. Use the same queue
> counters already computed and used for Rx.

Please could you describe the difference between gptc/gotc and qptc/qbtc?

Thank you
-- 
Thomas

[dpdk-dev] DPDK testpmd forwarding performace degradation

2015-01-27 Thread Alexander Belyakov

Hello,

On Tue, Jan 27, 2015 at 5:49 AM, ???  wrote:

> A 65-byte frame may degrade performance a lot. That is related to DMA and cache.
> When the NIC DMAs packets to memory, it has to do a read-modify-write if the DMA
> size is a partial cache line. So for 65 bytes, the first 64 bytes are OK. For the
> next 1 byte the NIC has to read the whole cache line, change one byte and
> update the cache line.
> So in DPDK, CRC is not stripped and the Ethernet header is aligned to a cache
> line, which causes the IP header not to be aligned on 4 bytes.
>
>
An extra cache line update indeed makes sense, because performance is halved
with an extra byte.

It is a little bit confusing, but the issue is not with switching from 64-byte
frames to 65-byte frames, but with switching from 80-byte frames to
81-byte frames. Note that the issue disappears at a 96-byte frame size.

Alexander

[dpdk-dev] [PATCH] librte_pmd_ixgbe: Add queue start failure check

2015-01-27 Thread Thomas Monjalon

Hi Michael,

I'm clearly not the maintainer of ixgbe, so I'd prefer someone else
reviewing this patch. However I have a few comments.

2015-01-15 22:45, Michael Qiu:
> - ixgbe_dev_rxtx_start(dev);
> + err = ixgbe_dev_rxtx_start(dev);
> + if (err < 0) {
> + PMD_INIT_LOG(ERR, "Unable to start rxtx queues\n");

\n is not needed in PMD_INIT_LOG.

Is it useful to print a log here, given that there are already
some logs in ixgbe_dev_rxtx_start?

> + PMD_INIT_LOG(ERR, "Start tx queue failed\n");
[...]
> + PMD_INIT_LOG(ERR, "Start rx queue failed\n");

Please remove \n.

Except these minor comments, it looks good.
Thanks
-- 
Thomas

[dpdk-dev] [PATCH v2 00/24] Single virtio implementation

2015-01-27 Thread Matthew Hall

On Tue, Jan 27, 2015 at 10:02:24AM +, Stephen Hemminger wrote:
> On Mon, 26 Jan 2015 19:06:12 -0800
> Matthew Hall  wrote:
> 
> > Thank you so much for this, using virtio drivers in DPDK has been messy and
> > unpleasant in the past, and you clearly wrote a lot of nice new code to help
> > improve it all.
> > 
> > Previously I'd reported a bug, where all RTE virtio drivers I tried (A and B,
> > because I did not know C existed), failed to work with the virtio-net
> > interfaces exposed in VirtualBox, due to various strange errors, and they all
> > only worked with the virtio-net interfaces from qemu.
> 
> I suspect a problem with features required (and not supported by VirtualBox).
> Build driver with debug enabled and send the log please.

Hi Stephen,

Here is everything that happened when I tried it before.

http://dpdk.org/ml/archives/dev/2014-October/006623.html

Matthew.

[dpdk-dev] Fwd: DPDK testpmd forwarding performace degradation

2015-01-27 Thread Alexander Belyakov

Hello,

On Mon, Jan 26, 2015 at 8:08 PM, Stephen Hemminger <
stephen at networkplumber.org> wrote:

> On Mon, 26 Jan 2015 13:17:48 +0300
> Alexander Belyakov  wrote:
>
> > Hello,
> >
> > recently I have found a case of significant performance degradation for our
> > application (built on top of DPDK, of course). Surprisingly, a similar issue
> > is easily reproduced with default testpmd.
> >
> > To show the case we need a simple IPv4 UDP flood with variable UDP payload
> > size. Saying "packet length" below I mean: Eth header length (14 bytes) +
> > IPv4 header length (20 bytes) + UDP header length (8 bytes) + UDP payload
> > length (variable) + CRC (4 bytes). Source IP addresses and ports are selected
> > randomly for each packet.
> >
> > I have used DPDK with revisions 1.6.0r2 and 1.7.1. Both show the same issue.
> >
> > Follow the "Quick start" guide (http://dpdk.org/doc/quick-start) to build and
> > run testpmd. Enable testpmd forwarding ("start" command).
> >
> > The table below shows measured forwarding performance depending on packet
> > length:
> >
> > No. -- UDP payload length (bytes) -- Packet length (bytes) -- Forwarding
> > performance (Mpps) -- Expected theoretical performance (Mpps)
>
> Did you try using git bisect to identify the problem?
>

I believe dpdk-1.6.0r2 is the first release with bypass adapter (device id
155d) support and it already has the issue. So it seems I have no "good"
point.

Alexander

[dpdk-dev] Fwd: DPDK testpmd forwarding performace degradation

2015-01-27 Thread Alexander Belyakov

Hi Pablo,

On Mon, Jan 26, 2015 at 5:22 PM, De Lara Guarch, Pablo <
pablo.de.lara.guarch at intel.com> wrote:

> Hi Alexander,
>
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Alexander Belyakov
> > Sent: Monday, January 26, 2015 10:18 AM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] DPDK testpmd forwarding performace degradation
> >
> > Hello,
> >
> > recently I have found a case of significant performance degradation for our
> > application (built on top of DPDK, of course). Surprisingly, a similar issue
> > is easily reproduced with default testpmd.
> >
> > To show the case we need a simple IPv4 UDP flood with variable UDP payload
> > size. Saying "packet length" below I mean: Eth header length (14 bytes) +
> > IPv4 header length (20 bytes) + UDP header length (8 bytes) + UDP payload
> > length (variable) + CRC (4 bytes). Source IP addresses and ports are selected
> > randomly for each packet.
> >
> > I have used DPDK with revisions 1.6.0r2 and 1.7.1. Both show the same issue.
> >
> > Follow the "Quick start" guide (http://dpdk.org/doc/quick-start) to build and
> > run testpmd. Enable testpmd forwarding ("start" command).
> >
> > The table below shows measured forwarding performance depending on packet
> > length:
> >
> > No. -- UDP payload length (bytes) -- Packet length (bytes) -- Forwarding
> > performance (Mpps) -- Expected theoretical performance (Mpps)
> >
> > 1. 0 -- 64 -- 14.8 -- 14.88
> > 2. 34 -- 80 -- 12.4 -- 12.5
> > 3. 35 -- 81 -- 6.2 -- 12.38 (!)
> > 4. 40 -- 86 -- 6.6 -- 11.79
> > 5. 49 -- 95 -- 7.6 -- 10.87
> > 6. 50 -- 96 -- 10.7 -- 10.78 (!)
> > 7. 60 -- 106 -- 9.4 -- 9.92
> >
> > At line number 3 we have added 1 byte of UDP payload (compared to the
> > previous line) and got forwarding performance halved! 6.2 Mpps against
> > 12.38 Mpps of expected theoretical maximum for this packet size.
> >
> > That is the issue.
> >
> > Significant performance degradation exists up to 50 bytes of UDP payload
> > (96 bytes packet length), where it jumps back to theoretical maximum.
> >
> > What is happening between 80 and 96 bytes packet length?
> >
> > This issue is stable and 100% reproducible. At this point I am not sure if
> > it is a DPDK or NIC issue. These tests have been performed on Intel(R) Eth
> > Svr Bypass Adapter X520-LR2 (X520LR2BP).
> >
> > Is anyone aware of such strange behavior?
>
> I cannot reproduce the issue using two ports on two different 82599EB
> NICs, using 1.7.1 and 1.8.0.
> I always get either the same or better line rate as I increase the packet size.
>

Thank you for trying to reproduce the issue.


> Actually, have you tried using 1.8.0?
>

I feel 1.8.0 is a little bit immature and might require some post-release
patching. Even testpmd from this release is not forwarding packets properly
on my setup. It is up and running without visible errors/warnings, TX/RX
counters are ticking but I cannot see any packets at the output. Please
note, both 1.6.0r2 and 1.7.1 releases work (on the same setup)
out-of-the-box just fine, with the only exception of this mysterious
performance drop.

So it will take some time to figure out what is wrong with dpdk-1.8.0.
Meanwhile we could focus on stable dpdk-1.7.1.

As for the X520-LR2 NIC, it is a dual-port bypass adapter with device id 155d.
I believe it should be treated as 82599EB except for the bypass feature. I put
bypass mode to "normal" in those tests.

Alexander


>
> Pablo
> >
> > Regards,
> > Alexander Belyakov
>
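
The "expected theoretical performance" column in the table quoted above can be reproduced from the frame length. The sketch below assumes a 10 Gbit/s link and the standard 20 bytes of per-frame wire overhead (7 bytes preamble, 1 byte start-of-frame delimiter, 12 bytes inter-frame gap). The function name is illustrative:

```c
/* Per-frame overhead on the wire that is not part of the frame itself:
 * 7 bytes preamble + 1 byte start-of-frame delimiter + 12 bytes
 * inter-frame gap. */
#define WIRE_OVERHEAD_BYTES 20

/* Theoretical maximum packet rate in Mpps for a given link speed in bit/s
 * and frame length in bytes (the frame length includes the 4-byte CRC,
 * as in the table). */
static double max_mpps(double link_bps, unsigned frame_len)
{
	return link_bps / ((frame_len + WIRE_OVERHEAD_BYTES) * 8.0) / 1e6;
}
```

For a 10 Gbit/s link, `max_mpps(10e9, 80)` gives 12.5 and `max_mpps(10e9, 81)` gives about 12.38, so a single extra byte costs about 1% in theory, not the 50% that was measured.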

[dpdk-dev] [PATCH v2 24/24] virtio: Remove hotspots

2015-01-27 Thread Ouyang Changchun

Remove hotspots that are unnecessary when an early return occurs.
Also change one likely() to unlikely() to let the compiler make a better decision.

Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_virtio/virtio_rxtx.c | 33 +++--
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_rxtx.c 
b/lib/librte_pmd_virtio/virtio_rxtx.c
index c6d9ae7..c4731b5 100644
--- a/lib/librte_pmd_virtio/virtio_rxtx.c
+++ b/lib/librte_pmd_virtio/virtio_rxtx.c
@@ -476,13 +476,13 @@ uint16_t
 virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 {
struct virtqueue *rxvq = rx_queue;
-   struct virtio_hw *hw = rxvq->hw;
+   struct virtio_hw *hw;
struct rte_mbuf *rxm, *new_mbuf;
-   uint16_t nb_used, num, nb_rx = 0;
+   uint16_t nb_used, num, nb_rx;
uint32_t len[VIRTIO_MBUF_BURST_SZ];
struct rte_mbuf *rcv_pkts[VIRTIO_MBUF_BURST_SZ];
int error;
-   uint32_t i, nb_enqueued = 0;
+   uint32_t i, nb_enqueued;
const uint32_t hdr_size = sizeof(struct virtio_net_hdr);

nb_used = VIRTQUEUE_NUSED(rxvq);
@@ -491,7 +491,7 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, 
uint16_t nb_pkts)

num = (uint16_t)(likely(nb_used <= nb_pkts) ? nb_used : nb_pkts);
num = (uint16_t)(likely(num <= VIRTIO_MBUF_BURST_SZ) ? num : 
VIRTIO_MBUF_BURST_SZ);
-   if (likely(num > DESC_PER_CACHELINE))
+   if (unlikely(num > DESC_PER_CACHELINE))
num = num - ((rxvq->vq_used_cons_idx + num) % 
DESC_PER_CACHELINE);

if (num == 0)
@@ -499,6 +499,11 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf 
**rx_pkts, uint16_t nb_pkts)

num = virtqueue_dequeue_burst_rx(rxvq, rcv_pkts, len, num);
PMD_RX_LOG(DEBUG, "used:%d dequeue:%d", nb_used, num);
+
+   hw = rxvq->hw;
+   nb_rx = 0;
+   nb_enqueued = 0;
+
for (i = 0; i < num ; i++) {
rxm = rcv_pkts[i];

@@ -568,17 +573,17 @@ virtio_recv_mergeable_pkts(void *rx_queue,
uint16_t nb_pkts)
 {
struct virtqueue *rxvq = rx_queue;
-   struct virtio_hw *hw = rxvq->hw;
+   struct virtio_hw *hw;
struct rte_mbuf *rxm, *new_mbuf;
-   uint16_t nb_used, num, nb_rx = 0;
+   uint16_t nb_used, num, nb_rx;
uint32_t len[VIRTIO_MBUF_BURST_SZ];
struct rte_mbuf *rcv_pkts[VIRTIO_MBUF_BURST_SZ];
struct rte_mbuf *prev;
int error;
-   uint32_t i = 0, nb_enqueued = 0;
-   uint32_t seg_num = 0;
-   uint16_t extra_idx = 0;
-   uint32_t seg_res = 0;
+   uint32_t i, nb_enqueued;
+   uint32_t seg_num;
+   uint16_t extra_idx;
+   uint32_t seg_res;
const uint32_t hdr_size = sizeof(struct virtio_net_hdr_mrg_rxbuf);

nb_used = VIRTQUEUE_NUSED(rxvq);
@@ -590,6 +595,14 @@ virtio_recv_mergeable_pkts(void *rx_queue,

PMD_RX_LOG(DEBUG, "used:%d\n", nb_used);

+   hw = rxvq->hw;
+   nb_rx = 0;
+   i = 0;
+   nb_enqueued = 0;
+   seg_num = 0;
+   extra_idx = 0;
+   seg_res = 0;
+
while (i < nb_used) {
struct virtio_net_hdr_mrg_rxbuf *header;

-- 
1.8.4.2

[dpdk-dev] [PATCH v2 23/24] virtio: Fix zero copy break issue

2015-01-27 Thread Ouyang Changchun

vhost zero copy needs to get the vring descriptor and its buffer address to
set the DMA address of the HW ring; this is done in new_device when ioctl
set_backend is called. This requires that virtio_dev_rxtx_start is called
before vtpci_reinit_complete, which makes sure the vring descriptor and its
buffer are ready before they are used.

This patch also fixes one status-setting issue: according to the virtio spec,
VIRTIO_CONFIG_STATUS_ACK should be set after the virtio hw reset.

Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index b905532..648c761 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -414,6 +414,7 @@ virtio_dev_close(struct rte_eth_dev *dev)
/* reset the NIC */
vtpci_irq_config(hw, VIRTIO_MSI_NO_VECTOR);
vtpci_reset(hw);
+   hw->started = 0;
virtio_dev_free_mbufs(dev);
 }

@@ -1107,9 +1108,6 @@ eth_virtio_dev_init(__rte_unused struct eth_driver 
*eth_drv,
return -ENOMEM;
}

-   /* Tell the host we've noticed this device. */
-   vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_ACK);
-
pci_dev = eth_dev->pci_dev;
if (virtio_resource_init(pci_dev) < 0)
 #ifdef RTE_EAL_PORT_IO
@@ -1123,6 +1121,9 @@ eth_virtio_dev_init(__rte_unused struct eth_driver 
*eth_drv,
/* Reset the device although not necessary at startup */
vtpci_reset(hw);

+   /* Tell the host we've noticed this device. */
+   vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_ACK);
+
/* Tell the host we've known how to drive the device. */
vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER);
virtio_negotiate_features(hw);
@@ -1324,10 +1325,10 @@ virtio_dev_start(struct rte_eth_dev *dev)
if (hw->started)
return 0;

-   vtpci_reinit_complete(hw);
-
/* Do final configuration before rx/tx engine starts */
virtio_dev_rxtx_start(dev);
+   vtpci_reinit_complete(hw);
+
hw->started = 1;

/*Notify the backend
-- 
1.8.4.2

[dpdk-dev] [PATCH v2 22/24] virtio: Use soft vlan strip in mergeable Rx path

2015-01-27 Thread Ouyang Changchun

To keep the logic consistent with the normal Rx path, the mergeable
Rx path also needs software VLAN strip/decap if it is enabled.

Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_virtio/virtio_rxtx.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/lib/librte_pmd_virtio/virtio_rxtx.c 
b/lib/librte_pmd_virtio/virtio_rxtx.c
index a82e8eb..c6d9ae7 100644
--- a/lib/librte_pmd_virtio/virtio_rxtx.c
+++ b/lib/librte_pmd_virtio/virtio_rxtx.c
@@ -568,6 +568,7 @@ virtio_recv_mergeable_pkts(void *rx_queue,
uint16_t nb_pkts)
 {
struct virtqueue *rxvq = rx_queue;
+   struct virtio_hw *hw = rxvq->hw;
struct rte_mbuf *rxm, *new_mbuf;
uint16_t nb_used, num, nb_rx = 0;
uint32_t len[VIRTIO_MBUF_BURST_SZ];
@@ -674,6 +675,9 @@ virtio_recv_mergeable_pkts(void *rx_queue,
seg_res -= rcv_cnt;
}

+   if (hw->vlan_strip)
+   rte_vlan_strip(rx_pkts[nb_rx]);
+
VIRTIO_DUMP_PACKET(rx_pkts[nb_rx],
rx_pkts[nb_rx]->data_len);

-- 
1.8.4.2

[dpdk-dev] [PATCH v2 21/24] example/vhost: Add vlan-strip cmd line option

2015-01-27 Thread Ouyang Changchun

Support turning RX VLAN strip on the host on or off; this gives the guest the
chance to use its software VLAN strip functionality.

Signed-off-by: Changchun Ouyang 
---
 examples/vhost/main.c | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 1d31520..4ff916d 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -159,6 +159,9 @@ static uint32_t num_devices;
 static uint32_t zero_copy;
 static int mergeable;

+/* Do vlan strip on host, enabled on default */
+static uint32_t vlan_strip = 1;
+
 /* number of descriptors to apply*/
 static uint32_t num_rx_descriptor = RTE_TEST_RX_DESC_DEFAULT_ZCP;
 static uint32_t num_tx_descriptor = RTE_TEST_TX_DESC_DEFAULT_ZCP;
@@ -564,6 +567,7 @@ us_vhost_usage(const char *prgname)
"   --rx-retry-delay [0-N]: timeout(in usecond) between 
retries on RX. This makes effect only if retries on rx enabled\n"
"   --rx-retry-num [0-N]: the number of retries on rx. This 
makes effect only if retries on rx enabled\n"
"   --mergeable [0|1]: disable(default)/enable RX mergeable 
buffers\n"
+   "   --vlan-strip [0|1]: disable/enable(default) RX VLAN 
strip on host\n"
"   --stats [0-N]: 0: Disable stats, N: Time in seconds to 
print stats\n"
"   --dev-basename: The basename to be used for the 
character device.\n"
"   --zero-copy [0|1]: disable(default)/enable rx/tx "
@@ -591,6 +595,7 @@ us_vhost_parse_args(int argc, char **argv)
{"rx-retry-delay", required_argument, NULL, 0},
{"rx-retry-num", required_argument, NULL, 0},
{"mergeable", required_argument, NULL, 0},
+   {"vlan-strip", required_argument, NULL, 0},
{"stats", required_argument, NULL, 0},
{"dev-basename", required_argument, NULL, 0},
{"zero-copy", required_argument, NULL, 0},
@@ -691,6 +696,22 @@ us_vhost_parse_args(int argc, char **argv)
}
}

+   /* Enable/disable RX VLAN strip on host. */
+   if (!strncmp(long_option[option_index].name,
+   "vlan-strip", MAX_LONG_OPT_SZ)) {
+   ret = parse_num_opt(optarg, 1);
+   if (ret == -1) {
+   RTE_LOG(INFO, VHOST_CONFIG,
+   "Invalid argument for VLAN 
strip [0|1]\n");
+   us_vhost_usage(prgname);
+   return -1;
+   } else {
+   vlan_strip = !!ret;
+   vmdq_conf_default.rxmode.hw_vlan_strip =
+   vlan_strip;
+   }
+   }
+
/* Enable/disable stats. */
if (!strncmp(long_option[option_index].name, "stats", 
MAX_LONG_OPT_SZ)) {
ret = parse_num_opt(optarg, INT32_MAX);
@@ -950,7 +971,9 @@ link_vmdq(struct vhost_dev *vdev, struct rte_mbuf *m)
dev->device_fh);

/* Enable stripping of the vlan tag as we handle routing. */
-   rte_eth_dev_set_vlan_strip_on_queue(ports[0], 
(uint16_t)vdev->vmdq_rx_q, 1);
+   if (vlan_strip)
+   rte_eth_dev_set_vlan_strip_on_queue(ports[0],
+   (uint16_t)vdev->vmdq_rx_q, 1);

/* Set device as ready for RX. */
vdev->ready = DEVICE_RX;
-- 
1.8.4.2
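
The new `--vlan-strip` option accepts only 0 or 1. As a hedged illustration of validating such a bounded numeric argument, the helper below rejects empty, non-numeric, negative, and out-of-range input. `parse_bounded_opt()` is a hypothetical name and is not the parser used by the example application:

```c
#include <errno.h>
#include <stdlib.h>

/* Parse a decimal option value in [0, max_valid]; return -1 on any error.
 * Illustrative helper, not the example application's own parser. */
static int parse_bounded_opt(const char *arg, long max_valid)
{
	char *end = NULL;
	long val;

	if (arg == NULL || *arg == '\0')
		return -1;

	errno = 0;
	val = strtol(arg, &end, 10);
	if (errno != 0 || *end != '\0' || val < 0 || val > max_valid)
		return -1;

	return (int)val;
}
```

A caller handling a [0|1] switch would pass `max_valid = 1` and treat -1 as a usage error.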

[dpdk-dev] [PATCH v2 20/24] example/vhost: Avoid inserting vlan twice

2015-01-27 Thread Ouyang Changchun

Check whether the packet is already VLAN-tagged; if so, avoid inserting a
duplicate VLAN tag into it.

This can happen when the guest is capable of inserting a VLAN tag.

Signed-off-by: Changchun Ouyang 
---
 examples/vhost/main.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 04f0118..1d31520 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -1115,6 +1115,7 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf 
*m, uint16_t vlan_tag)
unsigned len, ret, offset = 0;
const uint16_t lcore_id = rte_lcore_id();
struct virtio_net *dev = vdev->dev;
+   struct ether_hdr *nh;

/*check if destination is local VM*/
if ((vm2vm_mode == VM2VM_SOFTWARE) && (virtio_tx_local(vdev, m) == 0)) {
@@ -1135,7 +1136,15 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf 
*m, uint16_t vlan_tag)
tx_q = &lcore_tx_queue[lcore_id];
len = tx_q->len;

-   m->ol_flags = PKT_TX_VLAN_PKT;
+   nh = rte_pktmbuf_mtod(m, struct ether_hdr *);
+   if (unlikely(nh->ether_type == rte_cpu_to_be_16(ETHER_TYPE_VLAN))) {
+   /* Guest has inserted the vlan tag. */
+   struct vlan_hdr *vh = (struct vlan_hdr *) (nh + 1);
+   uint16_t vlan_tag_be = rte_cpu_to_be_16(vlan_tag);
+   if (vh->vlan_tci != vlan_tag_be)
+   vh->vlan_tci = vlan_tag_be;
+   } else {
+   m->ol_flags = PKT_TX_VLAN_PKT;

/*
 * Find the right seg to adjust the data len when offset is
@@ -1156,7 +1165,8 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf 
*m, uint16_t vlan_tag)
m->pkt_len += offset;
}

-   m->vlan_tci = vlan_tag;
+   m->vlan_tci = vlan_tag;
+   }

tx_q->m_table[len] = m;
len++;
-- 
1.8.4.2
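
The idea of this patch is to rewrite the tag of a frame that is already VLAN-tagged instead of adding a second tag. A self-contained sketch of that check on a raw frame buffer follows. The structure layouts and the function name `retag_if_tagged()` are assumptions made for illustration and do not use the DPDK mbuf API:

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

#define ETH_ADDR_LEN   6
#define ETH_TYPE_VLAN  0x8100 /* IEEE 802.1Q tag protocol identifier */

/* Minimal illustrative header layouts; fields are in network byte order. */
struct eth_hdr {
	uint8_t  dst[ETH_ADDR_LEN];
	uint8_t  src[ETH_ADDR_LEN];
	uint16_t ether_type;
};

struct vlan_tag {
	uint16_t tci;
	uint16_t inner_type;
};

/* If the frame already carries a VLAN tag, overwrite its TCI and return 1.
 * Otherwise return 0 so the caller can request tag insertion instead. */
static int retag_if_tagged(uint8_t *frame, uint16_t vlan_tci)
{
	struct eth_hdr eh;
	struct vlan_tag vt;

	memcpy(&eh, frame, sizeof(eh));
	if (eh.ether_type != htons(ETH_TYPE_VLAN))
		return 0;

	memcpy(&vt, frame + sizeof(eh), sizeof(vt));
	vt.tci = htons(vlan_tci);
	memcpy(frame + sizeof(eh), &vt, sizeof(vt));
	return 1;
}
```

The copies through `memcpy` avoid unaligned access to the frame buffer; a driver working on its own buffer type would normally use pointer casts as in the patch.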

[dpdk-dev] [PATCH v2 19/24] ether: Fix vlan strip/insert issue

2015-01-27 Thread Ouyang Changchun

The VLAN ether type needs to be converted from CPU byte order to BE (big endian).

Signed-off-by: Changchun Ouyang 
---
 lib/librte_ether/rte_ether.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_ether/rte_ether.h b/lib/librte_ether/rte_ether.h
index 74f71c2..0797908 100644
--- a/lib/librte_ether/rte_ether.h
+++ b/lib/librte_ether/rte_ether.h
@@ -351,7 +351,7 @@ static inline int rte_vlan_strip(struct rte_mbuf *m)
struct ether_hdr *eh
 = rte_pktmbuf_mtod(m, struct ether_hdr *);

-   if (eh->ether_type != ETHER_TYPE_VLAN)
+   if (eh->ether_type != rte_cpu_to_be_16(ETHER_TYPE_VLAN))
return -1;

struct vlan_hdr *vh = (struct vlan_hdr *)(eh + 1);
@@ -401,7 +401,7 @@ static inline int rte_vlan_insert(struct rte_mbuf **m)
return -ENOSPC;

memmove(nh, oh, 2 * ETHER_ADDR_LEN);
-   nh->ether_type = ETHER_TYPE_VLAN;
+   nh->ether_type = rte_cpu_to_be_16(ETHER_TYPE_VLAN);

vh = (struct vlan_hdr *) (nh + 1);
vh->vlan_tci = rte_cpu_to_be_16((*m)->vlan_tci);
-- 
1.8.4.2
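
The fix above matters because the ether type field is stored in network (big-endian) byte order, so comparing it with a host-order constant only works on big-endian CPUs. The sketch below shows the corrected comparison with the standard `htons()` in place of the DPDK macro. The function names are illustrative:

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

#define ETH_TYPE_VLAN 0x8100

/* Read the 16-bit ether type at byte offset 12 of a frame exactly as it sits
 * in memory, i.e. in network (big-endian) byte order. */
static uint16_t raw_ether_type(const uint8_t *frame)
{
	uint16_t t;

	memcpy(&t, frame + 12, sizeof(t));
	return t;
}

/* Correct test: convert the host-order constant before comparing. */
static int is_vlan(const uint8_t *frame)
{
	return raw_ether_type(frame) == htons(ETH_TYPE_VLAN);
}
```

On a little-endian machine the raw field of a tagged frame reads as 0x0081, so a comparison against the unconverted constant 0x8100 would never match.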

[dpdk-dev] [PATCH v2 18/24] virtio: Fix descriptor index issue

2015-01-27 Thread Ouyang Changchun

vq_descx should be indexed by the vring descriptor index, not by the
used-ring index.

Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_virtio/virtio_rxtx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_pmd_virtio/virtio_rxtx.c 
b/lib/librte_pmd_virtio/virtio_rxtx.c
index 580701a..a82e8eb 100644
--- a/lib/librte_pmd_virtio/virtio_rxtx.c
+++ b/lib/librte_pmd_virtio/virtio_rxtx.c
@@ -144,9 +144,9 @@ virtio_xmit_cleanup(struct virtqueue *vq, uint16_t num)

used_idx = (uint16_t)(vq->vq_used_cons_idx & (vq->vq_nentries - 1));
uep = &vq->vq_ring.used->ring[used_idx];
-   dxp = &vq->vq_descx[used_idx];

desc_idx = (uint16_t) uep->id;
+   dxp = &vq->vq_descx[desc_idx];
vq->vq_used_cons_idx++;
vq_ring_free_chain(vq, desc_idx);

-- 
1.8.4.2
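
The distinction this patch relies on is that a used-ring slot position and the descriptor index stored in that slot are different numbers, because the device may return descriptors in any order. A small sketch with a 4-entry ring follows. `RING_SIZE` and `used_slot_to_desc()` are illustrative names, while the `id`/`len` element layout follows the virtio used ring:

```c
#include <stdint.h>

#define RING_SIZE 4 /* must be a power of two */

/* One used-ring element: id is the head descriptor index of the chain. */
struct used_elem {
	uint32_t id;
	uint32_t len;
};

/* Return the descriptor index the device reported in the used-ring slot
 * selected by `used_cons_idx`. Per-descriptor bookkeeping must be looked up
 * with this value, not with the slot position itself. */
static uint16_t used_slot_to_desc(const struct used_elem *used_ring,
				  uint16_t used_cons_idx)
{
	uint16_t slot = (uint16_t)(used_cons_idx & (RING_SIZE - 1));

	return (uint16_t)used_ring[slot].id;
}
```

If the ring holds the ids 3, 1, 0, 2 in slots 0 to 3, consuming slot 0 must release the bookkeeping entry for descriptor 3, not entry 0.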

[dpdk-dev] [PATCH v2 17/24] virtio: Use port IO to get PCI resource.

2015-01-27 Thread Ouyang Changchun

Make virtio not require UIO, for security reasons; this matches 6Wind's
virtio-net-pmd.

Signed-off-by: Changchun Ouyang 
---
 config/common_linuxapp  |  2 +
 lib/librte_eal/common/include/rte_pci.h |  4 ++
 lib/librte_eal/linuxapp/eal/eal_pci.c   |  5 +-
 lib/librte_pmd_virtio/virtio_ethdev.c   | 91 -
 4 files changed, 100 insertions(+), 2 deletions(-)

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 2f9643b..a412457 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -100,6 +100,8 @@ CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
 CONFIG_RTE_EAL_IGB_UIO=y
 CONFIG_RTE_EAL_VFIO=y
+# Only for VIRTIO PMD currently
+CONFIG_RTE_EAL_PORT_IO=n

 #
 # Special configurations in PCI Config Space for high performance
diff --git a/lib/librte_eal/common/include/rte_pci.h 
b/lib/librte_eal/common/include/rte_pci.h
index 66ed793..19abc1f 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -193,6 +193,10 @@ struct rte_pci_driver {

 /** Device needs PCI BAR mapping (done with either IGB_UIO or VFIO) */
 #define RTE_PCI_DRV_NEED_MAPPING 0x0001
+/** Device needs port IO(done with /proc/ioports) */
+#ifdef RTE_EAL_PORT_IO
+#define RTE_PCI_DRV_PORT_IO 0x0002
+#endif
 /** Device driver must be registered several times until failure - deprecated 
*/
 #pragma GCC poison RTE_PCI_DRV_MULTIPLE
 /** Device needs to be unbound even if no module is provided */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
b/lib/librte_eal/linuxapp/eal/eal_pci.c
index b5f5410..5db0059 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -574,7 +574,10 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, 
struct rte_pci_device *d
/* map resources for devices that use igb_uio */
ret = pci_map_device(dev);
if (ret != 0)
-   return ret;
+#ifdef RTE_EAL_PORT_IO
+   if ((dr->drv_flags & RTE_PCI_DRV_PORT_IO) == 0)
+#endif
+   return ret;
} else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND &&
   rte_eal_process_type() == RTE_PROC_PRIMARY) {
/* unbind current driver */
diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index 8cd2d51..b905532 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -961,6 +961,71 @@ static int virtio_resource_init(struct rte_pci_device 
*pci_dev)
 start, size);
return 0;
 }
+
+#ifdef RTE_EAL_PORT_IO
+/* Extract port I/O numbers from proc/ioports */
+static int virtio_resource_init_by_portio(struct rte_pci_device *pci_dev)
+{
+   uint16_t start, end;
+   int size;
+   FILE *fp;
+   char *line = NULL;
+   char pci_id[16];
+   int found = 0;
+   size_t linesz;
+
+   snprintf(pci_id, sizeof(pci_id), PCI_PRI_FMT,
+pci_dev->addr.domain,
+pci_dev->addr.bus,
+pci_dev->addr.devid,
+pci_dev->addr.function);
+
+   fp = fopen("/proc/ioports", "r");
+   if (fp == NULL) {
+   PMD_INIT_LOG(ERR, "%s(): can't open ioports", __func__);
+   return -1;
+   }
+
+   while (getdelim(&line, &linesz, '\n', fp) > 0) {
+   char *ptr = line;
+   char *left;
+   int n;
+
+   n = strcspn(ptr, ":");
+   ptr[n] = 0;
+   left = &ptr[n+1];
+
+   while (*left && isspace(*left))
+   left++;
+
+   if (!strncmp(left, pci_id, strlen(pci_id))) {
+   found = 1;
+
+   while (*ptr && isspace(*ptr))
+   ptr++;
+
+   sscanf(ptr, "%04hx-%04hx", &start, &end);
+   size = end - start + 1;
+
+   break;
+   }
+   }
+
+   free(line);
+   fclose(fp);
+
+   if (!found)
+   return -1;
+
+   pci_dev->mem_resource[0].addr = (void *)(uintptr_t)(uint32_t)start;
+   pci_dev->mem_resource[0].len =  (uint64_t)size;
+   PMD_INIT_LOG(DEBUG,
+"PCI Port IO found start=0x%lx with size=0x%lx",
+start, size);
+   return 0;
+}
+#endif
+
 #else
 static int
 virtio_has_msix(const struct rte_pci_addr *loc __rte_unused)
@@ -974,6 +1039,14 @@ static int virtio_resource_init(struct rte_pci_device 
*pci_dev __rte_unused)
/* no setup required */
return 0;
 }
+
+#ifdef RTE_EAL_PORT_IO
+static int virtio_resource_init_by_portio(struct rte_pci_device *pci_dev)
+{
+   /* no setup required */
+   return 0;
+}
+#endif
 #endif

 /*
@@ -1039,7 +1112,10 @@ eth_virtio_dev_init(__rte_unused
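
The /proc/ioports lookup in this patch comes down to matching the owner field of each line against the PCI address and reading the hexadecimal range in front of it. A standalone sketch of that per-line step is given here; the helper name, its signature and the sample line format are assumptions for illustration, and unlike the patch it checks the sscanf() result.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one line shaped like "  c000-c03f : 0000:00:03.0".
 * Returns 0 and fills start/end when the owner field begins with pci_id,
 * -1 otherwise. Hypothetical helper, not part of the patch.
 */
static int toy_parse_ioports_line(const char *line, const char *pci_id,
				  uint16_t *start, uint16_t *end)
{
	const char *colon = strchr(line, ':');
	const char *owner;
	unsigned int lo, hi;

	if (colon == NULL)
		return -1;

	/* The owner name follows the first colon, after optional blanks. */
	owner = colon + 1;
	while (*owner != '\0' && isspace((unsigned char)*owner))
		owner++;
	if (strncmp(owner, pci_id, strlen(pci_id)) != 0)
		return -1;

	/* %x skips leading blanks, so indented (nested) entries parse too. */
	if (sscanf(line, "%x-%x", &lo, &hi) != 2 || lo > hi || hi > 0xffff)
		return -1;

	*start = (uint16_t)lo;
	*end = (uint16_t)hi;
	return 0;
}
```

The region size is then end - start + 1, as computed in the patch.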

[dpdk-dev] [PATCH v2 16/24] virtio: Free mbuf's with threshold

2015-01-27 Thread Ouyang Changchun

This makes virtio driver work like ixgbe. Transmit buffers are
held until a transmit threshold is reached. The previous behavior
was to hold mbuf's until the ring entry was reused which caused
more memory usage than needed.

Signed-off-by: Stephen Hemminger 
Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_virtio/virtio_ethdev.c |  7 ++--
 lib/librte_pmd_virtio/virtio_rxtx.c   | 75 +--
 lib/librte_pmd_virtio/virtqueue.h |  3 +-
 3 files changed, 60 insertions(+), 25 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index b30ab2a..8cd2d51 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -176,15 +176,16 @@ virtio_send_command(struct virtqueue *vq, struct 
virtio_pmd_ctrl *ctrl,

virtqueue_notify(vq);

-   while (vq->vq_used_cons_idx == vq->vq_ring.used->idx)
+   rte_rmb();
+   while (vq->vq_used_cons_idx == vq->vq_ring.used->idx) {
+   rte_rmb();
usleep(100);
+   }

while (vq->vq_used_cons_idx != vq->vq_ring.used->idx) {
uint32_t idx, desc_idx, used_idx;
struct vring_used_elem *uep;

-   virtio_rmb();
-
used_idx = (uint32_t)(vq->vq_used_cons_idx
& (vq->vq_nentries - 1));
uep = &vq->vq_ring.used->ring[used_idx];
diff --git a/lib/librte_pmd_virtio/virtio_rxtx.c 
b/lib/librte_pmd_virtio/virtio_rxtx.c
index b6d6832..580701a 100644
--- a/lib/librte_pmd_virtio/virtio_rxtx.c
+++ b/lib/librte_pmd_virtio/virtio_rxtx.c
@@ -129,17 +129,32 @@ virtqueue_dequeue_burst_rx(struct virtqueue *vq, struct 
rte_mbuf **rx_pkts,
return i;
 }

+#ifndef DEFAULT_TX_FREE_THRESH
+#define DEFAULT_TX_FREE_THRESH 32
+#endif
+
+/* Cleanup from completed transmits. */
 static void
-virtqueue_dequeue_pkt_tx(struct virtqueue *vq)
+virtio_xmit_cleanup(struct virtqueue *vq, uint16_t num)
 {
-   struct vring_used_elem *uep;
-   uint16_t used_idx, desc_idx;
+   uint16_t i, used_idx, desc_idx;
+   for (i = 0; i < num; i++) {
+   struct vring_used_elem *uep;
+   struct vq_desc_extra *dxp;
+
+   used_idx = (uint16_t)(vq->vq_used_cons_idx & (vq->vq_nentries - 
1));
+   uep = &vq->vq_ring.used->ring[used_idx];
+   dxp = &vq->vq_descx[used_idx];
+
+   desc_idx = (uint16_t) uep->id;
+   vq->vq_used_cons_idx++;
+   vq_ring_free_chain(vq, desc_idx);

-   used_idx = (uint16_t)(vq->vq_used_cons_idx & (vq->vq_nentries - 1));
-   uep = &vq->vq_ring.used->ring[used_idx];
-   desc_idx = (uint16_t) uep->id;
-   vq->vq_used_cons_idx++;
-   vq_ring_free_chain(vq, desc_idx);
+   if (dxp->cookie != NULL) {
+   rte_pktmbuf_free(dxp->cookie);
+   dxp->cookie = NULL;
+   }
+   }
 }


@@ -203,8 +218,6 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct 
rte_mbuf *cookie)

idx = head_idx;
dxp = &txvq->vq_descx[idx];
-   if (dxp->cookie != NULL)
-   rte_pktmbuf_free(dxp->cookie);
dxp->cookie = (void *)cookie;
dxp->ndescs = needed;

@@ -404,6 +417,7 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
 {
uint8_t vtpci_queue_idx = 2 * queue_idx + VTNET_SQ_TQ_QUEUE_IDX;
struct virtqueue *vq;
+   uint16_t tx_free_thresh;
int ret;

PMD_INIT_FUNC_TRACE();
@@ -421,6 +435,22 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
return ret;
}

+   tx_free_thresh = tx_conf->tx_free_thresh;
+   if (tx_free_thresh == 0)
+   tx_free_thresh =
+   RTE_MIN(vq->vq_nentries / 4, DEFAULT_TX_FREE_THRESH);
+
+   if (tx_free_thresh >= (vq->vq_nentries - 3)) {
+   RTE_LOG(ERR, PMD, "tx_free_thresh must be less than the "
+   "number of TX entries minus 3 (%u)."
+   " (tx_free_thresh=%u port=%u queue=%u)\n",
+   vq->vq_nentries - 3,
+   tx_free_thresh, dev->data->port_id, queue_idx);
+   return -EINVAL;
+   }
+
+   vq->vq_free_thresh = tx_free_thresh;
+
dev->data->tx_queues[queue_idx] = vq;
return 0;
 }
@@ -688,11 +718,9 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf 
**tx_pkts, uint16_t nb_pkts)
 {
struct virtqueue *txvq = tx_queue;
struct rte_mbuf *txm;
-   uint16_t nb_used, nb_tx, num;
+   uint16_t nb_used, nb_tx;
int error;

-   nb_tx = 0;
-
if (unlikely(nb_pkts < 1))
return nb_pkts;

@@ -700,21 +728,26 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf 
**tx_pkts, uint16_t nb_pkts)
nb_used = VIRTQUEUE_NUSED(txvq);

virtio_rmb();
+   if (likely(nb_used > txvq->vq_free_thresh))
+   virtio_xmit_cleanup(txvq, nb_used);

-
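
The threshold selection in virtio_dev_tx_queue_setup() above can be read as a small pure function: a request of 0 selects the default, and anything at or above the ring size minus 3 is rejected. The sketch below restates that rule outside the driver; the function name and the -1 error value are assumptions (the patch returns -EINVAL).

```c
#include <stdint.h>

#define TOY_DEFAULT_TX_FREE_THRESH 32

/*
 * Return the effective free threshold for a ring of nentries descriptors,
 * or -1 when the value does not leave at least 3 entries of slack.
 */
static int toy_pick_tx_free_thresh(uint16_t requested, uint16_t nentries)
{
	uint16_t thresh = requested;

	if (thresh == 0) {
		/* Default: a quarter of the ring, capped at 32. */
		uint16_t quarter = (uint16_t)(nentries / 4);

		thresh = quarter < TOY_DEFAULT_TX_FREE_THRESH ?
			quarter : TOY_DEFAULT_TX_FREE_THRESH;
	}

	/* Both operands promote to int, so tiny rings are rejected as well. */
	if (thresh >= nentries - 3)
		return -1;

	return thresh;
}
```

With a 256-entry ring and no explicit request this yields 32, so completed mbufs are reclaimed once more than 32 used entries have accumulated.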

[dpdk-dev] [PATCH v2 15/24] virtio: Add ability to set MAC address

2015-01-27 Thread Ouyang Changchun

Need to do special things to set the default MAC address.

Signed-off-by: Stephen Hemminger 
Signed-off-by: Changchun Ouyang 
---
 lib/librte_ether/rte_ethdev.h |  5 +
 lib/librte_pmd_virtio/virtio_ethdev.c | 24 
 2 files changed, 29 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 94d6b2b..5a54276 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1240,6 +1240,10 @@ typedef void (*eth_mac_addr_add_t)(struct rte_eth_dev 
*dev,
  uint32_t vmdq);
 /**< @internal Set a MAC address into Receive Address Address Register */

+typedef void (*eth_mac_addr_set_t)(struct rte_eth_dev *dev,
+ struct ether_addr *mac_addr);
+/**< @internal Set a MAC address into Receive Address Address Register */
+
 typedef int (*eth_uc_hash_table_set_t)(struct rte_eth_dev *dev,
  struct ether_addr *mac_addr,
  uint8_t on);
@@ -1459,6 +1463,7 @@ struct eth_dev_ops {
priority_flow_ctrl_set_t   priority_flow_ctrl_set; /**< Setup priority 
flow control.*/
eth_mac_addr_remove_t  mac_addr_remove; /**< Remove MAC address */
eth_mac_addr_add_t mac_addr_add;  /**< Add a MAC address */
+   eth_mac_addr_set_t mac_addr_set;  /**< Set a MAC address */
eth_uc_hash_table_set_tuc_hash_table_set;  /**< Set Unicast Table 
Array */
eth_uc_all_hash_table_set_t uc_all_hash_table_set;  /**< Set Unicast 
hash bitmap */
eth_mirror_rule_set_t  mirror_rule_set;  /**< Add a traffic mirror 
rule.*/
diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index 0e74eea..b30ab2a 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -90,6 +90,8 @@ static void virtio_mac_addr_add(struct rte_eth_dev *dev,
struct ether_addr *mac_addr,
uint32_t index, uint32_t vmdq __rte_unused);
 static void virtio_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index);
+static void virtio_mac_addr_set(struct rte_eth_dev *dev,
+   struct ether_addr *mac_addr);

 static int virtio_dev_queue_stats_mapping_set(
__rte_unused struct rte_eth_dev *eth_dev,
@@ -518,6 +520,7 @@ static struct eth_dev_ops virtio_eth_dev_ops = {
.vlan_filter_set = virtio_vlan_filter_set,
.mac_addr_add= virtio_mac_addr_add,
.mac_addr_remove = virtio_mac_addr_remove,
+   .mac_addr_set= virtio_mac_addr_set,
 };

 static inline int
@@ -733,6 +736,27 @@ virtio_mac_addr_remove(struct rte_eth_dev *dev, uint32_t 
index)
virtio_mac_table_set(hw, uc, mc);
 }

+static void
+virtio_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr)
+{
+   struct virtio_hw *hw = dev->data->dev_private;
+
+   memcpy(hw->mac_addr, mac_addr, ETHER_ADDR_LEN);
+
+   /* Use atomic update if available */
+   if (vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_MAC_ADDR)) {
+   struct virtio_pmd_ctrl ctrl;
+   int len = ETHER_ADDR_LEN;
+
+   ctrl.hdr.class = VIRTIO_NET_CTRL_MAC;
+   ctrl.hdr.cmd = VIRTIO_NET_CTRL_MAC_ADDR_SET;
+
+   memcpy(ctrl.data, mac_addr, ETHER_ADDR_LEN);
+   virtio_send_command(hw->cvq, &ctrl, &len, 1);
+   } else if (vtpci_with_feature(hw, VIRTIO_NET_F_MAC))
+   virtio_set_hwaddr(hw);
+}
+
 static int
 virtio_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
 {
-- 
1.8.4.2

[dpdk-dev] [PATCH v2 14/24] virtio: Add support for multiple mac addresses

2015-01-27 Thread Ouyang Changchun

Virtio supports multiple MAC addresses.

Signed-off-by: Stephen Hemminger 
Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 94 ++-
 lib/librte_pmd_virtio/virtio_ethdev.h |  3 +-
 lib/librte_pmd_virtio/virtqueue.h | 34 -
 3 files changed, 127 insertions(+), 4 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index 591d692..0e74eea 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -86,6 +86,10 @@ static void virtio_dev_stats_reset(struct rte_eth_dev *dev);
 static void virtio_dev_free_mbufs(struct rte_eth_dev *dev);
 static int virtio_vlan_filter_set(struct rte_eth_dev *dev,
uint16_t vlan_id, int on);
+static void virtio_mac_addr_add(struct rte_eth_dev *dev,
+   struct ether_addr *mac_addr,
+   uint32_t index, uint32_t vmdq __rte_unused);
+static void virtio_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index);

 static int virtio_dev_queue_stats_mapping_set(
__rte_unused struct rte_eth_dev *eth_dev,
@@ -503,8 +507,6 @@ static struct eth_dev_ops virtio_eth_dev_ops = {
.stats_get   = virtio_dev_stats_get,
.stats_reset = virtio_dev_stats_reset,
.link_update = virtio_dev_link_update,
-   .mac_addr_add= NULL,
-   .mac_addr_remove = NULL,
.rx_queue_setup  = virtio_dev_rx_queue_setup,
/* meaningfull only to multiple queue */
.rx_queue_release= virtio_dev_rx_queue_release,
@@ -514,6 +516,8 @@ static struct eth_dev_ops virtio_eth_dev_ops = {
/* collect stats per queue */
.queue_stats_mapping_set = virtio_dev_queue_stats_mapping_set,
.vlan_filter_set = virtio_vlan_filter_set,
+   .mac_addr_add= virtio_mac_addr_add,
+   .mac_addr_remove = virtio_mac_addr_remove,
 };

 static inline int
@@ -644,6 +648,92 @@ virtio_get_hwaddr(struct virtio_hw *hw)
 }

 static int
+virtio_mac_table_set(struct virtio_hw *hw,
+const struct virtio_net_ctrl_mac *uc,
+const struct virtio_net_ctrl_mac *mc)
+{
+   struct virtio_pmd_ctrl ctrl;
+   int err, len[2];
+
+   ctrl.hdr.class = VIRTIO_NET_CTRL_MAC;
+   ctrl.hdr.cmd = VIRTIO_NET_CTRL_MAC_TABLE_SET;
+
+   len[0] = uc->entries * ETHER_ADDR_LEN + sizeof(uc->entries);
+   memcpy(ctrl.data, uc, len[0]);
+
+   len[1] = mc->entries * ETHER_ADDR_LEN + sizeof(mc->entries);
+   memcpy(ctrl.data + len[0], mc, len[1]);
+
+   err = virtio_send_command(hw->cvq, &ctrl, len, 2);
+   if (err != 0)
+   PMD_DRV_LOG(NOTICE, "mac table set failed: %d", err);
+
+   return err;
+}
+
+static void
+virtio_mac_addr_add(struct rte_eth_dev *dev, struct ether_addr *mac_addr,
+   uint32_t index, uint32_t vmdq __rte_unused)
+{
+   struct virtio_hw *hw = dev->data->dev_private;
+   const struct ether_addr *addrs = dev->data->mac_addrs;
+   unsigned int i;
+   struct virtio_net_ctrl_mac *uc, *mc;
+
+   if (index >= VIRTIO_MAX_MAC_ADDRS) {
+   PMD_DRV_LOG(ERR, "mac address index %u out of range", index);
+   return;
+   }
+
+   uc = alloca(VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN + 
sizeof(uc->entries));
+   uc->entries = 0;
+   mc = alloca(VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN + 
sizeof(mc->entries));
+   mc->entries = 0;
+
+   for (i = 0; i < VIRTIO_MAX_MAC_ADDRS; i++) {
+   const struct ether_addr *addr
+   = (i == index) ? mac_addr : addrs + i;
+   struct virtio_net_ctrl_mac *tbl
+   = is_multicast_ether_addr(addr) ? mc : uc;
+
+   memcpy(&tbl->macs[tbl->entries++], addr, ETHER_ADDR_LEN);
+   }
+
+   virtio_mac_table_set(hw, uc, mc);
+}
+
+static void
+virtio_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index)
+{
+   struct virtio_hw *hw = dev->data->dev_private;
+   struct ether_addr *addrs = dev->data->mac_addrs;
+   struct virtio_net_ctrl_mac *uc, *mc;
+   unsigned int i;
+
+   if (index >= VIRTIO_MAX_MAC_ADDRS) {
+   PMD_DRV_LOG(ERR, "mac address index %u out of range", index);
+   return;
+   }
+
+   uc = alloca(VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN + 
sizeof(uc->entries));
+   uc->entries = 0;
+   mc = alloca(VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN + 
sizeof(mc->entries));
+   mc->entries = 0;
+
+   for (i = 0; i < VIRTIO_MAX_MAC_ADDRS; i++) {
+   struct virtio_net_ctrl_mac *tbl;
+
+   if (i == index || is_zero_ether_addr(addrs + i))
+   continue;
+
+   tbl = is_multicast_ether_addr(addrs + i) ? mc : uc;
+   memcpy(&tbl->macs[tbl->entries++], addrs + i, ETHER_ADDR_LEN);
+
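
Both helpers above rebuild two tables, one unicast and one multicast, before sending VIRTIO_NET_CTRL_MAC_TABLE_SET. A self-contained sketch of that split follows, using the variant from virtio_mac_addr_remove() that skips all-zero slots; the toy_* types, the table size and the function names are assumptions for illustration.

```c
#include <stdint.h>
#include <string.h>

#define TOY_MAC_LEN  6
#define TOY_MAX_MACS 8   /* hypothetical table size */

struct toy_mac { uint8_t b[TOY_MAC_LEN]; };

struct toy_mac_table {
	uint32_t entries;
	struct toy_mac macs[TOY_MAX_MACS];
};

static int toy_mac_is_zero(const struct toy_mac *m)
{
	static const struct toy_mac zero;

	return memcmp(m->b, zero.b, TOY_MAC_LEN) == 0;
}

/* A group (multicast) address has the low bit of the first octet set. */
static int toy_mac_is_multicast(const struct toy_mac *m)
{
	return m->b[0] & 0x01;
}

/* Sort up to TOY_MAX_MACS addresses into unicast and multicast tables. */
static void toy_split_mac_tables(const struct toy_mac *addrs, unsigned int n,
				 struct toy_mac_table *uc,
				 struct toy_mac_table *mc)
{
	unsigned int i;

	uc->entries = 0;
	mc->entries = 0;

	for (i = 0; i < n && i < TOY_MAX_MACS; i++) {
		struct toy_mac_table *tbl;

		if (toy_mac_is_zero(&addrs[i]))
			continue;	/* unused slot */

		tbl = toy_mac_is_multicast(&addrs[i]) ? mc : uc;
		memcpy(&tbl->macs[tbl->entries++], &addrs[i], sizeof(addrs[i]));
	}
}
```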

[dpdk-dev] [PATCH v2 13/24] virtio: Add support for vlan filtering

2015-01-27 Thread Ouyang Changchun

Virtio supports vlan filtering.

Signed-off-by: Stephen Hemminger 
Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 31 +--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index 39b1fb4..591d692 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -84,6 +84,8 @@ static void virtio_dev_tx_queue_release(__rte_unused void 
*txq);
 static void virtio_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats 
*stats);
 static void virtio_dev_stats_reset(struct rte_eth_dev *dev);
 static void virtio_dev_free_mbufs(struct rte_eth_dev *dev);
+static int virtio_vlan_filter_set(struct rte_eth_dev *dev,
+   uint16_t vlan_id, int on);

 static int virtio_dev_queue_stats_mapping_set(
__rte_unused struct rte_eth_dev *eth_dev,
@@ -511,6 +513,7 @@ static struct eth_dev_ops virtio_eth_dev_ops = {
.tx_queue_release= virtio_dev_tx_queue_release,
/* collect stats per queue */
.queue_stats_mapping_set = virtio_dev_queue_stats_mapping_set,
+   .vlan_filter_set = virtio_vlan_filter_set,
 };

 static inline int
@@ -640,14 +643,31 @@ virtio_get_hwaddr(struct virtio_hw *hw)
}
 }

+static int
+virtio_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
+{
+   struct virtio_hw *hw = dev->data->dev_private;
+   struct virtio_pmd_ctrl ctrl;
+   int len;
+
+   if (!vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VLAN))
+   return -ENOTSUP;
+
+   ctrl.hdr.class = VIRTIO_NET_CTRL_VLAN;
+   ctrl.hdr.cmd = on ? VIRTIO_NET_CTRL_VLAN_ADD : VIRTIO_NET_CTRL_VLAN_DEL;
+   memcpy(ctrl.data, &vlan_id, sizeof(vlan_id));
+   len = sizeof(vlan_id);
+
+   return virtio_send_command(hw->cvq, &ctrl, &len, 1);
+}

 static void
 virtio_negotiate_features(struct virtio_hw *hw)
 {
uint32_t host_features, mask;

-   mask = VIRTIO_NET_F_CTRL_VLAN;
-   mask |= VIRTIO_NET_F_CSUM | VIRTIO_NET_F_GUEST_CSUM;
+   /* checksum offload not implemented */
+   mask = VIRTIO_NET_F_CSUM | VIRTIO_NET_F_GUEST_CSUM;

/* TSO and LRO are only available when their corresponding
 * checksum offload feature is also negotiated.
@@ -1058,6 +1078,13 @@ virtio_dev_configure(struct rte_eth_dev *dev)

hw->vlan_strip = rxmode->hw_vlan_strip;

+   if (rxmode->hw_vlan_filter
+   && !vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VLAN)) {
+   PMD_DRV_LOG(NOTICE,
+   "vlan filtering not available on this host");
+   return -ENOTSUP;
+   }
+
if (vtpci_irq_config(hw, 0) == VIRTIO_MSI_NO_VECTOR) {
PMD_DRV_LOG(ERR, "failed to set config vector");
return -EBUSY;
-- 
1.8.4.2
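
The change to virtio_negotiate_features() works because the mask lists features the driver does not want: dropping VIRTIO_NET_F_CTRL_VLAN from it lets the feature through when the host offers it. A minimal sketch of that idea is shown here. It assumes negotiation is a plain "host features AND NOT mask", which is not visible in the hunk above, and the TOY_* names are illustrative; the bit positions follow the virtio-net feature numbering.

```c
#include <stdint.h>

/* Feature bits expressed as masks. */
#define TOY_F_CSUM       (1u << 0)
#define TOY_F_GUEST_CSUM (1u << 1)
#define TOY_F_CTRL_VLAN  (1u << 19)

/* Keep only the host features the driver has not masked out. */
static uint32_t toy_negotiate(uint32_t host_features, uint32_t unsupported)
{
	return host_features & ~unsupported;
}

/* Nonzero when the negotiated set contains the given feature. */
static int toy_with_feature(uint32_t negotiated, uint32_t feature)
{
	return (negotiated & feature) != 0;
}
```

With the old mask the VLAN control feature is never negotiated, so a filter request would fail the feature check; with the new mask it succeeds on hosts that advertise the bit.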

[dpdk-dev] [PATCH v2 12/24] virtio: Move allocation before initialization

2015-01-27 Thread Ouyang Changchun

If allocation fails, we don't want to leave the virtio device stuck
in the middle of the initialization sequence.

Signed-off-by: Stephen Hemminger 
Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index 9679c2f..39b1fb4 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -890,6 +890,15 @@ eth_virtio_dev_init(__rte_unused struct eth_driver 
*eth_drv,
if (rte_eal_process_type() == RTE_PROC_SECONDARY)
return 0;

+   /* Allocate memory for storing MAC addresses */
+   eth_dev->data->mac_addrs = rte_zmalloc("virtio", ETHER_ADDR_LEN, 0);
+   if (eth_dev->data->mac_addrs == NULL) {
+   PMD_INIT_LOG(ERR,
+   "Failed to allocate %d bytes needed to store MAC 
addresses",
+   ETHER_ADDR_LEN);
+   return -ENOMEM;
+   }
+
/* Tell the host we've noticed this device. */
vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_ACK);

@@ -916,15 +925,6 @@ eth_virtio_dev_init(__rte_unused struct eth_driver 
*eth_drv,
hw->vtnet_hdr_size = sizeof(struct virtio_net_hdr);
}

-   /* Allocate memory for storing MAC addresses */
-   eth_dev->data->mac_addrs = rte_zmalloc("virtio", ETHER_ADDR_LEN, 0);
-   if (eth_dev->data->mac_addrs == NULL) {
-   PMD_INIT_LOG(ERR,
-   "Failed to allocate %d bytes needed to store MAC 
addresses",
-   ETHER_ADDR_LEN);
-   return -ENOMEM;
-   }
-
/* Copy the permanent MAC address to: virtio_hw */
virtio_get_hwaddr(hw);
ether_addr_copy((struct ether_addr *) hw->mac_addr,
-- 
1.8.4.2

[dpdk-dev] [PATCH v2 11/24] virtio: Check for packet headroom at compile time

2015-01-27 Thread Ouyang Changchun

Better to check at compile time than fail at runtime.

Signed-off-by: Stephen Hemminger 
Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index 47dd33d..9679c2f 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -882,11 +882,7 @@ eth_virtio_dev_init(__rte_unused struct eth_driver 
*eth_drv,
uint32_t offset_conf = sizeof(config->mac);
struct rte_pci_device *pci_dev;

-   if (RTE_PKTMBUF_HEADROOM < sizeof(struct virtio_net_hdr)) {
-   PMD_INIT_LOG(ERR,
-   "MBUF HEADROOM should be enough to hold virtio net 
hdr\n");
-   return -1;
-   }
+   RTE_BUILD_BUG_ON(RTE_PKTMBUF_HEADROOM < sizeof(struct virtio_net_hdr));

eth_dev->dev_ops = &virtio_eth_dev_ops;
eth_dev->tx_pkt_burst = &virtio_xmit_pkts;
-- 
1.8.4.2
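
For anyone unfamiliar with the build-time check, a common way to write such a macro is a negative array size that only appears when the condition holds. The sketch below uses that idiom under hypothetical names (TOY_BUILD_BUG_ON, TOY_HEADROOM, toy_net_hdr); it is not claimed to be the definition of RTE_BUILD_BUG_ON.

```c
#include <stdint.h>

/* Fails to compile when cond is true: the array size becomes -1. */
#define TOY_BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))

#define TOY_HEADROOM 128   /* stands in for RTE_PKTMBUF_HEADROOM */

/* Stand-in for struct virtio_net_hdr; field names are illustrative. */
struct toy_net_hdr {
	uint8_t  flags;
	uint8_t  gso_type;
	uint16_t hdr_len;
	uint16_t gso_size;
	uint16_t csum_start;
	uint16_t csum_offset;
};

static int toy_dev_init(void)
{
	/* No runtime branch is left; a too-small headroom breaks the build. */
	TOY_BUILD_BUG_ON(TOY_HEADROOM < sizeof(struct toy_net_hdr));
	return 0;
}
```

Setting TOY_HEADROOM to a value smaller than the header makes the compiler reject toy_dev_init(). In C11 the same intent can be written with _Static_assert.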

[dpdk-dev] [PATCH v2 10/24] virtio: Make vtpci_get_status local

2015-01-27 Thread Ouyang Changchun

Make vtpci_get_status a local function as it is used in one file.

Signed-off-by: Stephen Hemminger 
Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_virtio/virtio_pci.c | 4 +++-
 lib/librte_pmd_virtio/virtio_pci.h | 2 --
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_pci.c 
b/lib/librte_pmd_virtio/virtio_pci.c
index b099e4f..2245bec 100644
--- a/lib/librte_pmd_virtio/virtio_pci.c
+++ b/lib/librte_pmd_virtio/virtio_pci.c
@@ -35,6 +35,8 @@
 #include "virtio_pci.h"
 #include "virtio_logs.h"

+static uint8_t vtpci_get_status(struct virtio_hw *);
+
 void
 vtpci_read_dev_config(struct virtio_hw *hw, uint64_t offset,
void *dst, int length)
@@ -113,7 +115,7 @@ vtpci_reinit_complete(struct virtio_hw *hw)
vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER_OK);
 }

-uint8_t
+static uint8_t
 vtpci_get_status(struct virtio_hw *hw)
 {
return VIRTIO_READ_REG_1(hw, VIRTIO_PCI_STATUS);
diff --git a/lib/librte_pmd_virtio/virtio_pci.h 
b/lib/librte_pmd_virtio/virtio_pci.h
index 0a4b578..64d9c34 100644
--- a/lib/librte_pmd_virtio/virtio_pci.h
+++ b/lib/librte_pmd_virtio/virtio_pci.h
@@ -255,8 +255,6 @@ void vtpci_reset(struct virtio_hw *);

 void vtpci_reinit_complete(struct virtio_hw *);

-uint8_t vtpci_get_status(struct virtio_hw *);
-
 void vtpci_set_status(struct virtio_hw *, uint8_t);

 uint32_t vtpci_negotiate_features(struct virtio_hw *, uint32_t);
-- 
1.8.4.2

[dpdk-dev] [PATCH v2 08/24] virtio: Remove redundant vq_alignment

2015-01-27 Thread Ouyang Changchun

Since vq_alignment is constant (always 4K), it does not
need to be part of the vring struct.

Signed-off-by: Stephen Hemminger 
Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 1 -
 lib/librte_pmd_virtio/virtio_rxtx.c   | 2 +-
 lib/librte_pmd_virtio/virtqueue.h | 3 +--
 3 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index 59b74b7..0d41e7f 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -294,7 +294,6 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
vq->port_id = dev->data->port_id;
vq->queue_id = queue_idx;
vq->vq_queue_index = vtpci_queue_idx;
-   vq->vq_alignment = VIRTIO_PCI_VRING_ALIGN;
vq->vq_nentries = vq_size;
vq->vq_free_cnt = vq_size;

diff --git a/lib/librte_pmd_virtio/virtio_rxtx.c 
b/lib/librte_pmd_virtio/virtio_rxtx.c
index a82d5ff..b6d6832 100644
--- a/lib/librte_pmd_virtio/virtio_rxtx.c
+++ b/lib/librte_pmd_virtio/virtio_rxtx.c
@@ -258,7 +258,7 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
 * Reinitialise since virtio port might have been stopped and restarted
 */
memset(vq->vq_ring_virt_mem, 0, vq->vq_ring_size);
-   vring_init(vr, size, ring_mem, vq->vq_alignment);
+   vring_init(vr, size, ring_mem, VIRTIO_PCI_VRING_ALIGN);
vq->vq_used_cons_idx = 0;
vq->vq_desc_head_idx = 0;
vq->vq_avail_idx = 0;
diff --git a/lib/librte_pmd_virtio/virtqueue.h 
b/lib/librte_pmd_virtio/virtqueue.h
index f6ad98d..5b8a255 100644
--- a/lib/librte_pmd_virtio/virtqueue.h
+++ b/lib/librte_pmd_virtio/virtqueue.h
@@ -138,8 +138,7 @@ struct virtqueue {
uint8_t port_id;  /**< Device port identifier. */

void*vq_ring_virt_mem;/**< linear address of vring*/
-   int vq_alignment;
-   int vq_ring_size;
+   unsigned int vq_ring_size;
phys_addr_t vq_ring_mem;  /**< physical address of vring */

struct vring vq_ring;/**< vring keeping desc, used and avail */
-- 
1.8.4.2

[dpdk-dev] [PATCH v2 07/24] virtio: Remove unnecessary adapter structure

2015-01-27 Thread Ouyang Changchun

Cleanup virtio code by eliminating unnecessary nesting of
virtio hardware structure inside adapter structure.
Also allows removing unneeded macro, making code clearer.

Signed-off-by: Stephen Hemminger 
Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 43 ---
 lib/librte_pmd_virtio/virtio_ethdev.h |  9 
 lib/librte_pmd_virtio/virtio_rxtx.c   |  3 +--
 3 files changed, 16 insertions(+), 39 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index da74659..59b74b7 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -207,8 +207,7 @@ virtio_send_command(struct virtqueue *vq, struct 
virtio_pmd_ctrl *ctrl,
 static int
 virtio_set_multiple_queues(struct rte_eth_dev *dev, uint16_t nb_queues)
 {
-   struct virtio_hw *hw
-   = VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct virtio_hw *hw = dev->data->dev_private;
struct virtio_pmd_ctrl ctrl;
int dlen[1];
int ret;
@@ -242,8 +241,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
const struct rte_memzone *mz;
uint16_t vq_size;
int size;
-   struct virtio_hw *hw =
-   VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct virtio_hw *hw = dev->data->dev_private;
struct virtqueue  *vq = NULL;

/* Write the virtqueue index to the Queue Select Field */
@@ -383,8 +381,7 @@ virtio_dev_cq_queue_setup(struct rte_eth_dev *dev, uint16_t 
vtpci_queue_idx,
struct virtqueue *vq;
uint16_t nb_desc = 0;
int ret;
-   struct virtio_hw *hw =
-   VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct virtio_hw *hw = dev->data->dev_private;

PMD_INIT_FUNC_TRACE();
ret = virtio_dev_queue_setup(dev, VTNET_CQ, VTNET_SQ_CQ_QUEUE_IDX,
@@ -410,8 +407,7 @@ virtio_dev_close(struct rte_eth_dev *dev)
 static void
 virtio_dev_promiscuous_enable(struct rte_eth_dev *dev)
 {
-   struct virtio_hw *hw
-   = VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct virtio_hw *hw = dev->data->dev_private;
struct virtio_pmd_ctrl ctrl;
int dlen[1];
int ret;
@@ -430,8 +426,7 @@ virtio_dev_promiscuous_enable(struct rte_eth_dev *dev)
 static void
 virtio_dev_promiscuous_disable(struct rte_eth_dev *dev)
 {
-   struct virtio_hw *hw
-   = VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct virtio_hw *hw = dev->data->dev_private;
struct virtio_pmd_ctrl ctrl;
int dlen[1];
int ret;
@@ -450,8 +445,7 @@ virtio_dev_promiscuous_disable(struct rte_eth_dev *dev)
 static void
 virtio_dev_allmulticast_enable(struct rte_eth_dev *dev)
 {
-   struct virtio_hw *hw
-   = VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct virtio_hw *hw = dev->data->dev_private;
struct virtio_pmd_ctrl ctrl;
int dlen[1];
int ret;
@@ -470,8 +464,7 @@ virtio_dev_allmulticast_enable(struct rte_eth_dev *dev)
 static void
 virtio_dev_allmulticast_disable(struct rte_eth_dev *dev)
 {
-   struct virtio_hw *hw
-   = VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct virtio_hw *hw = dev->data->dev_private;
struct virtio_pmd_ctrl ctrl;
int dlen[1];
int ret;
@@ -853,8 +846,7 @@ virtio_interrupt_handler(__rte_unused struct 
rte_intr_handle *handle,
 void *param)
 {
struct rte_eth_dev *dev = param;
-   struct virtio_hw *hw =
-   VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct virtio_hw *hw = dev->data->dev_private;
uint8_t isr;

/* Read interrupt status which clears interrupt */
@@ -880,12 +872,11 @@ static int
 eth_virtio_dev_init(__rte_unused struct eth_driver *eth_drv,
struct rte_eth_dev *eth_dev)
 {
+   struct virtio_hw *hw = eth_dev->data->dev_private;
struct virtio_net_config *config;
struct virtio_net_config local_config;
uint32_t offset_conf = sizeof(config->mac);
struct rte_pci_device *pci_dev;
-   struct virtio_hw *hw =
-   VIRTIO_DEV_PRIVATE_TO_HW(eth_dev->data->dev_private);

if (RTE_PKTMBUF_HEADROOM < sizeof(struct virtio_net_hdr)) {
PMD_INIT_LOG(ERR,
@@ -1010,7 +1001,7 @@ static struct eth_driver rte_virtio_pmd = {
.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
},
.eth_dev_init = eth_virtio_dev_init,
-   .dev_private_size = sizeof(struct virtio_adapter),
+   .dev_private_size = sizeof(struct virtio_hw),
 };

 /*
@@ -1053,8 +1044,7 @@ static int
 virtio_dev_configure(struct rte_eth_dev *dev)
 {
const struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
-   struct virtio_hw *hw =
-   VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+

[dpdk-dev] [PATCH v2 06/24] virtio: Use software vlan stripping

2015-01-27 Thread Ouyang Changchun

Implement VLAN stripping in software. This allows application
to be device independent.

Signed-off-by: Stephen Hemminger 
Signed-off-by: Changchun Ouyang 
---
 lib/librte_ether/rte_ethdev.h |  3 +++
 lib/librte_pmd_virtio/virtio_ethdev.c |  2 ++
 lib/librte_pmd_virtio/virtio_pci.h|  1 +
 lib/librte_pmd_virtio/virtio_rxtx.c   | 20 ++--
 4 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 1200c1c..94d6b2b 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -643,6 +643,9 @@ struct rte_eth_rxconf {
 #define ETH_TXQ_FLAGS_NOOFFLOADS \
(ETH_TXQ_FLAGS_NOVLANOFFL | ETH_TXQ_FLAGS_NOXSUMSCTP | \
 ETH_TXQ_FLAGS_NOXSUMUDP  | ETH_TXQ_FLAGS_NOXSUMTCP)
+#define ETH_TXQ_FLAGS_NOXSUMS \
+   (ETH_TXQ_FLAGS_NOXSUMSCTP | ETH_TXQ_FLAGS_NOXSUMUDP | \
+ETH_TXQ_FLAGS_NOXSUMTCP)
 /**
  * A structure used to configure a TX ring of an Ethernet port.
  */
diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index ef87ff8..da74659 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -1064,6 +1064,8 @@ virtio_dev_configure(struct rte_eth_dev *dev)
return (-EINVAL);
}

+   hw->vlan_strip = rxmode->hw_vlan_strip;
+
ret = vtpci_irq_config(hw, 0);
if (ret != 0)
PMD_DRV_LOG(ERR, "failed to set config vector");
diff --git a/lib/librte_pmd_virtio/virtio_pci.h 
b/lib/librte_pmd_virtio/virtio_pci.h
index 6998737..6d93fac 100644
--- a/lib/librte_pmd_virtio/virtio_pci.h
+++ b/lib/librte_pmd_virtio/virtio_pci.h
@@ -168,6 +168,7 @@ struct virtio_hw {
uint32_tmax_tx_queues;
uint32_tmax_rx_queues;
uint16_tvtnet_hdr_size;
+   uint8_t vlan_strip;
uint8_t use_msix;
uint8_t mac_addr[ETHER_ADDR_LEN];
 };
diff --git a/lib/librte_pmd_virtio/virtio_rxtx.c 
b/lib/librte_pmd_virtio/virtio_rxtx.c
index 78af334..e0216ec 100644
--- a/lib/librte_pmd_virtio/virtio_rxtx.c
+++ b/lib/librte_pmd_virtio/virtio_rxtx.c
@@ -49,6 +49,7 @@
 #include 
 #include 
 #include 
+#include 

 #include "virtio_logs.h"
 #include "virtio_ethdev.h"
@@ -408,8 +409,8 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,

PMD_INIT_FUNC_TRACE();

-   if ((tx_conf->txq_flags & ETH_TXQ_FLAGS_NOOFFLOADS)
-   != ETH_TXQ_FLAGS_NOOFFLOADS) {
+   if ((tx_conf->txq_flags & ETH_TXQ_FLAGS_NOXSUMS)
+   != ETH_TXQ_FLAGS_NOXSUMS) {
PMD_INIT_LOG(ERR, "TX checksum offload not supported\n");
return -EINVAL;
}
@@ -446,6 +447,7 @@ uint16_t
 virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 {
struct virtqueue *rxvq = rx_queue;
+   struct virtio_hw *hw = rxvq->hw;
struct rte_mbuf *rxm, *new_mbuf;
uint16_t nb_used, num, nb_rx = 0;
uint32_t len[VIRTIO_MBUF_BURST_SZ];
@@ -489,6 +491,9 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, 
uint16_t nb_pkts)
rxm->pkt_len = (uint32_t)(len[i] - hdr_size);
rxm->data_len = (uint16_t)(len[i] - hdr_size);

+   if (hw->vlan_strip)
+   rte_vlan_strip(rxm);
+
VIRTIO_DUMP_PACKET(rxm, rxm->data_len);

rx_pkts[nb_rx++] = rxm;
@@ -717,6 +722,17 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf 
**tx_pkts, uint16_t nb_pkts)
 */
if (likely(need <= 0)) {
txm = tx_pkts[nb_tx];
+
+   /* Do VLAN tag insertion */
+   if (txm->ol_flags & PKT_TX_VLAN_PKT) {
+   error = rte_vlan_insert(&txm);
+   if (unlikely(error)) {
+   rte_pktmbuf_free(txm);
+   ++nb_tx;
+   continue;
+   }
+   }
+
/* Enqueue Packet buffers */
error = virtqueue_enqueue_xmit(txvq, txm);
if (unlikely(error)) {
-- 
1.8.4.2

[dpdk-dev] [PATCH v2 04/24] virtio: Add support for Link State interrupt

2015-01-27 Thread Ouyang Changchun

Virtio has link state interrupt which can be used.

Signed-off-by: Stephen Hemminger 
Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 78 +++
 lib/librte_pmd_virtio/virtio_pci.c| 22 ++
 lib/librte_pmd_virtio/virtio_pci.h|  4 ++
 3 files changed, 86 insertions(+), 18 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index 5df3b54..ef87ff8 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -845,6 +845,34 @@ static int virtio_resource_init(struct rte_pci_device 
*pci_dev __rte_unused)
 #endif

 /*
+ * Process Virtio Config changed interrupt and call the callback
+ * if link state changed.
+ */
+static void
+virtio_interrupt_handler(__rte_unused struct rte_intr_handle *handle,
+void *param)
+{
+   struct rte_eth_dev *dev = param;
+   struct virtio_hw *hw =
+   VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   uint8_t isr;
+
+   /* Read interrupt status which clears interrupt */
+   isr = vtpci_isr(hw);
+   PMD_DRV_LOG(INFO, "interrupt status = %#x", isr);
+
+   if (rte_intr_enable(&dev->pci_dev->intr_handle) < 0)
+   PMD_DRV_LOG(ERR, "interrupt enable failed");
+
+   if (isr & VIRTIO_PCI_ISR_CONFIG) {
+   if (virtio_dev_link_update(dev, 0) == 0)
+   _rte_eth_dev_callback_process(dev,
+ RTE_ETH_EVENT_INTR_LSC);
+   }
+
+}
+
+/*
  * This function is based on probe() function in virtio_pci.c
  * It returns 0 on success.
  */
@@ -968,6 +996,10 @@ eth_virtio_dev_init(__rte_unused struct eth_driver 
*eth_drv,
PMD_INIT_LOG(DEBUG, "port %d vendorID=0x%x deviceID=0x%x",
eth_dev->data->port_id, pci_dev->id.vendor_id,
pci_dev->id.device_id);
+
+   /* Setup interrupt callback  */
+   rte_intr_callback_register(&pci_dev->intr_handle,
+  virtio_interrupt_handler, eth_dev);
return 0;
 }

@@ -975,7 +1007,7 @@ static struct eth_driver rte_virtio_pmd = {
{
.name = "rte_virtio_pmd",
.id_table = pci_id_virtio_map,
-   .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
+   .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
},
.eth_dev_init = eth_virtio_dev_init,
.dev_private_size = sizeof(struct virtio_adapter),
@@ -1021,6 +1053,9 @@ static int
 virtio_dev_configure(struct rte_eth_dev *dev)
 {
const struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
+   struct virtio_hw *hw =
+   VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   int ret;

PMD_INIT_LOG(DEBUG, "configure");

@@ -1029,7 +1064,11 @@ virtio_dev_configure(struct rte_eth_dev *dev)
return (-EINVAL);
}

-   return 0;
+   ret = vtpci_irq_config(hw, 0);
+   if (ret != 0)
+   PMD_DRV_LOG(ERR, "failed to set config vector");
+
+   return ret;
 }


@@ -1037,7 +1076,6 @@ static int
 virtio_dev_start(struct rte_eth_dev *dev)
 {
uint16_t nb_queues, i;
-   uint16_t status;
struct virtio_hw *hw =
VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);

@@ -1052,18 +1090,22 @@ virtio_dev_start(struct rte_eth_dev *dev)
/* Do final configuration before rx/tx engine starts */
virtio_dev_rxtx_start(dev);

-   /* Check VIRTIO_NET_F_STATUS for link status*/
-   if (vtpci_with_feature(hw, VIRTIO_NET_F_STATUS)) {
-   vtpci_read_dev_config(hw,
-   offsetof(struct virtio_net_config, status),
-   &status, sizeof(status));
-   if ((status & VIRTIO_NET_S_LINK_UP) == 0)
-   PMD_INIT_LOG(ERR, "Port: %d Link is DOWN",
-dev->data->port_id);
-   else
-   PMD_INIT_LOG(DEBUG, "Port: %d Link is UP",
-dev->data->port_id);
+   /* check if lsc interrupt feature is enabled */
+   if (dev->data->dev_conf.intr_conf.lsc) {
+   if (!vtpci_with_feature(hw, VIRTIO_NET_F_STATUS)) {
+   PMD_DRV_LOG(ERR, "link status not supported by host");
+   return -ENOTSUP;
+   }
+
+   if (rte_intr_enable(&dev->pci_dev->intr_handle) < 0) {
+   PMD_DRV_LOG(ERR, "interrupt enable failed");
+   return -EIO;
+   }
}
+
+   /* Initialize Link state */
+   virtio_dev_link_update(dev, 0);
+
vtpci_reinit_complete(hw);

/*Notify the backend
@@ -1145,6 +1187,7 @@ virtio_dev_stop(struct rte_eth_dev *dev)
VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);

/* reset the NIC */
+

[dpdk-dev] [PATCH v2 03/24] virtio: Allow starting with link down

2015-01-27 Thread Ouyang Changchun

Starting the driver with the link down should be OK, as it is with every
other driver. So just allow it.

Signed-off-by: Stephen Hemminger 
Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index dc47e72..5df3b54 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -1057,14 +1057,12 @@ virtio_dev_start(struct rte_eth_dev *dev)
vtpci_read_dev_config(hw,
offsetof(struct virtio_net_config, status),
&status, sizeof(status));
-   if ((status & VIRTIO_NET_S_LINK_UP) == 0) {
+   if ((status & VIRTIO_NET_S_LINK_UP) == 0)
PMD_INIT_LOG(ERR, "Port: %d Link is DOWN",
 dev->data->port_id);
-   return -EIO;
-   } else {
+   else
PMD_INIT_LOG(DEBUG, "Port: %d Link is UP",
 dev->data->port_id);
-   }
}
vtpci_reinit_complete(hw);

-- 
1.8.4.2

[dpdk-dev] [PATCH v2 00/24] Single virtio implementation

2015-01-27 Thread Ouyang Changchun

This is the patch set for single virtio implementation.

Why do we need a single virtio?

As we know, there are currently at least 3 virtio PMD driver implementations:
A) lib/librte_pmd_virtio (referred to as virtio A);
B) virtio_net_pmd by 6wind (referred to as virtio B);
C) virtio by Brocade/vyatta (referred to as virtio C);

Integrating the 3 implementations into one reduces maintenance cost and time.
In addition, users no longer need to try their application on each of the 3
variants to see which one works best for them.

What's the status?

Currently virtio A covers most features of virtio B, except for using port IO
to get the PCI resource, so there is a patch (17/22) to resolve that. On the
other hand, there are a few differences between virtio A and virtio C, so the
features and code of virtio C need to be integrated into virtio A.
This patch set is based on two original RFC patch sets from Stephen Hemminger
[stephen at networkplumber.org].
Refer to [http://dpdk.org/ml/archives/dev/2014-August/004845.html ] for the
original one.
This patch set also resolves some conflicts with the latest code, removes
duplicated code, and fixes some issues in the original code.

What this patch set contains:
===
  1) virtio: Rearrange resource initialization; it extracts a function to set up
     PCI resources;
  2) virtio: Use weaker barriers; the DPDK driver only has to deal with the case
     of running on PCI and with SMP, and in this case the code can use the weaker
     barriers instead of hard (fence) barriers. This may help performance a bit;
  3) virtio: Allow starting with link down; other drivers have similar behavior;
  4) virtio: Add support for Link State interrupt;
  5) ether: Add soft vlan encap/decap functions; this helps if the HW doesn't
     support vlan strip;
  6) virtio: Use software vlan stripping;
  7) virtio: Remove unnecessary adapter structure;
  8) virtio: Remove redundant vq_alignment; vq alignment is always 4K, so use a
     constant when needed;
  9) virtio: Fix how states are handled during initialization, to match the
     Linux kernel;
  10) virtio: Make vtpci_get_status a local function, as it is used in one file;
  11) virtio: Check for packet headroom at compile time;
  12) virtio: Move allocation before initialization to avoid being stuck in the
      middle of virtio init;
  13) virtio: Add support for vlan filtering;
  14) virtio: Add support for multiple mac addresses;
  15) virtio: Add ability to set MAC address;
  16) virtio: Free mbufs with threshold; this makes the behavior more like ixgbe;
  17) virtio: Use port IO to get PCI resource, for security reasons and to match
      virtio-net-pmd;
  18) virtio: Fix descriptor index issue;
  19) ether: Fix vlan strip/insert issue;
  20) example/vhost: Avoid inserting vlan twice, on both guest and host;
  21) example/vhost: Add vlan-strip cmd line option to turn vlan strip on/off
      on the host;
  22) virtio: Use soft vlan strip in mergeable Rx path; this makes its logic
      consistent with the normal Rx path.

Changes in v2:
  23) virtio: Fix zero copy break issue; the vring should be ready before the
      virtio PMD sets the status to DRIVER_OK;
  24) virtio: Remove unnecessary hotspots in data path.

Changchun Ouyang (8):
  virtio: Use port IO to get PCI resource.
  virtio: Fix descriptor index issue
  ether: Fix vlan strip/insert issue
  example/vhost: Avoid inserting vlan twice
  example/vhost: Add vlan-strip cmd line option
  virtio: Use soft vlan strip in mergeable Rx path
  virtio: Fix zero copy break issue
  virtio: Remove hotspots

Stephen Hemminger (16):
  virtio: Rearrange resource initialization
  virtio: Use weaker barriers
  virtio: Allow starting with link down
  virtio: Add support for Link State interrupt
  ether: Add soft vlan encap/decap functions
  virtio: Use software vlan stripping
  virtio: Remove unnecessary adapter structure
  virtio: Remove redundant vq_alignment
  virtio: Fix how states are handled during initialization
  virtio: Make vtpci_get_status local
  virtio: Check for packet headroom at compile time
  virtio: Move allocation before initialization
  virtio: Add support for vlan filtering
  virtio: Add suport for multiple mac addresses
  virtio: Add ability to set MAC address
  virtio: Free mbuf's with threshold

 config/common_linuxapp  |   2 +
 examples/vhost/main.c   |  39 ++-
 lib/librte_eal/common/include/rte_pci.h |   4 +
 lib/librte_eal/linuxapp/eal/eal_pci.c   |   5 +-
 lib/librte_ether/rte_ethdev.h   |   8 +
 lib/librte_ether/rte_ether.h|  76 +
 lib/librte_pmd_virtio/virtio_ethdev.c   | 492 +---
 lib/librte_pmd_virtio/virtio_ethdev.h   |  12 +-
 lib/librte_pmd_virtio/virtio_pci.c  |  20 +-
 lib/librte_pmd_virtio/virtio_pci.h  |   8 +-
 lib/librte_pmd_virtio/virtio_rxtx.c | 139 ++---
 lib/librte_pmd_virtio/virtqueue.h   |  59 +++-
 12 files changed, 693 insertions(+), 171 deletions(-)

--

[dpdk-dev] [PATCH] lib/librte_ether: change socket_id passed to rte_memzone_reserve

2015-01-27 Thread Thomas Monjalon

Hi,

2015-01-22 15:05, Cian Ferriter:
> Removes the dependency that this memzone reserve has on the
> socket currently running on. Following the socket of the master
> core will yield more predictable results when calling this
> function after initialisation.

You don't describe what the problem is. In another mail, you say
"The original suggestion also fixes the crash that I was seeing because
of memory being reserved from a numa node with no "--socket-mem" allocated."
Please describe it clearly in the commit log.

You should also explain what this rte_memzone_reserve() is for,
and what the consequences of your changes are.

Thanks
-- 
Thomas

> @@ -184,7 +184,7 @@ rte_eth_dev_data_alloc(void)
>   if (rte_eal_process_type() == RTE_PROC_PRIMARY){
>   mz = rte_memzone_reserve(MZ_RTE_ETH_DEV_DATA,
>   RTE_MAX_ETHPORTS * sizeof(*rte_eth_dev_data),
> - rte_socket_id(), flags);
> + rte_lcore_to_socket_id(rte_get_master_lcore()), 
> flags);

[dpdk-dev] [PATCH v2 00/24] Single virtio implementation

2015-01-27 Thread Stephen Hemminger

On Mon, 26 Jan 2015 19:06:12 -0800
Matthew Hall  wrote:

> Thank you so much for this, using virtio drivers in DPDK has been messy and 
> unpleasant in the past, and you clearly wrote a lot of nice new code to help 
> improve it all.
> 
> Previously I'd reported a bug, where all RTE virtio drivers I tried (A and B, 
> because I did not know C existed), failed to work with the virtio-net 
> interfaces exposed in VirtualBox, due to various strange errors, and they all 
> only worked with the virtio-net interfaces from qemu.

I suspect a problem with features required (and not supported by VirtualBox).
Build driver with debug enabled and send the log please.

[dpdk-dev] [PATCH v2 04/24] virtio: Add support for Link State interrupt

2015-01-27 Thread Stephen Hemminger

On Tue, 27 Jan 2015 09:04:07 +
"Xie, Huawei"  wrote:

> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ouyang Changchun
> > Sent: Tuesday, January 27, 2015 10:36 AM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH v2 04/24] virtio: Add support for Link State 
> > interrupt
> > 
> > Virtio has link state interrupt which can be used.
> > 
> > Signed-off-by: Stephen Hemminger 
> > Signed-off-by: Changchun Ouyang 
> > ---
> >  lib/librte_pmd_virtio/virtio_ethdev.c | 78 
> > +++--
> > --
> >  lib/librte_pmd_virtio/virtio_pci.c| 22 ++
> >  lib/librte_pmd_virtio/virtio_pci.h|  4 ++
> >  3 files changed, 86 insertions(+), 18 deletions(-)
> > 
> > diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c
> > b/lib/librte_pmd_virtio/virtio_ethdev.c
> > index 5df3b54..ef87ff8 100644
> > --- a/lib/librte_pmd_virtio/virtio_ethdev.c
> > +++ b/lib/librte_pmd_virtio/virtio_ethdev.c
> > @@ -845,6 +845,34 @@ static int virtio_resource_init(struct rte_pci_device
> > *pci_dev __rte_unused)
> >  #endif
> > 
> >  /*
> > + * Process Virtio Config changed interrupt and call the callback
> > + * if link state changed.
> > + */
> > +static void
> > +virtio_interrupt_handler(__rte_unused struct rte_intr_handle *handle,
> > +void *param)
> > +{
> > +   struct rte_eth_dev *dev = param;
> > +   struct virtio_hw *hw =
> > +   VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> > +   uint8_t isr;
> > +
> > +   /* Read interrupt status which clears interrupt */
> > +   isr = vtpci_isr(hw);
> > +   PMD_DRV_LOG(INFO, "interrupt status = %#x", isr);
> > +
> > +   if (rte_intr_enable(&dev->pci_dev->intr_handle) < 0)
> > +   PMD_DRV_LOG(ERR, "interrupt enable failed");
> > +  
> 
> Is it better to put rte_intr_enable after we have handled the interrupt.
> Is there the possibility of interrupt reentrant in uio intr framework?

The UIO framework handles IRQs via a POSIX thread that reads the
fd and then calls this code. Therefore it is always single threaded.
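
A minimal sketch of the model described above, assuming the interrupt source is a file descriptor that yields a 32-bit event count per read (as UIO devices do). dispatch_one() and count_events() are hypothetical names, not the EAL interrupt API; the point is only that the handler runs on the thread that did the read, so one reader thread means no reentrancy.

```c
#include <stdint.h>
#include <unistd.h>

typedef void (*intr_cb_t)(void *arg);

/* Example handler: count how many times it has been called. */
static void
count_events(void *arg)
{
	unsigned int *hits = arg;

	(*hits)++;
}

/*
 * Hypothetical dispatcher: block on fd until an event count arrives, then
 * call the handler on the calling thread. Returns 1 if the handler ran,
 * 0 on end of file or a short read, -1 on a read error.
 */
static int
dispatch_one(int fd, intr_cb_t cb, void *arg)
{
	uint32_t count;
	ssize_t n = read(fd, &count, sizeof(count));

	if (n < 0)
		return -1;
	if (n != (ssize_t)sizeof(count))
		return 0;

	/* The callback cannot be re-entered: the next read() only
	 * happens after it returns. */
	cb(arg);
	return 1;
}
```

A polling thread would call dispatch_one() in a loop; a pipe can stand in for the device fd when trying this out.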

[dpdk-dev] [PATCH v2 02/24] virtio: Use weaker barriers

2015-01-27 Thread Stephen Hemminger


> I recall our original code is virtio_wmb(). 
> Use store fence to ensure all updates to entries before updating the index.
> Why do we need virtio_rmb() here and add virtio_wmb after 
> vq_update_avail_idx()?

A store fence is unnecessary: Intel CPUs are cache coherent. Please read
the Linux virtio ring header file for the explanation. A full-fence WMB
is more expensive and causes a CPU stall.

> > vq->vq_ring.avail->idx = vq->vq_avail_idx;
> >  }
> > 
> > @@ -255,7 +264,7 @@ static inline void
> >  virtqueue_notify(struct virtqueue *vq)
> >  {
> > /*
> > -* Ensure updated avail->idx is visible to host. mb() necessary?
> > +* Ensure updated avail->idx is visible to host.
> >  * For virtio on IA, the notificaiton is through io port operation
> >  * which is a serialization instruction itself.
> >  */
> > --
> > 1.8.4.2
>
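
To illustrate the weaker-barrier argument above, here is a minimal sketch using C11 atomics rather than the virtio_wmb()/virtio_rmb() macros from the patch. struct toy_ring and both functions are hypothetical. The assumption is that a release fence is enough to order the entry store before the index store; on x86, compilers typically emit no fence instruction for it, only a compiler-level ordering constraint.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8u /* hypothetical ring size, power of two */

struct toy_ring {
	uint16_t entries[RING_SIZE]; /* written before the index */
	_Atomic uint16_t avail_idx;  /* index the consumer polls */
};

/* Producer side: fill the slot, then publish the new index. */
static void
toy_ring_publish(struct toy_ring *r, uint16_t desc)
{
	uint16_t idx = atomic_load_explicit(&r->avail_idx,
					    memory_order_relaxed);

	r->entries[idx % RING_SIZE] = desc;
	/* Weak barrier: the entry store may not move below this point. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&r->avail_idx, (uint16_t)(idx + 1),
			      memory_order_relaxed);
}

/* Consumer side: returns 1 and fills *desc if a new entry is visible. */
static int
toy_ring_consume(struct toy_ring *r, uint16_t *last_seen, uint16_t *desc)
{
	uint16_t idx = atomic_load_explicit(&r->avail_idx,
					    memory_order_relaxed);

	if (idx == *last_seen)
		return 0;
	/* Pairs with the release fence in toy_ring_publish(). */
	atomic_thread_fence(memory_order_acquire);
	*desc = r->entries[*last_seen % RING_SIZE];
	(*last_seen)++;
	return 1;
}
```

The sketch covers only the store-to-store ordering between the entry and the index; it does not model the notification to the host.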

[dpdk-dev] [PATCH v2] vhost: Add -lfuse into the LDFLAGS list

2015-01-27 Thread Neil Horman

The vhost library relies on libfuse, and that's included when we do a normal
shared object build, but when we specify combined libs, it gets left out.  Add
it back in.

Signed-off-by: Neil Horman 

---
Change notes:
v2) Removed normal shared object inclusion of libfuse since it's always included
now
---
 mk/rte.app.mk | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 9c8b06a..4294d9a 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -131,6 +131,10 @@ ifeq ($(CONFIG_RTE_LIBRTE_PMD_PCAP),y)
 LDLIBS += -lpcap
 endif

+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+LDLIBS += -lfuse
+endif
+
 LDLIBS += --start-group

 ifeq ($(CONFIG_RTE_BUILD_COMBINE_LIBS),n)
@@ -197,7 +201,6 @@ endif

 ifeq ($(CONFIG_RTE_LIBRTE_VHOST), y)
 LDLIBS += -lrte_vhost
-LDLIBS += -lfuse
 endif

 ifeq ($(CONFIG_RTE_LIBRTE_ENIC_PMD),y)
-- 
2.1.0

[dpdk-dev] [PATCH v3 0/3] enhance TX checksum command and csum forwarding engine

2015-01-27 Thread Olivier MATZ

Hi Konstantin,

On 01/26/2015 03:15 PM, Ananyev, Konstantin wrote:
 Another thing - IPIP seems to work ok by HW.
 There is something wrong on our (PMD/test-pmd) side.
 I think at least we have to remove the following check:
 if (!l2_len) {
  PMD_DRV_LOG(DEBUG, "L2 length set to 0");
  return;
  }
 in i40e_txd_enable_checksum().
>>>
>>> Yes, for IPIP, the check should be removed.
>>
>> Yes, I think these lines should be removed for 2 reasons:
>> - it may be the cause of ipip tunnel not working
>> - we shouldn't do these kind of tests in dataplane. I think we have to
>>suppose that the data passed to the PMD is valid.
>>
>> I'll redo the test with ipip tomorrow with this fix and let you
>> know the result. If it works, I'll add this in the next version
>> of the patch.
>
> While you are on this, can I suggest you'll add debug logging for TCD and TDD 
> we are writing to the TX ring?
> Something like that:
>
> +   PMD_TX_LOG(DEBUG, "mbuf: %p, TCD[%u]:\n"
> +   "tunneling_params: %#x;\n"
> +   "l2tag2: %#hx;\n"
> +   "rsvd: %#hx;\n"
> +   "type_cmd_tso_mss: %#lx;\n",
> +   tx_pkt, tx_id,
> +   ctx_txd->tunneling_params,
> +   ctx_txd->l2tag2,
> +   ctx_txd->rsvd,
> +   ctx_txd->type_cmd_tso_mss);
>
> And same for TDD.
> It  helped me a lot to figure out what is going on, when I did my tests.
> Probably would be useful for other people too.

Sure, I'll add this.

Also, just to let you know that I tested the ipip case without the
"if (l2_len) return" and "if (l3_len) return", and it is working.

Regards,
Olivier

[dpdk-dev] [PATCH v2 04/24] virtio: Add support for Link State interrupt

2015-01-27 Thread Xie, Huawei



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ouyang Changchun
> Sent: Tuesday, January 27, 2015 10:36 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v2 04/24] virtio: Add support for Link State 
> interrupt
> 
> Virtio has link state interrupt which can be used.
> 
> Signed-off-by: Stephen Hemminger 
> Signed-off-by: Changchun Ouyang 
> ---
>  lib/librte_pmd_virtio/virtio_ethdev.c | 78 +++--
> --
>  lib/librte_pmd_virtio/virtio_pci.c| 22 ++
>  lib/librte_pmd_virtio/virtio_pci.h|  4 ++
>  3 files changed, 86 insertions(+), 18 deletions(-)
> 
> diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c
> b/lib/librte_pmd_virtio/virtio_ethdev.c
> index 5df3b54..ef87ff8 100644
> --- a/lib/librte_pmd_virtio/virtio_ethdev.c
> +++ b/lib/librte_pmd_virtio/virtio_ethdev.c
> @@ -845,6 +845,34 @@ static int virtio_resource_init(struct rte_pci_device
> *pci_dev __rte_unused)
>  #endif
> 
>  /*
> + * Process Virtio Config changed interrupt and call the callback
> + * if link state changed.
> + */
> +static void
> +virtio_interrupt_handler(__rte_unused struct rte_intr_handle *handle,
> +  void *param)
> +{
> + struct rte_eth_dev *dev = param;
> + struct virtio_hw *hw =
> + VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> + uint8_t isr;
> +
> + /* Read interrupt status which clears interrupt */
> + isr = vtpci_isr(hw);
> + PMD_DRV_LOG(INFO, "interrupt status = %#x", isr);
> +
> + if (rte_intr_enable(&dev->pci_dev->intr_handle) < 0)
> + PMD_DRV_LOG(ERR, "interrupt enable failed");
> +

Is it better to call rte_intr_enable after we have handled the interrupt?
Is interrupt reentrancy possible in the uio intr framework?

> + if (isr & VIRTIO_PCI_ISR_CONFIG) {
> + if (virtio_dev_link_update(dev, 0) == 0)
> + _rte_eth_dev_callback_process(dev,
> +
> RTE_ETH_EVENT_INTR_LSC);
> + }
> +
> +}
> +
>

[dpdk-dev] [PATCH v3 00/18] ACL: New AVX2 classify method and several other enhancements.

2015-01-27 Thread Neil Horman

On Tue, Jan 20, 2015 at 06:40:49PM +, Konstantin Ananyev wrote:
> v3 changes:
> Applied review comments from Thomas:
> - fix spelling errors reported by codespell.
> - split last patch into two:
> first to remove unused macros,
> second to add some comments about ACL internal layout.
> 
> v2 changes:
> - When build with the compilers that don't support AVX2 instructions,
> make rte_acl_classify_avx2() do nothing and return an error.
> - Remove unneeded 'ifdef __AVX2__' in acl_run_avx2.*.
> - Reorder order of patches in the set, to keep RTE_LIBRTE_ACL_STANDALONE=y
> always buildable.
> 
> This patch series contain several fixes and enhancements for ACL library.
> See complete list below.
> Two main changes that are externally visible:
> - Introduce new classify method:  RTE_ACL_CLASSIFY_AVX2.
> It uses AVX2 instructions and 256 bit wide data types
> to perform internal trie traversal.
> That helps to increase classify() throughput.
> This method is selected as default one on CPUs that supports AVX2.
> - Introduce new field in the build config structure: max_size.
> It specifies maximum size that internal RT structure for given context
> can reach.
> The purpose of that is to allow user to decide about space/performance 
> trade-off
> (faster classify() vs less space for RT internal structures)
> for each given set of rules.
> 
> Konstantin Ananyev (18):
>   fix fix compilation issues with RTE_LIBRTE_ACL_STANDALONE=y
>   app/test: few small fixes fot test_acl.c
>   librte_acl: make data_indexes long enough to survive idle transitions.
>   librte_acl: remove build phase heuristsic with negative performance
> effect.
>   librte_acl: fix a bug at build phase that can cause matches beeing
> overwirtten.
>   librte_acl: introduce DFA nodes compression (group64) for identical
> entries.
>   librte_acl: build/gen phase - simplify the way match nodes are
> allocated.
>   librte_acl: make scalar RT code to be more similar to vector one.
>   librte_acl: a bit of RT code deduplication.
>   EAL: introduce rte_ymm and relatives in rte_common_vect.h.
>   librte_acl: add AVX2 as new rte_acl_classify() method
>   test-acl: add ability to manually select RT method.
>   librte_acl: Remove search_sse_2 and relatives.
>   libter_acl: move lo/hi dwords shuffle out from calc_addr
>   libte_acl: make calc_addr a define to deduplicate the code.
>   libte_acl: introduce max_size into rte_acl_config.
>   libte_acl: remove unused macros.
>   libte_acl: add some comments about ACL internal layout.
> 
>  app/test-acl/main.c | 126 +++--
>  app/test/test_acl.c |   8 +-
>  examples/l3fwd-acl/main.c   |   3 +-
>  examples/l3fwd/main.c   |   2 +-
>  lib/librte_acl/Makefile |  18 +
>  lib/librte_acl/acl.h|  58 ++-
>  lib/librte_acl/acl_bld.c| 392 +++-
>  lib/librte_acl/acl_gen.c| 268 +++
>  lib/librte_acl/acl_run.h|   7 +-
>  lib/librte_acl/acl_run_avx2.c   |  54 +++
>  lib/librte_acl/acl_run_avx2.h   | 284 
>  lib/librte_acl/acl_run_scalar.c |  65 ++-
>  lib/librte_acl/acl_run_sse.c| 585 
> +---
>  lib/librte_acl/acl_run_sse.h| 357 +++
>  lib/librte_acl/acl_vect.h   | 132 +++---
>  lib/librte_acl/rte_acl.c|  47 +-
>  lib/librte_acl/rte_acl.h|   4 +
>  lib/librte_acl/rte_acl_osdep_alone.h|  47 +-
>  lib/librte_eal/common/include/rte_common_vect.h |  39 +-
>  lib/librte_lpm/rte_lpm.h|   2 +-
>  20 files changed, 1444 insertions(+), 1054 deletions(-)
>  create mode 100644 lib/librte_acl/acl_run_avx2.c
>  create mode 100644 lib/librte_acl/acl_run_avx2.h
>  create mode 100644 lib/librte_acl/acl_run_sse.h
> 
> -- 
> 1.8.5.3
> 
> 
For the series
Acked-by: Neil Horman

[dpdk-dev] [PATCH 0/4] DPDK memcpy optimization

2015-01-27 Thread Wang, Zhihong



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of EDMISON, Kelvin
> (Kelvin)
> Sent: Friday, January 23, 2015 2:22 AM
> To: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> 
> 
> 
> On 2015-01-21, 3:54 PM, "Neil Horman"  wrote:
> 
> >On Wed, Jan 21, 2015 at 11:49:47AM -0800, Stephen Hemminger wrote:
> >> On Wed, 21 Jan 2015 13:26:20 +
> >> Bruce Richardson  wrote:
> >>
> >> > On Wed, Jan 21, 2015 at 02:21:25PM +0100, Marc Sune wrote:
> >> > >
> >> > > On 21/01/15 14:02, Bruce Richardson wrote:
> >> > > >On Wed, Jan 21, 2015 at 01:36:41PM +0100, Marc Sune wrote:
> >> > > >>On 21/01/15 04:44, Wang, Zhihong wrote:
> >> > > -Original Message-
> >> > > From: Richardson, Bruce
> >> > > Sent: Wednesday, January 21, 2015 12:15 AM
> >> > > To: Neil Horman
> >> > > Cc: Wang, Zhihong; dev at dpdk.org
> >> > > Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> >> > > 
> >> > > On Tue, Jan 20, 2015 at 10:11:18AM -0500, Neil Horman wrote:
> >> > > >On Tue, Jan 20, 2015 at 03:01:44AM +, Wang, Zhihong
> wrote:
> >> > > >>>-Original Message-
> >> > > >>>From: Neil Horman [mailto:nhorman at tuxdriver.com]
> >> > > >>>Sent: Monday, January 19, 2015 9:02 PM
> >> > > >>>To: Wang, Zhihong
> >> > > >>>Cc: dev at dpdk.org
> >> > > >>>Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy
> optimization
> >> > > >>>
> >> > > >>>On Mon, Jan 19, 2015 at 09:53:30AM +0800,
> >>zhihong.wang at intel.com
> >> > > wrote:
> >> > > This patch set optimizes memcpy for DPDK for both SSE and
> >>AVX
> >> > > platforms.
> >> > > It also extends memcpy test coverage with unaligned cases
> >>and
> >> > > more test
> >> > > >>>points.
> >> > > Optimization techniques are summarized below:
> >> > > 
> >> > > 1. Utilize full cache bandwidth
> >> > > 
> >> > > 2. Enforce aligned stores
> >> > > 
> >> > > 3. Apply load address alignment based on architecture
> >>features
> >> > > 
> >> > > 4. Make load/store address available as early as possible
> >> > > 
> >> > > 5. General optimization techniques like inlining, branch
> >> > > reducing, prefetch pattern access
> >> > > 
> >> > > Zhihong Wang (4):
> >> > >    Disabled VTA for memcpy test in app/test/Makefile
> >> > >    Removed unnecessary test cases in test_memcpy.c
> >> > >    Extended test coverage in test_memcpy_perf.c
> >> > >    Optimized memcpy in arch/x86/rte_memcpy.h for both
> SSE
> >>and AVX
> >> > >  platforms
> >> > > 
> >> > >   app/test/Makefile  |   6 +
> >> > >   app/test/test_memcpy.c |  52
> >>+-
> >> > >   app/test/test_memcpy_perf.c| 238
> >>+---
> >> > >   .../common/include/arch/x86/rte_memcpy.h   | 664
> >> > > >>>+++--
> >> > >   4 files changed, 656 insertions(+), 304 deletions(-)
> >> > > 
> >> > > --
> >> > > 1.9.3
> >> > > 
> >> > > 
> >> > > >>>Are you able to compile this with gcc 4.9.2?  The
> >>compilation of
> >> > > >>>test_memcpy_perf is taking forever for me.  It appears hung.
> >> > > >>>Neil
> >> > > >>Neil,
> >> > > >>
> >> > > >>Thanks for reporting this!
> >> > > >>It should compile but will take quite some time if the CPU
> >>doesn't support
> >> > > AVX2, the reason is that:
> >> > > >>1. The SSE & AVX memcpy implementation is more
> complicated
> >>than
> >> > > AVX2
> >> > > >>version thus the compiler takes more time to compile and
> >>optimize 2.
> >> > > >>The new test_memcpy_perf.c contains 126 constants memcpy
> >>calls for
> >> > > >>better test case coverage, that's quite a lot
> >> > > >>
> >> > > >>I've just tested this patch on an Ivy Bridge machine with GCC
> >>4.9.2:
> >> > > >>1. The whole compile process takes 9'41" with the original
> >> > > >>test_memcpy_perf.c (63 + 63 = 126 constant memcpy calls) 2.
> >>It takes
> >> > > >>only 2'41" after I reduce the constant memcpy call number to
> >>12 + 12
> >> > > >>= 24
> >> > > >>
> >> > > >>I'll reduce memcpy call in the next version of patch.
> >> > > >>
> >> > > >ok, thank you.  I'm all for optimzation, but I think a compile
> >>that
> >> > > >takes almost
> >> > > >10 minutes for a single file is going to generate some raised
> >>eyebrows
> >> > > >when end users start tinkering with it
> >> > > >
> >> > > >Neil
> >> > > >
> >> > > >>Zhihong (John)
> >> > > >>
> >> > > Even two minutes is a very long time to compile, IMHO. The
> >>whole of DPDK
> >> > > doesn't take that long to compile right now, and

[dpdk-dev] [PATCH v2 02/24] virtio: Use weaker barriers

2015-01-27 Thread Ouyang, Changchun

Hi Stephen,
Although this is the original code logic,
we can move vq_update_avail_idx(rxvq) into the if block to resolve it.
What do you think?

Thanks
Changchun

-Original Message-
From: Xie, Huawei 
Sent: Tuesday, January 27, 2015 3:57 PM
To: Ouyang, Changchun; dev at dpdk.org
Cc: Stephen Hemminger (stephen at networkplumber.org)
Subject: RE: [dpdk-dev] [PATCH v2 02/24] virtio: Use weaker barriers

	if (likely(nb_enqueued)) {
		virtio_wmb();
		if (unlikely(virtqueue_kick_prepare(rxvq))) {
			virtqueue_notify(rxvq);
			PMD_RX_LOG(DEBUG, "Notified\n");
		}
	}
	vq_update_avail_idx(rxvq);


Two things confuse me about this modification:

1.
Why notify the host without updating the avail idx?
Could this cause a deadlock?

2.
Why update the avail index even when no packets are enqueued?
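
One possible reading of the two questions above, as a standalone sketch: publish the avail index only when something was enqueued, and do it before deciding whether to notify. struct toy_vq and toy_vq_finish_refill() are hypothetical and only model the ordering, not the real virtqueue code. The full fence between the index store and the flag check is an assumption about what the notification check needs.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_vq {
	uint16_t shadow_avail_idx; /* driver-private running index */
	uint16_t ring_avail_idx;   /* index visible to the host */
	bool host_wants_kick;      /* stand-in for the host's notify flag */
	unsigned int kicks;        /* notifications sent so far */
};

/* Hypothetical end-of-refill step for an Rx queue. */
static void
toy_vq_finish_refill(struct toy_vq *vq, uint16_t nb_enqueued)
{
	if (nb_enqueued == 0)
		return; /* nothing new: leave the index alone, no kick */

	/* 1. Publish the new buffers first. */
	vq->shadow_avail_idx = (uint16_t)(vq->shadow_avail_idx + nb_enqueued);
	vq->ring_avail_idx = vq->shadow_avail_idx;

	/* 2. Assumed barrier: keep the flag read after the index store. */
	atomic_thread_fence(memory_order_seq_cst);

	/* 3. Only then decide whether the host needs a notification. */
	if (vq->host_wants_kick)
		vq->kicks++;
}
```

With this ordering the host never gets a kick for buffers it cannot yet see, and the index is untouched on an empty refill.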

[dpdk-dev] [PATCH v2] vhost: Add -lfuse into the LDFLAGS list

2015-01-27 Thread Thomas Monjalon

2015-01-27 09:39, Neil Horman:
> the vhost library relies on libfuse, and thats included when we do a normal
> shared object build, but when we specify combined libs, its gets left out.  
> Add
> it back in
> 
> Signed-off-by: Neil Horman 
> 
> ---
> Change notes:
> v2) Removed normal shared object inclusion of libfuse since its always 
> included
> now

Acked-by: Thomas Monjalon 

Applied

Thanks
-- 
Thomas

[dpdk-dev] [PATCH 0/4] DPDK memcpy optimization

2015-01-27 Thread Wang, Zhihong

Hey Luke,

Thanks for the excellent questions!

The following script will launch the memcpy test in DPDK:
echo -e 'memcpy_autotest\nmemcpy_perf_autotest\nquit\n' | 
./x86_64-native-linuxapp-gcc/app/test -c 4 -n 4 -- -i

Thanks for sharing the object code, I think it's the Sandy Bridge version 
though.
The rte_memcpy for Haswell is quite simple too; this is a decision based on 
arch differences: Haswell has significant improvements in its memory hierarchy.
The Sandy Bridge unaligned memcpy is large in size but it has better 
performance because converting unaligned loads into aligned ones is crucial for 
in-cache memcpy on Sandy Bridge.

The rep instruction is still not fast enough, but I can't say much about it 
since I haven't investigated it thoroughly.

To my understanding, memcpy optimization is all about trade-offs according to 
use cases, and this one is for the DPDK scenario (small sizes, in cache: you may 
find quite a few copies of only 6 bytes or so); you can refer to the RFC for 
this patch.
It's not likely that one could make one that's optimal for all scenarios.

But I agree with the author of glibc memcpy on this: a program with too many 
memcpys is a program with a design flaw.


Thanks
Zhihong (John)
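
As a rough illustration of the small-size trade-off mentioned above, here is a sketch of a copy routine specialized for tiny lengths. It is not the rte_memcpy from this patch set (which uses SSE/AVX): small_copy() is a hypothetical name, and the technique shown (two possibly overlapping fixed-size moves) is swapped in only because it fits in portable C. It assumes non-overlapping source and destination, like memcpy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy n bytes, handling n <= 16 with at most two
 * fixed-size loads and two fixed-size stores. Fixed-size memcpy calls are
 * commonly lowered by compilers to plain moves. Larger sizes fall back to
 * the library memcpy.
 */
static inline void *
small_copy(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	if (n > 16)
		return memcpy(dst, src, n);

	if (n >= 8) {
		/* First 8 and last 8 bytes; they overlap when n < 16. */
		uint64_t head, tail;

		memcpy(&head, s, 8);
		memcpy(&tail, s + n - 8, 8);
		memcpy(d, &head, 8);
		memcpy(d + n - 8, &tail, 8);
	} else if (n >= 4) {
		/* First 4 and last 4 bytes; they overlap when n < 8. */
		uint32_t head, tail;

		memcpy(&head, s, 4);
		memcpy(&tail, s + n - 4, 4);
		memcpy(d, &head, 4);
		memcpy(d + n - 4, &tail, 4);
	} else {
		for (size_t i = 0; i < n; i++)
			d[i] = s[i];
	}
	return dst;
}
```

A 6-byte copy, for example, takes the 4-byte branch and copies bytes 0..3 and 2..5. Whether this beats the library memcpy depends on the CPU and the call site, so it would need measuring.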

From: lukego at gmail.com [mailto:luk...@gmail.com] On Behalf Of Luke Gorrie
Sent: Monday, January 26, 2015 4:03 PM
To: Wang, Zhihong
Cc: dev at dpdk.org; snabb-devel at googlegroups.com
Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization

On 26 January 2015 at 02:30, Wang, Zhihong <zhihong.wang at intel.com> wrote:
Hi Luke,

I'm very glad that you're interested in this work.

Great :).

 I never published any performance data, and haven't run cachebench.
We use test_memcpy_perf.c in DPDK to do the test mainly, because it's the 
environment that DPDK runs in. You can also find the performance comparison 
there with glibc.
It can be launched in /app/test: memcpy_perf_autotest.

Could you give me a command-line example to run this please? (Sorry if this 
should be obvious.)

 Finally, inline can bring benefits based on practice, constant value unrolling 
for example, and for DPDK we need all possible optimization.

Do we need to think about code size and potential instruction cache thrashing?

For me one call to rte_memcpy compiles to 3520 instructions in 20KB of 
object code. That's more than half the size of the Haswell instruction cache 
(32KB) per call.

glibc 2.20's memcpy_avx_unaligned is only 909 bytes shared/total and also 
seems to have basically excellent performance on Haswell.

So I am concerned about the code size of rte_memcpy, especially when inlined, 
and meta-concerned about the nonlinear impact of nested inlined functions on 
both compile time and object code size.


There is another issue that I am concerned about:

The Intel Optimization Guide suggests that rep movs is very efficient starting 
in Ivy Bridge. In practice though it seems to be much slower than using vector 
instructions, even though it is faster than it used to be in Sandy Bridge. Is 
that true?

This could have a substantial impact on off-the-shelf memcpy. glibc 2.20's 
memcpy uses movs for sizes >= 2048 and that is where performance takes a dive 
for me (in microbenchmarks). GCC will also emit inline string move instructions 
for certain constant-size memcpy calls at certain optimization levels.


So I feel like I haven't yet found the right memcpy for me, and we haven't even 
started to look at the interesting parts like cache-coherence behaviour when 
sharing data between cores (vhost) and whether streaming load/store can be used 
to defend the state of cache lines between cores.


Do I make any sense? What do I miss?


Cheers,
-Luke

[dpdk-dev] [PATCH v2 02/24] virtio: Use weaker barriers

2015-01-27 Thread Xie, Huawei



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ouyang Changchun
> Sent: Tuesday, January 27, 2015 10:36 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v2 02/24] virtio: Use weaker barriers
> 
> The DPDK driver only has to deal with the case of running on PCI
> and with SMP. In this case, the code can use the weaker barriers
> instead of using hard (fence) barriers. This will help performance.
> The rationale is explained in Linux kernel virtio_ring.h.
> 
> To make it clearer that this is a virtio thing and not some generic
> barrier, prefix the barrier calls with virtio_.
> 
> Add missing (and needed) barrier between updating ring data
> structure and notifying host.
> 
> Signed-off-by: Stephen Hemminger 
> Signed-off-by: Changchun Ouyang 
> ---
>  lib/librte_pmd_virtio/virtio_ethdev.c |  2 +-
>  lib/librte_pmd_virtio/virtio_rxtx.c   |  8 +---
>  lib/librte_pmd_virtio/virtqueue.h | 19 ++-
>  3 files changed, 20 insertions(+), 9 deletions(-)
> 
> diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c
> b/lib/librte_pmd_virtio/virtio_ethdev.c
> index 662a49c..dc47e72 100644
> --- a/lib/librte_pmd_virtio/virtio_ethdev.c
> +++ b/lib/librte_pmd_virtio/virtio_ethdev.c
> @@ -175,7 +175,7 @@ virtio_send_command(struct virtqueue *vq, struct
> virtio_pmd_ctrl *ctrl,
>   uint32_t idx, desc_idx, used_idx;
>   struct vring_used_elem *uep;
> 
> - rmb();
> + virtio_rmb();
> 
>   used_idx = (uint32_t)(vq->vq_used_cons_idx
>   & (vq->vq_nentries - 1));
> diff --git a/lib/librte_pmd_virtio/virtio_rxtx.c
> b/lib/librte_pmd_virtio/virtio_rxtx.c
> index c013f97..78af334 100644
> --- a/lib/librte_pmd_virtio/virtio_rxtx.c
> +++ b/lib/librte_pmd_virtio/virtio_rxtx.c
> @@ -456,7 +456,7 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf
> **rx_pkts, uint16_t nb_pkts)
> 
>   nb_used = VIRTQUEUE_NUSED(rxvq);
> 
> - rmb();
> + virtio_rmb();
> 
>   num = (uint16_t)(likely(nb_used <= nb_pkts) ? nb_used : nb_pkts);
>   num = (uint16_t)(likely(num <= VIRTIO_MBUF_BURST_SZ) ? num :
> VIRTIO_MBUF_BURST_SZ);
> @@ -516,6 +516,7 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf
> **rx_pkts, uint16_t nb_pkts)
>   }
> 
>   if (likely(nb_enqueued)) {
> + virtio_wmb();
>   if (unlikely(virtqueue_kick_prepare(rxvq))) {
>   virtqueue_notify(rxvq);
>   PMD_RX_LOG(DEBUG, "Notified\n");
> @@ -547,7 +548,7 @@ virtio_recv_mergeable_pkts(void *rx_queue,
> 
>   nb_used = VIRTQUEUE_NUSED(rxvq);
> 
> - rmb();
> + virtio_rmb();
> 
>   if (nb_used == 0)
>   return 0;
> @@ -694,7 +695,7 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf
> **tx_pkts, uint16_t nb_pkts)
>   PMD_TX_LOG(DEBUG, "%d packets to xmit", nb_pkts);
>   nb_used = VIRTQUEUE_NUSED(txvq);
> 
> - rmb();
> + virtio_rmb();
> 
>   num = (uint16_t)(likely(nb_used < VIRTIO_MBUF_BURST_SZ) ? nb_used :
> VIRTIO_MBUF_BURST_SZ);
> 
> @@ -735,6 +736,7 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf
> **tx_pkts, uint16_t nb_pkts)
>   }
>   }
>   vq_update_avail_idx(txvq);
> + virtio_wmb();
> 
>   txvq->packets += nb_tx;
> 
> diff --git a/lib/librte_pmd_virtio/virtqueue.h 
> b/lib/librte_pmd_virtio/virtqueue.h
> index fdee054..f6ad98d 100644
> --- a/lib/librte_pmd_virtio/virtqueue.h
> +++ b/lib/librte_pmd_virtio/virtqueue.h
> @@ -46,9 +46,18 @@
>  #include "virtio_ring.h"
>  #include "virtio_logs.h"
> 
> -#define mb()  rte_mb()
> -#define wmb() rte_wmb()
> -#define rmb() rte_rmb()
> +/*
> + * Per virtio_config.h in Linux.
> + * For virtio_pci on SMP, we don't need to order with respect to MMIO
> + * accesses through relaxed memory I/O windows, so smp_mb() et al are
> + * sufficient.
> + *
> + * This driver is for virtio_pci on SMP and therefore can assume
> + * weaker (compiler barriers)
> + */
> +#define virtio_mb()  rte_mb()
> +#define virtio_rmb() rte_compiler_barrier()
> +#define virtio_wmb() rte_compiler_barrier()
> 
>  #ifdef RTE_PMD_PACKET_PREFETCH
>  #define rte_packet_prefetch(p)  rte_prefetch1(p)
> @@ -225,7 +234,7 @@ virtqueue_full(const struct virtqueue *vq)
>  static inline void
>  vq_update_avail_idx(struct virtqueue *vq)
>  {
> - rte_compiler_barrier();
> + virtio_rmb();

I recall our original code is virtio_wmb():
use a store fence to ensure all updates to the entries are done before updating the index.
Why do we need virtio_rmb() here, and why add virtio_wmb() after vq_update_avail_idx()?
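
To make the ordering in question concrete, here is a minimal sketch of the
"fill entries, then publish the index" pattern. It is not the DPDK code: it
uses C11 atomics instead of rte_wmb()/rte_compiler_barrier(), and the ring
layout and every demo_* name are made up for illustration. The release store
is what keeps the entry writes ordered before the index update; on x86 it
should compile to a plain store plus a compiler barrier, which is in line
with the rationale given in the patch description.

```c
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_RING_SIZE 8u   /* hypothetical size, must be a power of two */

struct demo_ring {
	uint16_t entries[DEMO_RING_SIZE];
	_Atomic uint16_t avail_idx;   /* index published to the consumer */
	uint16_t shadow_idx;          /* producer-private running index */
};

/* Write one entry; it is not visible to the consumer yet. */
static inline void
demo_ring_add(struct demo_ring *r, uint16_t desc)
{
	r->entries[r->shadow_idx & (DEMO_RING_SIZE - 1)] = desc;
	r->shadow_idx++;
}

/*
 * Publish all entries written so far. The release store orders the
 * earlier entry writes before the index update (the job of a write
 * barrier placed before the index store).
 */
static inline void
demo_ring_publish(struct demo_ring *r)
{
	atomic_store_explicit(&r->avail_idx, r->shadow_idx,
			      memory_order_release);
}

/* Consumer side: the acquire load pairs with the release store above. */
static inline uint16_t
demo_ring_visible(struct demo_ring *r)
{
	return atomic_load_explicit(&r->avail_idx, memory_order_acquire);
}
```

In this sketch the barrier sits before the index store, not after it; whether
a further barrier is needed between the index update and the host
notification is a separate question.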

>   vq->vq_ring.avail->idx = vq->vq_avail_idx;
>  }
> 
> @@ -255,7 +264,7 @@ static inline void
>  virtqueue_notify(struct virtqueue *vq)
>  {
>   /*
> -  * Ensure updated avail->idx is visible to host. mb() necessary?
> +  * Ensure updated avail->idx is visible to host.
>* For virtio on IA, the notificaiton is through io port operation
>* which is

[dpdk-dev] [PATCH v4 00/11] Port Hotplug Framework

2015-01-27 Thread Qiu, Michael

On 1/27/2015 1:02 PM, Tetsuya Mukawa wrote:
> On 2015/01/27 12:00, Qiu, Michael wrote:
>> On 1/19/2015 6:42 PM, Tetsuya Mukawa wrote:
>>> This patch series adds a dynamic port hotplug framework to DPDK.
>>> With the patches, DPDK apps can attach or detach ports at runtime.
>>>
>>> The basic concept of the port hotplug is like followings.
>>> - DPDK apps must have responsibility to manage ports.
>>>   DPDK apps only know which ports are attached or detached at the moment.
>>>   The port hotplug framework is implemented to allow DPDK apps to manage 
>>> ports.
>>>   For example, when DPDK apps call port attach function, attached port 
>>> number
>>>   will be returned. Also DPDK apps can detach port by port number.
>>> - Kernel support is needed for attaching or detaching physical device ports.
>>>   To attach new device, the device will be recognized by kernel at first and
>>>   controlled by kernel driver. Then user can bind the device to igb_uio
>> Here does it really need native kernel driver here? As it will be
>> controlled by igb_uio.
>> I think even if the device has no kernel driver is also OK.
> Thanks for correcting. Yes, it should be.
> How about following.
>
> - Kernel support is needed for attaching or detaching physical device ports.
>   To attach a new device, the device will be recognized by kernel PCI
> hotplug feature at first.

No, this should not be described as the "kernel PCI hotplug feature", which
stands for removing or adding a PCI device at the system level; it is
device related, not driver related.

What about:

- Kernel support is needed for attaching or detaching physical device
ports. To attach a new physical device port, the device will be
recognized by the userspace direct I/O framework in the kernel at first.

Thanks,
Michael
>   Then user can bind the device to igb_uio.
>
>> Also I have finished initial patch of passthrough driver flag in
>> "struct rte_pci_device"
>>
>> I will send to you after I do some basic test on that, then I will send to
>> you, and you can give some comments on that.
> I appreciate for your implementing.
>
> Thanks,
> Tetsuya
>
>
>> Thanks,
>> Michael
>>
>>>   by 'dpdk_nic_bind.py'. Finally, DPDK apps can call the port hotplug
>>>   functions to attach ports.
>>>   For detaching, steps are vice versa.
>>> - Before detach ports, ports must be stopped and closed.
>>>   DPDK application must call rte_eth_dev_stop() and rte_eth_dev_close() 
>>> before
>>>   detaching ports. These function will call finalization codes of PMDs.
>>>   But so far, no PMD frees all resources allocated by initialization.
>>>   It means PMDs are needed to be fixed to support the port hotplug.
>>>   'RTE_PCI_DRV_DETACHABLE' is a new flag indicating a PMD supports 
>>> detaching.
>>>   Without this flag, detaching will be failed.
>>> - Mustn't affect legacy DPDK apps.
>>>   No DPDK EAL behavior is changed, if the port hotplug functions are't 
>>> called.
>>>   So all legacy DPDK apps can still work without modifications.
>>>
>>> And a few limitations.
>>> - The port hotplug functions are not thread safe.
>>>   DPDK apps should handle it.
>>> - Only support Linux and igb_uio so far.
>>>   BSD and VFIO is not supported. I will send VFIO patches at least, but I 
>>> don't
>>>   have a plan to submit BSD patch so far.
>>>
>>>
>>> Here is port hotplug APIs.
>>> ---
>>> /**
>>>  * Attach a new device.
>>>  *
>>>  * @param devargs
>>>  *   A pointer to a strings array describing the new device
>>>  *   to be attached. The strings should be a pci address like
>>>  *   ':01:00.0' or virtual device name like 'eth_pcap0'.
>>>  * @param port_id
>>>  *  A pointer to a port identifier actually attached.
>>>  * @return
>>>  *  0 on success and port_id is filled, negative on error
>>>  */
>>> int rte_eal_dev_attach(const char *devargs, uint8_t *port_id);
>>>
>>> /**
>>>  * Detach a device.
>>>  *
>>>  * @param port_id
>>>  *   The port identifier of the device to detach.
>>>  * @param addr
>>>  *  A pointer to a device name actually detached.
>>>  * @return
>>>  *  0 on success and devname is filled, negative on error
>>>  */
>>> int rte_eal_dev_detach(uint8_t port_id, char *devname);
>>> ---
>>>
>>> This patch series are for DPDK EAL. To use port hotplug function by DPDK 
>>> apps,
>>> each PMD should be fixed to support 'RTE_PCI_DRV_DETACHABLE' flag. Please 
>>> check
>>> a patch for pcap PMD.
>>>
>>> Also please check testpmd patch. It will show you how to fix your legacy
>>> applications to support port hotplug feature.
>>>
>>>
>>> PATCH v4 changes
>>>  - Merge patches to review easier.
>>>  - Fix indent of 'if' statement.
>>>  - Fix calculation method of eal_compare_pci_addr().
>>>  - Fix header file declaration.
>>>  - Add header file to determine if hotplug can be enabled.
>>>(Thanks to Qiu, Michael)
>>>  - Use braces with 'for' loop.
>>>  - Add

[dpdk-dev] [PATCH 0/6] Support NVGRE on i40e

2015-01-27 Thread Cao, Min

Test by:        min.cao
Patch name:     [dpdk-dev] [PATCH 0/6] Support NVGRE on i40e
Test Flag:      Tested-by
Tester name:    min.cao at intel.com
Result summary: total 2 cases, 2 passed, 0 failed

Test Case 1:
Name:           nvgre filter
Environment:    OS: Fedora20 3.11.10-301.fc20.x86_64
                gcc (GCC) 4.8.2
                CPU: Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
                NIC: Fortville eagle
Test result:    PASSED
Detail:         check normal packet + ip filter
                check vxlan packet + inner ip filter
                check vxlan packet + outer ip filter
                check vxlan packet + outer ip + inner ip filter
                check vxlan packet + inner udp filter
                check vxlan packet + inner tcp filter
                check vlan vxlan packet + outer ip filter
                check vlan vxlan packet + inner ip filter
                check vlan vxlan packet + outer ip filter
                check vlan vxlan packet + inner vlan + outer ip filter
                check vlan vxlan packet + inner vlan + inner ip filter
                check vlan vxlan packet + inner vlan + outer ip filter
                check vlan vxlan packet + inner vlan + inner udp filter
                check vlan vxlan packet + inner vlan + inner tcp filter

Test Case 2:
Name:           nvgre checksum
Environment:    OS: Fedora20 3.11.10-301.fc20.x86_64
                gcc (GCC) 4.8.2
                CPU: Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
                NIC: Fortville eagle
Test result:    PASSED
Detail:         check normal packet + ip checksum invalid
                check vxlan packet + inner ip checksum invalid
                check vxlan packet + outer ip checksum invalid
                check vxlan packet + outer ip + inner ip checksum invalid
                check vxlan packet + inner udp checksum invalid
                check vxlan packet + inner tcp checksum invalid
                check vlan vxlan packet + outer ip checksum invalid
                check vlan vxlan packet + inner ip checksum invalid
                check vlan vxlan packet + outer ip checksum invalid
                check vlan vxlan packet + inner vlan + outer ip checksum invalid
                check vlan vxlan packet + inner vlan + inner ip checksum invalid
                check vlan vxlan packet + inner vlan + outer ip checksum invalid
                check vlan vxlan packet + inner vlan + inner udp checksum invalid
                check vlan vxlan packet + inner vlan + inner tcp checksum invalid

-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Jijiang Liu
Sent: Monday, January 26, 2015 11:43 AM
To: dev at dpdk.org
Subject: [dpdk-dev] [PATCH 0/6] Support NVGRE on i40e

The patch set supports NVGRE on i40e.

It includes:
 - Support RX filters for NVGRE packet. It uses MAC and VLAN to point
   to a queue. The filter types supported are listed below:

   1. Inner MAC and Inner VLAN ID

   2. Inner MAC address, inner VLAN ID and tenant ID.

   3. Inner MAC and tenant ID

   4. Inner MAC address

   5. Outer MAC address, tenant ID and inner MAC

   6. Inner IP

 - Support TX checksum offload for NVGRE packet, which include outer L3(IP), 
inner L3(IP) and inner L4(UDP, TCP and SCTP)

Jijiang Liu (6):
  add gre header defination
  add nvgre RX filter in i40e
  test nvgre RX filters
  add GRE packet offload flag 
  support GRE packet TX checksum offload
  test nvgre TX checksum offload

 app/test-pmd/cmdline.c|   37 -
 app/test-pmd/csumonly.c   |  105 +++--
 app/test-pmd/testpmd.h|4 +-
 lib/librte_ether/rte_ether.h  |   12 
 lib/librte_mbuf/rte_mbuf.h|6 ++
 lib/librte_pmd_i40e/i40e_ethdev.c |6 ++
 lib/librte_pmd_i40e/i40e_rxtx.c   |   15 -
 7 files changed, 139 insertions(+), 46 deletions(-)

-- 
1.7.7.6

[dpdk-dev] [PATCH 5/7] ethdev: unification of flow types

2015-01-27 Thread Zhang, Helin

Hi Vithal

Exactly! Some types of NIC (e.g. i40e) support it.

Regards,
Helin

> -Original Message-
> From: Vithal S Mohare [mailto:vmohare at arubanetworks.com]
> Sent: Tuesday, January 27, 2015 12:31 PM
> To: Zhang, Helin
> Subject: RE: [dpdk-dev] [PATCH 5/7] ethdev: unification of flow types
> 
> Hi Helin,
> 
> I see a new type *_L2_PAYLOAD added for RSS types.  Is this for spraying of
> pure L2 packets (non-ip)?
> 
> Thanks,
> -Vithal
> 
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Helin Zhang
> Sent: Monday, January 19, 2015 12:26 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH 5/7] ethdev: unification of flow types
> 
> Flow types was defined actually for i40e hardware specifically, and wasn't 
> able
> to be used for defining RSS offload types of all PMDs. It removed the enum 
> flow
> types, and uses macros instead with new names. The new macros can be used
> for defining RSS offload types later. Also modifications are made in i40e and
> testpmd accordingly.
> 
> Signed-off-by: Helin Zhang 
> ---
>  app/test-pmd/cmdline.c| 88
> ---
>  app/test-pmd/config.c | 71 +--
>  lib/librte_ether/rte_eth_ctrl.h   | 55 ++--
>  lib/librte_pmd_i40e/i40e_ethdev.c | 68 +-
> lib/librte_pmd_i40e/i40e_ethdev.h | 34 +++
>  lib/librte_pmd_i40e/i40e_fdir.c   | 84 ++---
>  6 files changed, 235 insertions(+), 165 deletions(-)
> 
> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index
> 4618b92..80b9c32 100644
> --- a/app/test-pmd/cmdline.c
> +++ b/app/test-pmd/cmdline.c
> @@ -707,7 +707,7 @@ static void cmd_help_long_parsed(void
> *parsed_result,
>   "get info of a flex filter.\n\n"
> 
>   "flow_director_filter (port_id) (add|del)"
> - " flow (ip4|ip4-frag|ip6|ip6-frag)"
> + " flow (ipv4-other|ipv4-frag|ipv6-other|ipv6-frag)"
>   " src (src_ip_address) dst (dst_ip_address)"
>   " flexbytes (flexbytes_value)"
>   " (drop|fwd) queue (queue_id) fd_id (fd_id_value)\n"
> @@ -733,7 +733,8 @@ static void cmd_help_long_parsed(void
> *parsed_result,
>   "Flush all flow director entries of a device.\n\n"
> 
>   "flow_director_flex_mask (port_id)"
> - " flow
> (ip4|ip4-frag|tcp4|udp4|sctp4|ip6|ip6-frag|tcp6|udp6|sctp6|all)"
> + " flow 
> (ipv4-other|ipv4-frag|ipv4-tcp|ipv4-udp|ipv4-sctp|"
> + "ipv6-other|ipv6-frag|ipv6-tcp|ipv6-udp|ipv6-sctp|all)"
>   " (mask)\n"
>   "Configure mask of flex payload.\n\n"
> 
> @@ -8158,31 +8159,34 @@ parse_flexbytes(const char *q_arg, uint8_t
> *flexbytes, uint16_t max_num)
>   return ret;
>  }
> 
> -static enum rte_eth_flow_type
> +static uint16_t
>  str2flowtype(char *string)
>  {
>   uint8_t i = 0;
>   static const struct {
>   char str[32];
> - enum rte_eth_flow_type type;
> + uint16_t type;
>   } flowtype_str[] = {
> - {"ip4", RTE_ETH_FLOW_TYPE_IPV4_OTHER},
> - {"ip4-frag", RTE_ETH_FLOW_TYPE_FRAG_IPV4},
> - {"udp4", RTE_ETH_FLOW_TYPE_UDPV4},
> - {"tcp4", RTE_ETH_FLOW_TYPE_TCPV4},
> - {"sctp4", RTE_ETH_FLOW_TYPE_SCTPV4},
> - {"ip6", RTE_ETH_FLOW_TYPE_IPV6_OTHER},
> - {"ip6-frag", RTE_ETH_FLOW_TYPE_FRAG_IPV6},
> - {"udp6", RTE_ETH_FLOW_TYPE_UDPV6},
> - {"tcp6", RTE_ETH_FLOW_TYPE_TCPV6},
> - {"sctp6", RTE_ETH_FLOW_TYPE_TCPV6},
> + {"ipv4", ETH_FLOW_TYPE_IPV4},
> + {"ipv4-frag", ETH_FLOW_TYPE_FRAG_IPV4},
> + {"ipv4-tcp", ETH_FLOW_TYPE_NONFRAG_IPV4_TCP},
> + {"ipv4-udp", ETH_FLOW_TYPE_NONFRAG_IPV4_UDP},
> + {"ipv4-sctp", ETH_FLOW_TYPE_NONFRAG_IPV4_SCTP},
> + {"ipv4-other", ETH_FLOW_TYPE_NONFRAG_IPV4_OTHER},
> + {"ipv6", ETH_FLOW_TYPE_IPV6},
> + {"ipv6-frag", ETH_FLOW_TYPE_FRAG_IPV6},
> + {"ipv6-tcp", ETH_FLOW_TYPE_NONFRAG_IPV6_TCP},
> + {"ipv6-udp", ETH_FLOW_TYPE_NONFRAG_IPV6_UDP},
> + {"ipv6-sctp", ETH_FLOW_TYPE_NONFRAG_IPV6_SCTP},
> + {"ipv6-other", ETH_FLOW_TYPE_NONFRAG_IPV6_OTHER},
> + {"l2_payload", ETH_FLOW_TYPE_L2_PAYLOAD},
>   };
> 
>   for (i = 0; i < RTE_DIM(flowtype_str); i++) {
>   if (!strcmp(flowtype_str[i].str, string))
>   return flowtype_str[i].type;
>   }
> - return RTE_ETH_FLOW_TYPE_NONE;
> + return ETH_FLOW_TYPE_UNKNOWN;
>  }
> 
>  #define IPV4_ADDR_TO_UINT(ip_addr, ip) \ @@ -8235,9 +8239,9 @@
> cmd_flow_director_filter_parsed(void *parsed_result,
> 
>

[dpdk-dev] [PATCH 4/4] lib/librte_eal: Optimized memcpy in arch/x86/rte_memcpy.h for both SSE and AVX platforms

2015-01-27 Thread Wang, Zhihong



> -Original Message-
> From: Wodkowski, PawelX
> Sent: Monday, January 26, 2015 10:43 PM
> To: Wang, Zhihong; dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH 4/4] lib/librte_eal: Optimized memcpy in
> arch/x86/rte_memcpy.h for both SSE and AVX platforms
> 
> Hi,
> 
> I must say: greate work.
> 
> I have some small comments:
> 
> > +/**
> > + * Macro for copying unaligned block from one location to another,
> > + * 47 bytes leftover maximum,
> > + * locations should not overlap.
> > + * Requirements:
> > + * - Store is aligned
> > + * - Load offset is <offset>, which must be immediate value within [1, 15]
> > + * - For <src>, make sure <offset> bit backwards & <16 - offset> bit forwards are available for loading
> > + * - <dst>, <src>, <len> must be variables
> > + * - __m128i <xmm0> ~ <xmm8> must be pre-defined
> > + */
> > +#define MOVEUNALIGNED_LEFT47(dst, src, len, offset)
> > \
> > +{  
> >  \
> ...
> > +}
> 
> Why not do { ... } while(0) or ({ ... }) ? This could have unpredictable side
> effects.
> 
> Second:
> Why you completely substitute
> #define rte_memcpy(dst, src, n)  \
>   ({ (__builtin_constant_p(n)) ?   \
>   memcpy((dst), (src), (n)) :  \
>   rte_memcpy_func((dst), (src), (n)); })
> 
> with inline rte_memcpy()? This construction  can help compiler to deduce
> which version to use (static?) inline implementation or call external
> function.
> 
> Did you try 'extern inline' type? It could help reducing compilation time.

Hi Pawel,

Good call on "MOVEUNALIGNED_LEFT47". Thanks!

I removed the conditional __builtin_constant_p(n) because it calls glibc memcpy 
when the parameter is constant, while rte_memcpy has better performance there.
The current long compile time is caused by too many function calls; I'll fix that 
in the next version.
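
For reference, the macro point raised above can be sketched as follows. This
is only an illustration of the general do { ... } while (0) idiom, not the
actual MOVEUNALIGNED_LEFT47 body; DEMO_COPY16 and demo_copy_selected are
made-up names, and plain memcpy() stands in for the vector loads and stores.

```c
#include <string.h>

/*
 * Brace-only form, shown as a comment because the misuse does not compile:
 *
 *   #define DEMO_COPY16_BRACES(dst, src) { memcpy((dst), (src), 16); }
 *
 *   if (cond)
 *           DEMO_COPY16_BRACES(a, s);   expands to "{ ... };"
 *   else                                the ';' already ended the if,
 *           DEMO_COPY16_BRACES(b, s);   so this else has no matching if
 */

/* do/while(0) form: the macro plus its trailing ';' is one statement. */
#define DEMO_COPY16(dst, src)                   \
	do {                                    \
		memcpy((dst), (src), 16);       \
	} while (0)

/* Copy 16 bytes to dst when cond is non-zero, otherwise to alt. */
static inline void
demo_copy_selected(int cond, void *dst, void *alt, const void *src)
{
	if (cond)
		DEMO_COPY16(dst, src);
	else
		DEMO_COPY16(alt, src);
}
```

The ({ ... }) statement-expression form mentioned above also avoids the
problem, but it is a GNU extension rather than standard C.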

Zhihong (John)

[dpdk-dev] [PATCH v1 09/15] malloc: fix the issue of SOCKET_ID_ANY

2015-01-27 Thread Liang, Cunming



> -Original Message-
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Sunday, January 25, 2015 4:05 PM
> To: Liang, Cunming
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v1 09/15] malloc: fix the issue of 
> SOCKET_ID_ANY
> 
> On Thu, 22 Jan 2015 16:16:32 +0800
> Cunming Liang  wrote:
> 
> > -   return rte_socket_id();
> > +   unsigned socket_id = rte_socket_id();
> > +
> > +   if (socket_id == (unsigned)SOCKET_ID_ANY)
> 
> I prefer not casting -1 to unsigned it will cause warnings.
> It is better to make socket_id an integer and then have
> the implicit cast in the return.
[Liang, Cunming] I didn't get a warning about it. Which compiler version 
complains about it?
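
To compare the two variants under discussion side by side, here is a small
sketch. It assumes the "any socket" sentinel is -1, as the review comment
implies; DEMO_SOCKET_ID_ANY, the function names and the fallback parameter
are hypothetical and not taken from the patch. Both variants return the same
values; they differ only in where the signed/unsigned conversion happens.

```c
#define DEMO_SOCKET_ID_ANY (-1)   /* stand-in for the -1 sentinel */

/* Variant A: unsigned local, sentinel cast to unsigned for the compare. */
static unsigned
demo_pick_socket_cast(unsigned reported, unsigned fallback)
{
	if (reported == (unsigned)DEMO_SOCKET_ID_ANY)
		return fallback;
	return reported;
}

/*
 * Variant B: signed local compared against the sentinel without a cast;
 * the conversion to unsigned happens implicitly in the return.
 */
static unsigned
demo_pick_socket_int(int reported, unsigned fallback)
{
	if (reported == DEMO_SOCKET_ID_ANY)
		return fallback;
	return reported;
}
```

Building both with the project's usual warning flags would be one way to
check which form, if any, a given compiler version complains about.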

[dpdk-dev] [PATCH v2 00/24] Single virtio implementation

2015-01-27 Thread Wiles, Keith



On 1/26/15, 8:06 PM, "Matthew Hall"  wrote:

>On Tue, Jan 27, 2015 at 10:35:40AM +0800, Ouyang Changchun wrote:
>> This is the patch set for single virtio implementation.
>>  
>> Why we need single virtio?
>> 
>> As we know currently there are at least 3 virtio PMD driver
>>implementations:
>> A) lib/librte_pmd_virtio(refer as virtio A);
>> B) virtio_net_pmd by 6wind(refer as virtio B);
>> C) virtio by Brocade/vyatta(refer as virtio C);
>>  
>> Integrating 3 implementations into one could reduce the maintaining
>>cost and time,
>> in other hand, user don't need practice their application on 3 variant
>>one by one to see
>> which one is the best for them;
>
>Thank you so much for this, using virtio drivers in DPDK has been messy
>and 
>unpleasant in the past, and you clearly wrote a lot of nice new code to
>help 
>improve it all.
>
>Previously I'd reported a bug, where all RTE virtio drivers I tried (A
>and B, 
>because I did not know C existed), failed to work with the virtio-net
>interfaces exposed in VirtualBox, due to various strange errors, and they
>all 
>only worked with the virtio-net interfaces from qemu.
>
>I wanted to find out if we managed to fix this other problem, because I
>would 
>really like to use the Vagrant VM deployment tool
>(https://www.vagrantup.com/)
>to distribute my open-source DPDK based application to everyone in the
>open source community.
>
>The better the out-of-box experience of practical community-created
>DPDK-based 
>real-life example applications similar to mine, the more adoption of DPDK
>and 
>better DPDK community we will be able to have as time marches forward.
>
>If we could manage to get it to work in VirtualBox, then I could surely
>help 
>do some app-level testing on the new code, if we could see it in a test
>branch 
>or test repo somewhere I could access it.

There is an app note on how to get DPDK working in VirtualBox; it is a bit
bumpy getting it to work.
Here is the link: 
http://plvision.eu/blog/deploying-intel-dpdk-in-oracle-virtualbox/

I have not tried it, but it was suggested to me it should work. It will be
nice if the new driver works better :-)
>
>Sincerely,
>Matthew Hall

[dpdk-dev] [PATCH v4 00/11] Port Hotplug Framework

2015-01-27 Thread Qiu, Michael

On 1/19/2015 6:42 PM, Tetsuya Mukawa wrote:
> This patch series adds a dynamic port hotplug framework to DPDK.
> With the patches, DPDK apps can attach or detach ports at runtime.
>
> The basic concept of the port hotplug is like followings.
> - DPDK apps must have responsibility to manage ports.
>   DPDK apps only know which ports are attached or detached at the moment.
>   The port hotplug framework is implemented to allow DPDK apps to manage 
> ports.
>   For example, when DPDK apps call port attach function, attached port number
>   will be returned. Also DPDK apps can detach port by port number.
> - Kernel support is needed for attaching or detaching physical device ports.
>   To attach new device, the device will be recognized by kernel at first and
>   controlled by kernel driver. Then user can bind the device to igb_uio

Does it really need a native kernel driver here, given that it will be
controlled by igb_uio?
I think it is also OK even if the device has no kernel driver.

Also I have finished an initial patch for the passthrough driver flag in
"struct rte_pci_device".

I will send it to you after I do some basic testing on it, and you can
give some comments on it.


Thanks,
Michael

>   by 'dpdk_nic_bind.py'. Finally, DPDK apps can call the port hotplug
>   functions to attach ports.
>   For detaching, steps are vice versa.
> - Before detach ports, ports must be stopped and closed.
>   DPDK application must call rte_eth_dev_stop() and rte_eth_dev_close() before
>   detaching ports. These function will call finalization codes of PMDs.
>   But so far, no PMD frees all resources allocated by initialization.
>   It means PMDs are needed to be fixed to support the port hotplug.
>   'RTE_PCI_DRV_DETACHABLE' is a new flag indicating a PMD supports detaching.
>   Without this flag, detaching will be failed.
> - Mustn't affect legacy DPDK apps.
>   No DPDK EAL behavior is changed, if the port hotplug functions are't called.
>   So all legacy DPDK apps can still work without modifications.
>
> And a few limitations.
> - The port hotplug functions are not thread safe.
>   DPDK apps should handle it.
> - Only support Linux and igb_uio so far.
>   BSD and VFIO is not supported. I will send VFIO patches at least, but I 
> don't
>   have a plan to submit BSD patch so far.
>
>
> Here is port hotplug APIs.
> ---
> /**
>  * Attach a new device.
>  *
>  * @param devargs
>  *   A pointer to a strings array describing the new device
>  *   to be attached. The strings should be a pci address like
>  *   ':01:00.0' or virtual device name like 'eth_pcap0'.
>  * @param port_id
>  *  A pointer to a port identifier actually attached.
>  * @return
>  *  0 on success and port_id is filled, negative on error
>  */
> int rte_eal_dev_attach(const char *devargs, uint8_t *port_id);
>
> /**
>  * Detach a device.
>  *
>  * @param port_id
>  *   The port identifier of the device to detach.
>  * @param addr
>  *  A pointer to a device name actually detached.
>  * @return
>  *  0 on success and devname is filled, negative on error
>  */
> int rte_eal_dev_detach(uint8_t port_id, char *devname);
> ---
>
> This patch series are for DPDK EAL. To use port hotplug function by DPDK apps,
> each PMD should be fixed to support 'RTE_PCI_DRV_DETACHABLE' flag. Please 
> check
> a patch for pcap PMD.
>
> Also please check testpmd patch. It will show you how to fix your legacy
> applications to support port hotplug feature.
>
>
> PATCH v4 changes
>  - Merge patches to review easier.
>  - Fix indent of 'if' statement.
>  - Fix calculation method of eal_compare_pci_addr().
>  - Fix header file declaration.
>  - Add header file to determine if hotplug can be enabled.
>(Thanks to Qiu, Michael)
>  - Use braces with 'for' loop.
>  - Add paramerter checking.
>  - Fix sanity check code
>  - Fix comments of rte_eth_dev_type.
>  - Change function names.
>(Thanks to Iremonger, Bernard)
>
> PATCH v3 changes:
>  - Fix enum definition used in rte_ethdev.c.
>(Thanks to Zhang, Helin)
>
> PATCH v2 changes:
>  - Replace rte_eal_dev_attach_pdev(), rte_eal_dev_detach_pdev,
>rte_eal_dev_attach_vdev() and rte_eal_dev_detach_vdev() to
>rte_eal_dev_attach() and rte_eal_dev_detach().
>  - Add parameter values checking.
>  - Refashion a few functions.
>(Thanks to Iremonger, Bernard)
>
> PATCH v1 Changes:
>  - Fix error checking code of librte_eth APIs.
>  - Fix issue that port from pcap PMD cannot be detached correctly.
>  - Fix issue that testpmd could hang after forwarding, if attaching and 
> detaching
>is repeatedly.
>  - Fix if-condition of rte_eth_dev_get_port_by_addr().
>(Thanks to Mark Enright)
>
> RFC PATCH v2 Changes:
> - remove 'rte_eth_dev_validate_port()', and cleanup codes.
>
>
> Tetsuya Mukawa (11):
>   eal/pci,ethdev: Remove assumption that port will not be

[dpdk-dev] DPDK testpmd forwarding performace degradation

2015-01-27 Thread 吴亚东

A 65-byte frame may degrade performance a lot. That's related to DMA and cache.
When the NIC DMAs packets to memory, it has to do a read-modify-write if the DMA 
size covers only part of a cache line. So for 65 bytes, the first 64 bytes are OK. 
For the next 1 byte, the NIC has to read the whole cache line, change one byte 
and update the cache line.
So in DPDK, the CRC is not stripped and the Ethernet header is aligned to a cache 
line, which causes the IP header not to be aligned on 4 bytes.
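
The arithmetic behind this can be sketched in a few lines of C. The sketch
assumes a 64-byte cache line, a buffer that starts on a cache line boundary
and an untagged 14-byte Ethernet header; all names are made up for
illustration and nothing here is DPDK API.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_CACHE_LINE 64u   /* assumed cache line size in bytes */
#define DEMO_ETHER_HDR  14u   /* untagged Ethernet header length */

struct dma_split {
	size_t full_lines;   /* cache lines written completely */
	size_t tail_bytes;   /* bytes landing in a final, partial line */
};

/* Split a DMA write of len bytes that starts on a cache line boundary. */
static struct dma_split
dma_split_aligned(size_t len)
{
	struct dma_split s;

	s.full_lines = len / DEMO_CACHE_LINE;
	s.tail_bytes = len % DEMO_CACHE_LINE;
	return s;
}

/* IP header offset modulo 4 for a frame starting at frame_start. */
static size_t
ip_header_misalignment(size_t frame_start)
{
	/* the sketch only covers cache-line-aligned frame starts */
	assert(frame_start % DEMO_CACHE_LINE == 0);
	return (frame_start + DEMO_ETHER_HDR) % 4;
}
```

For len = 65 this gives one full line plus a 1-byte tail, which is the
partial-line write described above; for a cache-line-aligned frame the IP
header sits at offset 14, that is 2 bytes off a 4-byte boundary.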

[dpdk-dev] [PATCH 0/4] DPDK memcpy optimization

2015-01-27 Thread Wang, Zhihong



> -Original Message-
> From: Ananyev, Konstantin
> Sent: Tuesday, January 27, 2015 2:29 AM
> To: Wang, Zhihong; Richardson, Bruce; Marc Sune
> Cc: dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> 
> Hi Zhihong,
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wang, Zhihong
> > Sent: Friday, January 23, 2015 6:52 AM
> > To: Richardson, Bruce; Marc Sune
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> >
> >
> >
> > > -Original Message-
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce
> > > Richardson
> > > Sent: Wednesday, January 21, 2015 9:26 PM
> > > To: Marc Sune
> > > Cc: dev at dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> > >
> > > On Wed, Jan 21, 2015 at 02:21:25PM +0100, Marc Sune wrote:
> > > >
> > > > On 21/01/15 14:02, Bruce Richardson wrote:
> > > > >On Wed, Jan 21, 2015 at 01:36:41PM +0100, Marc Sune wrote:
> > > > >>On 21/01/15 04:44, Wang, Zhihong wrote:
> > > > -Original Message-
> > > > From: Richardson, Bruce
> > > > Sent: Wednesday, January 21, 2015 12:15 AM
> > > > To: Neil Horman
> > > > Cc: Wang, Zhihong; dev at dpdk.org
> > > > Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> > > > 
> > > > On Tue, Jan 20, 2015 at 10:11:18AM -0500, Neil Horman wrote:
> > > > >On Tue, Jan 20, 2015 at 03:01:44AM +, Wang, Zhihong wrote:
> > > > >>>-Original Message-
> > > > >>>From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > > > >>>Sent: Monday, January 19, 2015 9:02 PM
> > > > >>>To: Wang, Zhihong
> > > > >>>Cc: dev at dpdk.org
> > > > >>>Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy
> > > > >>>optimization
> > > > >>>
> > > > >>>On Mon, Jan 19, 2015 at 09:53:30AM +0800,
> > > > >>>zhihong.wang at intel.com
> > > > wrote:
> > > > This patch set optimizes memcpy for DPDK for both SSE and
> > > > AVX
> > > > platforms.
> > > > It also extends memcpy test coverage with unaligned cases
> > > > and more test
> > > > >>>points.
> > > > Optimization techniques are summarized below:
> > > > 
> > > > 1. Utilize full cache bandwidth
> > > > 
> > > > 2. Enforce aligned stores
> > > > 
> > > > 3. Apply load address alignment based on architecture
> > > > features
> > > > 
> > > > 4. Make load/store address available as early as possible
> > > > 
> > > > 5. General optimization techniques like inlining, branch
> > > > reducing, prefetch pattern access
> > > > 
> > > > Zhihong Wang (4):
> > > >    Disabled VTA for memcpy test in app/test/Makefile
> > > >    Removed unnecessary test cases in test_memcpy.c
> > > >    Extended test coverage in test_memcpy_perf.c
> > > >    Optimized memcpy in arch/x86/rte_memcpy.h for both SSE
> > > and AVX
> > > >  platforms
> > > > 
> > > >   app/test/Makefile  |   6 +
> > > >   app/test/test_memcpy.c |  52 +-
> > > >   app/test/test_memcpy_perf.c| 238 
> > > >  +---
> > > >   .../common/include/arch/x86/rte_memcpy.h   | 664
> > > > >>>+++--
> > > >   4 files changed, 656 insertions(+), 304 deletions(-)
> > > > 
> > > > --
> > > > 1.9.3
> > > > 
> > > > 
> > > > >>>Are you able to compile this with gcc 4.9.2?  The
> > > > >>>compilation of test_memcpy_perf is taking forever for me.  It
> appears hung.
> > > > >>>Neil
> > > > >>Neil,
> > > > >>
> > > > >>Thanks for reporting this!
> > > > >>It should compile but will take quite some time if the CPU
> > > > >>doesn't support
> > > > AVX2, the reason is that:
> > > > >>1. The SSE & AVX memcpy implementation is more complicated
> > > than
> > > > AVX2
> > > > >>version thus the compiler takes more time to compile and
> > > > >>optimize
> > > 2.
> > > > >>The new test_memcpy_perf.c contains 126 constants memcpy
> > > > >>calls for better test case coverage, that's quite a lot
> > > > >>
> > > > >>I've just tested this patch on an Ivy Bridge machine with GCC
> 4.9.2:
> > > > >>1. The whole compile process takes 9'41" with the original
> > > > >>test_memcpy_perf.c (63 + 63 = 126 constant memcpy calls) 2.
> > > > >>It takes only 2'41" after I reduce the constant memcpy call
> > > > >>number to 12 + 12 = 24
> > > > >>
> > > > >>I'll reduce memcpy call in the next version of patch.
> > > > >>
> > > > >ok, thank you.  I'm all for optimzation, but I think a
> > > > >compile that takes almost
> > > > >10 minutes for a single

[dpdk-dev] [PATCH v2 00/24] Single virtio implementation

2015-01-27 Thread Matthew Hall

On Tue, Jan 27, 2015 at 03:42:00AM +, Wiles, Keith wrote:
> There is an app note on how to get DPDK working in VirtualBox; it is a bit
> bumpy getting it to work.
> Here is the link: 
> http://plvision.eu/blog/deploying-intel-dpdk-in-oracle-virtualbox/
> 
> I have not tried it, but it was suggested to me it should work. It will be
> nice if the new driver works better :-)

I already used a derivative of these directions... "cheated" and used the igb 
driver like they did. Unlike them I automated the entire process, including 
updating the base OS to latest kernel and recompiling against it, as well as 
auto-enabling the NICs, the SSE instruction sets, etc. etc.

However their directions use an IGB NIC not a virtio-net NIC which would be 
much better for performance and resource consumption. So I really would be 
very very happy if we had a virtio-net which worked properly with both qemu 
and VirtualBox.

Matthew.
