date:20150216

[dpdk-dev] [PATCH v2 6/7] rte_sched: eliminate floating point in calculating byte clock

2015-02-16 Thread Dumitrescu, Cristian

Hi Stephen,

Sorry, NACK.

1. Overflow issue
As you declare cycles_per_byte as uint32_t, for a CPU frequency of 2-3 GHz, the 
line of code below results in overflow:
port->cycles_per_byte = (rte_get_tsc_hz() << RTE_SCHED_TIME_SHIFT) / params->rate;
Therefore, there is most likely a significant accuracy loss, which might allow 
more packets to go out than should be allowed.
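
To see for which inputs the narrowing to uint32_t loses bits, the quotient can 
be computed in 64 bits first. A minimal sketch, assuming TIME_SHIFT mirrors 
RTE_SCHED_TIME_SHIFT from the patch; the helper names are illustrative and not 
part of rte_sched:

```c
#include <assert.h>
#include <stdint.h>

#define TIME_SHIFT 20 /* assumed to match RTE_SCHED_TIME_SHIFT in the patch */

/* Scaled cycles-per-byte, computed in 64 bits before any narrowing. */
static uint64_t scaled_cycles_per_byte(uint64_t tsc_hz, uint64_t rate)
{
	assert(rate != 0);
	return (tsc_hz << TIME_SHIFT) / rate;
}

/* Returns 1 if the value survives assignment to a uint32_t field. */
static int fits_in_u32(uint64_t v)
{
	return v <= UINT32_MAX;
}
```

With these helpers, a 2.5 GHz TSC and a rate of 1250000000 bytes/sec give 
2097152, which fits; the same TSC with a rate of 500000 bytes/sec gives 
5242880000, which does not.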

2. Integer division has a higher cost than floating point division
My understanding is that we are considering a performance improvement by 
replacing the double precision floating point division in:
double bytes_diff = ((double) cycles_diff) / port->cycles_per_byte;
with an integer division:
uint64_t bytes_diff = (cycles_diff << RTE_SCHED_TIME_SHIFT) / port->cycles_per_byte;
I don't think this is going to have the claimed benefit, as according to the 
"Intel 64 and IA-32 Architectures Optimization Reference Manual" (Appendix C), 
the latency of the integer division instruction is significantly higher than 
the latency of floating point division:
Instruction FDIV double precision: latency = 38-40 cycles
Instruction IDIV: latency = 56-80 cycles

3. Alternative
I do hear your suggestion, though, about replacing the floating point division 
with a more performant construction. One suggestion would be to replace it with 
an integer multiplication followed by a shift right, probably by using a 
uint64_t bytes_per_cycle_scaled_up (the inverse of cycles_per_byte). I need to 
prototype this code myself. Would you be OK to look into providing an 
alternative implementation?
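
A minimal sketch of the multiply-and-shift idea, not a tested rte_sched change. 
SCALE_SHIFT and cycles_to_bytes are names made up for illustration, and the 
choice of a 32-bit scale is an assumption:

```c
#include <assert.h>
#include <stdint.h>

#define SCALE_SHIFT 32 /* assumed scale factor, not taken from the patch */

/* Computed once at port configuration: (bytes per cycle) * 2^SCALE_SHIFT. */
static uint64_t bytes_per_cycle_scaled_up(uint64_t rate, uint64_t tsc_hz)
{
	assert(tsc_hz != 0 && rate <= UINT32_MAX);
	return (rate << SCALE_SHIFT) / tsc_hz;
}

/* Per resync: one multiplication and one shift, no division. */
static uint64_t cycles_to_bytes(uint64_t cycles_diff, uint64_t scaled)
{
	return (cycles_diff * scaled) >> SCALE_SHIFT;
}
```

The product cycles_diff * scaled must stay below 2^64. With a scaled value 
around 2^31 (rate equal to half the TSC frequency), that limits cycles_diff to 
less than 2^33, so a real implementation would need to bound the resync 
interval or pick a smaller shift.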

Thanks,
Cristian


-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Stephen Hemminger
Sent: Thursday, February 5, 2015 6:14 AM
To: dev at dpdk.org
Cc: Stephen Hemminger
Subject: [dpdk-dev] [PATCH v2 6/7] rte_sched: eliminate floating point in 
calculating byte clock

From: Stephen Hemminger 

The old code was doing a floating point divide for each rte_dequeue()
which is very expensive. Change to using fixed point scaled math instead.
This improved performance from 5Gbit/sec to 10 Gbit/sec

Signed-off-by: Stephen Hemminger 
---
 lib/librte_sched/rte_sched.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
index 55fbc14..3023457 100644
--- a/lib/librte_sched/rte_sched.c
+++ b/lib/librte_sched/rte_sched.c
@@ -102,6 +102,9 @@

 #define RTE_SCHED_BMP_POS_INVALID UINT32_MAX

+/* For cycles_per_byte calculation */
+#define RTE_SCHED_TIME_SHIFT 20
+
 struct rte_sched_subport {
/* Token bucket (TB) */
uint64_t tb_time; /* time of last update */
@@ -239,7 +242,7 @@ struct rte_sched_port {
uint64_t time_cpu_cycles; /* Current CPU time measured in CPU cyles 
*/
uint64_t time_cpu_bytes;  /* Current CPU time measured in bytes */
uint64_t time;/* Current NIC TX time measured in bytes 
*/
-   double cycles_per_byte;   /* CPU cycles per byte */
+   uint32_t cycles_per_byte;   /* CPU cycles per byte (scaled) */

/* Scheduling loop detection */
uint32_t pipe_loop;
@@ -657,7 +660,9 @@ rte_sched_port_config(struct rte_sched_port_params *params)
port->time_cpu_cycles = rte_get_tsc_cycles();
port->time_cpu_bytes = 0;
port->time = 0;
-   port->cycles_per_byte = ((double) rte_get_tsc_hz()) / ((double) 
params->rate);
+
+   port->cycles_per_byte = (rte_get_tsc_hz() << RTE_SCHED_TIME_SHIFT)
+   / params->rate;

/* Scheduling loop detection */
port->pipe_loop = RTE_SCHED_PIPE_INVALID;
@@ -2156,11 +2161,12 @@ rte_sched_port_time_resync(struct rte_sched_port *port)
 {
uint64_t cycles = rte_get_tsc_cycles();
uint64_t cycles_diff = cycles - port->time_cpu_cycles;
-   double bytes_diff = ((double) cycles_diff) / port->cycles_per_byte;
+   uint64_t bytes_diff = (cycles_diff << RTE_SCHED_TIME_SHIFT)
+   / port->cycles_per_byte;

/* Advance port time */
port->time_cpu_cycles = cycles;
-   port->time_cpu_bytes += (uint64_t) bytes_diff;
+   port->time_cpu_bytes += bytes_diff;
if (port->time < port->time_cpu_bytes) {
port->time = port->time_cpu_bytes;
}
-- 
2.1.4


[dpdk-dev] [PATCH v3 00/20] enhance tx checksum offload API

2015-02-16 Thread Thomas Monjalon

> > The goal of this series is to clarify and simplify the mbuf offload API.
> > 
> > - simplify the definitions of PKT_TX_IP_CKSUM and PKT_TX_IPV4, each
> >   flag has now only one meaning. No impact on the code.
> > 
> > - add a feature flag for OUTER_IP_CHECKSUM (from Jijiang's patches)
> > 
> > - remove the PKT_TX_UDP_TUNNEL_PKT flag: it is useless from an API point
> >   of view. It was added because i40e need this info for some reason. We
> >   have 3 solutions:
> > 
> >   - remove the flag and adapt the driver to the API (the choice I made
> > for this series).
> > 
> >   - remove the flag and stop advertising OUTER_IP_CHECKSUM in i40e
> > 
> >   - keep this flag, penalizing performance of drivers that do not
> > require the flag. It would also mean that drivers won't support
> > outer IP checksum for all tunnel types, but only for the tunnel
> > types having a flag.
> > 
> > - a side effect of this API clarification is that there is only one
> >   way for doing one operation. If the hardware has several ways to
> >   do the same operation, a choice has to be made in the driver.
> > 
> > The series also provide some enhancements and fixes related to this API 
> > rework:
> > 
> > - new tunnel types to testpmd csum forward engine.
> > - fixes in i40e to adapt to new api and support more tunnel types.
> > 
> > [1] http://dpdk.org/ml/archives/dev/2015-January/011127.html
> > 
> > Changes in v2:
> > - fix test of rx offload flag in parse_vlan() pointed out by Jijiang
> > 
> > Changes in v3:
> > - more detailed API comments for PKT_TX_IPV4 and PKT_TX_IPV6
> > - do not calculate the outer UDP checksum if packet is not UDP
> > - add a likely() in i40e
> > - remove a unlikely() in i40e
> > - fix a patch split issue
> > - rebase on head
> > 
> > Jijiang Liu (2):
> >   ethdev: add outer IP offload capability flag
> >   i40e: advertise outer IPv4 checksum capability
> > 
> > Olivier Matz (18):
> >   mbuf: remove PKT_TX_IPV4_CSUM
> >   mbuf: enhance the API documentation of offload flags
> >   i40e: call i40e_txd_enable_checksum only for offloaded packets
> >   i40e: remove the use of PKT_TX_UDP_TUNNEL_PKT flag
> >   mbuf: remove PKT_TX_UDP_TUNNEL_PKT flag
> >   testpmd: replace tx_checksum command by csum
> >   testpmd: move csum_show in a function
> >   testpmd: add csum parse_tunnel command
> >   testpmd: rename vxlan in outer_ip in csum commands
> >   testpmd: introduce parse_ipv* in csum fwd engine
> >   testpmd: use a structure to store offload info in csum fwd engine
> >   testpmd: introduce parse_vxlan in csum fwd engine
> >   testpmd: support gre tunnels in csum fwd engine
> >   testpmd: support ipip tunnel in csum forward engine
> >   testpmd: add a warning if outer ip cksum requested but not supported
> >   testpmd: fix TSO when using outer checksum offloads
> >   i40e: fix offloading of outer checksum for ip in ip tunnels
> >   i40e: add debug logs for tx context descriptors
> 
> Acked-by:  Jijiang Liu < Jijiang.liu at intel.com>

Applied, thanks for making API clearer

[dpdk-dev] [PATCH v2 3/4] examples: example showing use of callbacks.

2015-02-16 Thread Thomas Monjalon

2015-02-16 15:16, Bruce Richardson:
> On Mon, Feb 16, 2015 at 03:33:40PM +0100, Olivier MATZ wrote:
> > Hi John,
> > 
> > On 02/13/2015 04:39 PM, John McNamara wrote:
> > > From: Richardson, Bruce 
> > > 
> > > Example showing how callbacks can be used to insert a timestamp
> > > into each packet on RX. On TX the timestamp is used to calculate
> > > the packet latency through the app, in cycles.
> > > 
> > > Signed-off-by: Bruce Richardson 
> > 
> > 
> > I'm looking at the example and I don't understand what is the advantage
> > of having callbacks in ethdev layer, knowing that the application can
> > do the same job by a standard function call.
> > 
> > What is the advantage of having callbacks compared to:
> > 
> > 
> > for (port = 0; port < nb_ports; port++) {
> > struct rte_mbuf *bufs[BURST_SIZE];
> > const uint16_t nb_rx = rte_eth_rx_burst(port, 0,
> > bufs, BURST_SIZE);
> > if (unlikely(nb_rx == 0))
> > continue;
> > add_timestamp(bufs, nb_rx);
> > 
> > const uint16_t nb_tx = rte_eth_tx_burst(port ^ 1, 0,
> > bufs, nb_rx);
> > calc_latency(bufs, nb_tx);
> > 
> > if (unlikely(nb_tx < nb_rx)) {
> > uint16_t buf;
> > for (buf = nb_tx; buf < nb_rx; buf++)
> > rte_pktmbuf_free(bufs[buf]);
> > }
> > }
> > 
> > 
> > To me, doing like the code above has several advantages:
> > 
> > - code is more readable: the callback is explicitly invoked, so there is
> >   no risk to forget it
> > - code is faster: the functions calls can be inlined by the compiler
> > - easier to handle error cases in the callback function as the error
> >   code is accessible to the application
> > - there is no need to add code in ethdev api to do this
> > - if the application does not want to use callbacks (I suppose most
> >   applications), it won't have any performance impact
> > 
> > Regards,
> > Olivier
> 
> In this specific instance, given that the application does little else, there
> is no real advantage to using the callbacks - it's just to have a simple
> example of how they can be used.
> 
> Where callbacks are really designed to be useful, is for extending or
> augmenting hardware capabilities. Taking the example of sequence numbers - to
> use the most trivial example - an application could be written to take
> advantage of sequence numbers written to packets by the hardware which
> received them. However, if such an application was to be used with a NIC which
> does not provide sequence numbering capability, for example, anything using
> ixgbe driver, the application writer has two choices - either modify his
> application code to check each packet for a sequence number in the data path,
> and add it there post-rx, or alternatively, to check the NIC capabilities at
> initialization time, and add a callback there at initialization, if the
> hardware does not support it. In the latter case, the main packet processing
> body of the application can be written as though hardware always has sequence
> numbering capability, safe in the knowledge that any hardware not supporting
> it will be back-filled by a software fallback at initialization-time.
> 
> By the same token, we could also look to extend hardware capabilities. For
> different filtering or hashing capabilities, there can be limits in hardware
> which are far less than what we need to use in software. Again, callbacks will
> allow the data path to be written in a way that is oblivious to the underlying
> hardware limits, because software will transparently fill in the gaps.
> 
> Hope this makes the use case clear.

After thinking more about these callbacks, I realize these callbacks won't
help, as Olivier said.

With callback,
1/ application checks device capability
2/ application provides hardware emulation as DPDK callback
3/ application forgets previous steps
4/ application calls DPDK Rx
5/ DPDK calls callback (without calling optimization)

Without callback,
1/ application checks device capability
2/ application provides hardware emulation as internal function
3/ application sets an internal device-flag to enable this function
4/ application calls DPDK Rx
5/ application calls the hardware emulation if flag is set

So the only difference is that the device information is kept persistent in
the application instead of being stored as a function pointer in the
DPDK struct.
You can also be faster with this approach: at initialization time,
you can check that your NIC supports the feature and use a specific
mainloop that either adds the sequence number or not, without any
runtime test.
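
A minimal sketch of selecting the mainloop once at initialization. The types
and function names here are hypothetical stand-ins, not DPDK API; struct pkt
plays the role of an mbuf, and each "mainloop" handles one burst for brevity:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet type standing in for struct rte_mbuf. */
struct pkt { uint32_t seqn; };

typedef void (*mainloop_fn)(struct pkt *pkts, size_t n, uint32_t *next_seqn);

/* Hardware already wrote the sequence numbers: nothing to add. */
static void mainloop_hw_seqn(struct pkt *pkts, size_t n, uint32_t *next_seqn)
{
	(void)pkts; (void)n; (void)next_seqn;
}

/* Software emulation: number the packets after Rx. */
static void mainloop_sw_seqn(struct pkt *pkts, size_t n, uint32_t *next_seqn)
{
	assert(pkts != NULL && next_seqn != NULL);
	for (size_t i = 0; i < n; i++)
		pkts[i].seqn = (*next_seqn)++;
}

/* Called once at init with the result of the capability check. */
static mainloop_fn select_mainloop(bool hw_has_seqn)
{
	return hw_has_seqn ? mainloop_hw_seqn : mainloop_sw_seqn;
}
```

The capability test happens only in select_mainloop(); the selected function
contains no per-packet branch on the device capability.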

A callback could be justified for asynchronous events, or when
doing specific processing in the middle of the driver, for instance
when freeing an mbuf. But in this case it is exactly equivalent to doing
the processing in the application after Rx (or before Tx).

[dpdk-dev] [PULL REQUEST] fm10k: new polling mode driver for PF/VF.

2015-02-16 Thread Chen Jing D(Mark)

These changes add a poll mode driver for the host interface of the Intel
Ethernet Switch FM10000 Series of silicon, which integrates NIC and
switch functionality. The patch set includes the features below:

1. Basic RX/TX functions for PF/VF.
2. Interrupt handling mechanism for PF/VF.
3. per queue start/stop functions for PF/VF.
4. Mailbox handling between PF/VF and PF/Switch Manager.
5. Receive Side Scaling (RSS) for PF/VF.
6. Scatter receive function for PF/VF.
7. reta update/query for PF/VF.
8. VLAN filter set for PF.
9. Link status query for PF/VF.

The following changes since commit f2c5125a686ab64034925dabafea0877d1e5857e:

  app/testpmd: use default Rx/Tx port configuration (2015-02-14 11:35:25 +0100)

are available in the git repository at:

  jing at dpdk.org:dpdk-fm10k-next.git master

for you to fetch changes up to 1b073a75d5e809f10c0a71cbc755b02045bf8783:

  fm10k: Add ABI version of librte_pmd_fm10k (2015-02-16 03:46:00 -0500)


Chen Jing D(Mark) (1):
  maintainers: claim for fm10k review

Jeff Shaw (15):
  fm10k: add base driver
  eal: add fm10k device id
  fm10k: register fm10k pmd PF driver
  Change config files to add fm10k into compile
  fm10k: add reta update/requery functions
  fm10k: add rx_queue_setup/release function
  fm10k: add tx_queue_setup/release function
  fm10k: add RX/TX single queue start/stop function
  fm10k: add dev start/stop functions
  fm10k: add receive and tranmit function
  fm10k: add PF RSS support
  fm10k: Add scatter receive function
  fm10k: add function to set vlan
  fm10k: Add SRIOV-VF support
  fm10k: add PF and VF interrupt handling function

Michael Qiu (1):
  fm10k: Add ABI version of librte_pmd_fm10k

 MAINTAINERS |4 +
 config/common_bsdapp|   11 +
 config/common_linuxapp  |   11 +
 lib/Makefile|1 +
 lib/librte_eal/common/include/rte_pci_dev_ids.h |   22 +
 lib/librte_pmd_fm10k/Makefile   |  100 ++
 lib/librte_pmd_fm10k/base/fm10k_api.c   |  341 
 lib/librte_pmd_fm10k/base/fm10k_api.h   |   61 +
 lib/librte_pmd_fm10k/base/fm10k_common.c|  572 ++
 lib/librte_pmd_fm10k/base/fm10k_common.h|   52 +
 lib/librte_pmd_fm10k/base/fm10k_mbx.c   | 2185 +++
 lib/librte_pmd_fm10k/base/fm10k_mbx.h   |  329 
 lib/librte_pmd_fm10k/base/fm10k_osdep.h |  148 ++
 lib/librte_pmd_fm10k/base/fm10k_pf.c| 1992 +
 lib/librte_pmd_fm10k/base/fm10k_pf.h|  155 ++
 lib/librte_pmd_fm10k/base/fm10k_tlv.c   |  914 ++
 lib/librte_pmd_fm10k/base/fm10k_tlv.h   |  199 +++
 lib/librte_pmd_fm10k/base/fm10k_type.h  |  937 ++
 lib/librte_pmd_fm10k/base/fm10k_vf.c|  641 +++
 lib/librte_pmd_fm10k/base/fm10k_vf.h|   91 +
 lib/librte_pmd_fm10k/fm10k.h|  293 +++
 lib/librte_pmd_fm10k/fm10k_ethdev.c | 1868 +++
 lib/librte_pmd_fm10k/fm10k_logs.h   |   78 +
 lib/librte_pmd_fm10k/fm10k_rxtx.c   |  459 +
 lib/librte_pmd_fm10k/rte_pmd_fm10k_version.map  |4 +
 mk/rte.app.mk   |4 +
 26 files changed, 11472 insertions(+)
 create mode 100644 lib/librte_pmd_fm10k/Makefile
 create mode 100644 lib/librte_pmd_fm10k/base/fm10k_api.c
 create mode 100644 lib/librte_pmd_fm10k/base/fm10k_api.h
 create mode 100644 lib/librte_pmd_fm10k/base/fm10k_common.c
 create mode 100644 lib/librte_pmd_fm10k/base/fm10k_common.h
 create mode 100644 lib/librte_pmd_fm10k/base/fm10k_mbx.c
 create mode 100644 lib/librte_pmd_fm10k/base/fm10k_mbx.h
 create mode 100644 lib/librte_pmd_fm10k/base/fm10k_osdep.h
 create mode 100644 lib/librte_pmd_fm10k/base/fm10k_pf.c
 create mode 100644 lib/librte_pmd_fm10k/base/fm10k_pf.h
 create mode 100644 lib/librte_pmd_fm10k/base/fm10k_tlv.c
 create mode 100644 lib/librte_pmd_fm10k/base/fm10k_tlv.h
 create mode 100644 lib/librte_pmd_fm10k/base/fm10k_type.h
 create mode 100644 lib/librte_pmd_fm10k/base/fm10k_vf.c
 create mode 100644 lib/librte_pmd_fm10k/base/fm10k_vf.h
 create mode 100644 lib/librte_pmd_fm10k/fm10k.h
 create mode 100644 lib/librte_pmd_fm10k/fm10k_ethdev.c
 create mode 100644 lib/librte_pmd_fm10k/fm10k_logs.h
 create mode 100644 lib/librte_pmd_fm10k/fm10k_rxtx.c
 create mode 100644 lib/librte_pmd_fm10k/rte_pmd_fm10k_version.map

[dpdk-dev] [PATCH v2] doc: Add requirements for x32 ABI

2015-02-16 Thread Daniel Mrzyglod

This patch adds requirements about compiler and distribution support.

v2:
spelling fixes

Signed-off-by: Daniel Mrzyglod 
---
 doc/guides/linux_gsg/sys_reqs.rst | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/doc/guides/linux_gsg/sys_reqs.rst 
b/doc/guides/linux_gsg/sys_reqs.rst
index 8e2307b..ef4196e 100644
--- a/doc/guides/linux_gsg/sys_reqs.rst
+++ b/doc/guides/linux_gsg/sys_reqs.rst
@@ -62,7 +62,7 @@ Compilation of the DPDK
 *   coreutils:  cmp, sed, grep, arch

 *   gcc: versions 4.5.x or later is recommended for i686/x86_64. versions 
4.8.x or later is recommanded
-for ppc_64. On some distributions, some specific compiler flags and linker 
flags are enabled by
+for ppc_64 and x86_x32 ABI. On some distributions, some specific compiler 
flags and linker flags are enabled by
 default and affect performance (- fstack-protector, for example). Please 
refer to the documentation
 of your distribution and to gcc -dumpspecs.

@@ -78,7 +78,14 @@ Compilation of the DPDK

 glibc.ppc64, libgcc.ppc64, libstdc++.ppc64 and glibc-devel.ppc64 for IBM 
ppc_64;

-*   Python, version 2.6 or 2.7, to use various helper scripts included in the 
DPDK package
+.. note::
+
+x86_x32 ABI is currently supported with distribution packages only on 
Ubuntu
+higher than 13.10 or recent debian distribution. The only supported  
compiler is gcc 4.8+.
+
+.. note::
+
+Python, version 2.6 or 2.7, to use various helper scripts included in the 
DPDK package


 **Optional Tools:**
-- 
2.1.0

[dpdk-dev] [PATCH v2 00/11] qemu vhost-user support

2015-02-16 Thread Tetsuya Mukawa

On 2015/02/12 14:07, Huawei Xie wrote:
> vhost-user supports passing vring information to a seperate vhost enabled
> user space process, normally a user space vSwitch, through unix domain socket.
>
> In previous DPDK version, we implement a user space character device driver
> vhost-cuse in user space DPDK process. vring information is passed to the
> cuse driver through ioctl call, including eventfds for interrupt injection and
> host notification. A kernel module is developed to copy these fds from
> qemu process into our process. We also need some trick to map guest memory.
> (TODO: kickfd/callfd is reversed which causes confusion)
>
> known issue in vhost-user implementation in QEMU, reported by
> haifeng.lin at huawei.com
> * QEMU doesn't send correct memory region information with multiple numa node
>   configuration
> http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg01454.html
>
> Thanks Tetsuya for reporting the issue that "FD_ISSET would crash when
> receive -1 as fd on Ubuntu 14.04".
>
> Huawei Xie (11):
>  enable VIRTIO_NET_F_CTRL_RX
>  create vhost_cuse directory and move vhost-net-cdev.c into vhost_cuse
>  rename vhost-net-cdev.h to vhost-net.h
>  move fd copying(from qemu process into vhost process) to eventfd_copy.c
>  copy host_memory_map from virtio-net.c to a new file virtio-net-cdev.c
>  make host_memory_map a more generic function.
>  implement cuse_set_memory_table in virtio-net-cdev.c
>  add select based event driven processing
>  vhost user support
>  support dev->ifname
>  support calling rte_vhost_driver_register after 
> rte_vhost_driver_session_start
>
>  lib/librte_vhost/Makefile |   8 +-
>  lib/librte_vhost/rte_virtio_net.h |   5 +-
>  lib/librte_vhost/vhost-net-cdev.c | 389 
>  lib/librte_vhost/vhost-net-cdev.h | 113 --
>  lib/librte_vhost/vhost-net.h  | 118 +++
>  lib/librte_vhost/vhost_cuse/eventfd_copy.c|  88 +
>  lib/librte_vhost/vhost_cuse/eventfd_copy.h|  39 ++
>  lib/librte_vhost/vhost_cuse/vhost-net-cdev.c  | 417 ++
>  lib/librte_vhost/vhost_cuse/virtio-net-cdev.c | 423 ++
>  lib/librte_vhost/vhost_cuse/virtio-net-cdev.h |  48 +++
>  lib/librte_vhost/vhost_rxtx.c |   2 +-
>  lib/librte_vhost/vhost_user/fd_man.c  | 258 ++
>  lib/librte_vhost/vhost_user/fd_man.h  |  67 
>  lib/librte_vhost/vhost_user/vhost-net-user.c  | 472 +
>  lib/librte_vhost/vhost_user/vhost-net-user.h  | 106 ++
>  lib/librte_vhost/vhost_user/virtio-net-user.c | 314 
>  lib/librte_vhost/vhost_user/virtio-net-user.h |  49 +++
>  lib/librte_vhost/virtio-net.c | 491 
> ++
>  lib/librte_vhost/virtio-net.h |  43 +++
>  19 files changed, 2491 insertions(+), 959 deletions(-)
>  delete mode 100644 lib/librte_vhost/vhost-net-cdev.c
>  delete mode 100644 lib/librte_vhost/vhost-net-cdev.h
>  create mode 100644 lib/librte_vhost/vhost-net.h
>  create mode 100644 lib/librte_vhost/vhost_cuse/eventfd_copy.c
>  create mode 100644 lib/librte_vhost/vhost_cuse/eventfd_copy.h
>  create mode 100644 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
>  create mode 100644 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
>  create mode 100644 lib/librte_vhost/vhost_cuse/virtio-net-cdev.h
>  create mode 100644 lib/librte_vhost/vhost_user/fd_man.c
>  create mode 100644 lib/librte_vhost/vhost_user/fd_man.h
>  create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.c
>  create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.h
>  create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.c
>  create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.h
>  create mode 100644 lib/librte_vhost/virtio-net.h
>

Hi Xie,

I have 2 questions about the v2 patches.
Could you please check my other emails?

Also, checkpatch.pl reports some warnings.
I am not sure how strictly we should follow checkpatch.pl.

Here is what I did.
$ git show --format=email | checkpatch.pl --no-tree --no-signoff -q -

Is there a consensus on how to use checkpatch.pl in the DPDK community?

Thanks,
Tetsuya

[dpdk-dev] [PATCH v2 11/11] lib/librte_vhost: support dynamically registering vhost server

2015-02-16 Thread Tetsuya Mukawa

On 2015/02/12 14:07, Huawei Xie wrote:
> * support calling rte_vhost_driver_register after 
> rte_vhost_driver_session_start
> * add mutext to protect fdset from concurrent access
> * add busy flag in fdentry. this flag is set before cb and cleared after cb 
> is finished.
>
> mutex lock scenario in vhost:
>
> * event_dispatch(in rte_vhost_driver_session_start) runs in a seperate 
> thread, infinitely
> processing vhost messages through cb(callback).
> * event_dispatch acquires the lock, get the cb and its context, mark the busy 
> flag,
> and releases the mutex.
> * vserver_new_vq_conn cb calls fdset_add, which acquires the mutex and add 
> new fd into fdset.
> * vserver_message_handler cb frees data context, marks remove flag to request 
> to delete
> connfd(connection fd) from fdset.
> * after cb returns, event_dispatch
>   1. clears busy flag.
>   2. if there is remove request, call fdset_del, which acquires mutex, checks 
> busy flag, and
> removes connfd from fdset.
> * rte_vhost_driver_unregister(not implemented) runs in another thread, 
> acquires the mutex,
> calls fdset_del to remove fd(listenerfd) from fdset. Then it could free data 
> context.
>
> The above steps ensures fd data context isn't freed when cb is using.
>
> VM(s) should have been shutdown before rte_vhost_driver_unregister.
>
> Signed-off-by: Huawei Xie 
> ---
>  lib/librte_vhost/vhost_user/fd_man.c | 63 
> +---
>  lib/librte_vhost/vhost_user/fd_man.h |  5 ++-
>  lib/librte_vhost/vhost_user/vhost-net-user.c | 34 +--
>  3 files changed, 82 insertions(+), 20 deletions(-)
>
> diff --git a/lib/librte_vhost/vhost_user/fd_man.c 
> b/lib/librte_vhost/vhost_user/fd_man.c
> index 929fbc3..63ac4df 100644
> --- a/lib/librte_vhost/vhost_user/fd_man.c
> +++ b/lib/librte_vhost/vhost_user/fd_man.c
> @@ -40,6 +40,7 @@
>  #include 
>  #include 
>  
> +#include 
>  #include 
>  
>  #include "fd_man.h"
> @@ -145,6 +146,8 @@ fdset_add(struct fdset *pfdset, int fd, fd_cb rcb, fd_cb 
> wcb, void *dat)
>   if (pfdset == NULL || fd == -1)
>   return -1;
>  
> + pthread_mutex_lock(&pfdset->fd_mutex);
> +
>   /* Find a free slot in the list. */
>   i = fdset_find_free_slot(pfdset);
>   if (i == -1)
> @@ -153,6 +156,8 @@ fdset_add(struct fdset *pfdset, int fd, fd_cb rcb, fd_cb 
> wcb, void *dat)
>   fdset_add_fd(pfdset, i, fd, rcb, wcb, dat);
>   pfdset->num++;
>  
> + pthread_mutex_unlock(&pfdset->fd_mutex);
> +
>   return 0;
>  }
>  
> @@ -164,17 +169,36 @@ fdset_del(struct fdset *pfdset, int fd)
>  {
>   int i;
>  
> + if (pfdset == NULL || fd == -1)
> + return;
> +
> +again:
> + pthread_mutex_lock(&pfdset->fd_mutex);
> +
>   i = fdset_find_fd(pfdset, fd);
>   if (i != -1 && fd != -1) {
> + /* busy indicates r/wcb is executing! */
> + if (pfdset->fd[i].busy == 1) {
> + pthread_mutex_unlock(&pfdset->fd_mutex);
> + goto again;
> + }
> +
>   pfdset->fd[i].fd = -1;
>   pfdset->fd[i].rcb = pfdset->fd[i].wcb = NULL;
>   pfdset->num--;
>   }
> +
> + pthread_mutex_unlock(&pfdset->fd_mutex);
>  }
>  
>  /**
>   * This functions runs in infinite blocking loop until there is no fd in
>   * pfdset. It calls corresponding r/w handler if there is event on the fd.
> + *
> + * Before the callback is called, we set the flag to busy status; If other
> + * thread(now rte_vhost_driver_unregister) calls fdset_del concurrently, it
> + * will wait until the flag is reset to zero(which indicates the callback is
> + * finished), then it could free the context after fdset_del.
>   */
>  void
>  fdset_event_dispatch(struct fdset *pfdset)
> @@ -183,6 +207,10 @@ fdset_event_dispatch(struct fdset *pfdset)
>   int i, maxfds;
>   struct fdentry *pfdentry;
>   int num = MAX_FDS;
> + fd_cb rcb, wcb;
> + void *dat;
> + int fd;
> + int remove1, remove2;
>  
>   if (pfdset == NULL)
>   return;
> @@ -190,18 +218,41 @@ fdset_event_dispatch(struct fdset *pfdset)
>   while (1) {
>   FD_ZERO(&rfds);
>   FD_ZERO(&wfds);
> + pthread_mutex_lock(&pfdset->fd_mutex);
> +
>   maxfds = fdset_fill(&rfds, &wfds, pfdset);
> - if (maxfds == -1)
> - return;
> + if (maxfds == -1) {
> + pthread_mutex_unlock(&pfdset->fd_mutex);
> + sleep(1);
> + continue;
> + }
> +
> + pthread_mutex_unlock(&pfdset->fd_mutex);
>  
>   select(maxfds + 1, &rfds, &wfds, NULL, NULL);
>  
>   for (i = 0; i < num; i++) {
> + remove1 = remove2 = 0;
> + pthread_mutex_lock(&pfdset->fd_mutex);
>   pfdentry = &pfdset->fd[i];
> - if (pfdentry->fd >= 0 && FD_ISSET(pfdentry-

[dpdk-dev] [PATCH v2 01/11] lib/librte_vhost: enable VIRTIO_NET_F_CTRL_RX VIRTIO_NET_F_CTRL_RX is dependant on VIRTIO_NET_F_CTRL_VQ. Observed that virtio-net driver in guest would crash with only CTRL

2015-02-16 Thread Tetsuya Mukawa

Hi Xie,

Could you please check the commit title?
I guess the commit title includes the first sentence of the commit log.

Thanks,
Tetsuya


On 2015/02/12 14:07, Huawei Xie wrote:
> In virtnet_send_command:
>
>   /* Caller should know better */
>   BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ) ||
>   (out + in > VIRTNET_SEND_COMMAND_SG_MAX));
>
> Signed-off-by: Huawei Xie 
> ---
>  lib/librte_vhost/virtio-net.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
> index b041849..52b4957 100644
> --- a/lib/librte_vhost/virtio-net.c
> +++ b/lib/librte_vhost/virtio-net.c
> @@ -73,7 +73,8 @@ static struct virtio_net_config_ll *ll_root;
>  
>  /* Features supported by this lib. */
>  #define VHOST_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | \
> -   (1ULL << VIRTIO_NET_F_CTRL_RX))
> + (1ULL << VIRTIO_NET_F_CTRL_VQ) | \
> + (1ULL << VIRTIO_NET_F_CTRL_RX))
>  static uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES;
>  
>  /* Line size for reading maps file. */

[dpdk-dev] [PATCH v2 11/11] lib/librte_vhost: support dynamically registering vhost server

2015-02-16 Thread Ananyev, Konstantin



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Huawei Xie
> Sent: Thursday, February 12, 2015 5:07 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v2 11/11] lib/librte_vhost: support dynamically 
> registering vhost server
> 
> * support calling rte_vhost_driver_register after 
> rte_vhost_driver_session_start
> * add mutext to protect fdset from concurrent access
> * add busy flag in fdentry. this flag is set before cb and cleared after cb 
> is finished.
> 
> mutex lock scenario in vhost:
> 
> * event_dispatch(in rte_vhost_driver_session_start) runs in a seperate 
> thread, infinitely
> processing vhost messages through cb(callback).
> * event_dispatch acquires the lock, get the cb and its context, mark the busy 
> flag,
> and releases the mutex.
> * vserver_new_vq_conn cb calls fdset_add, which acquires the mutex and add 
> new fd into fdset.
> * vserver_message_handler cb frees data context, marks remove flag to request 
> to delete
> connfd(connection fd) from fdset.
> * after cb returns, event_dispatch
>   1. clears busy flag.
>   2. if there is remove request, call fdset_del, which acquires mutex, checks 
> busy flag, and
> removes connfd from fdset.
> * rte_vhost_driver_unregister(not implemented) runs in another thread, 
> acquires the mutex,
> calls fdset_del to remove fd(listenerfd) from fdset. Then it could free data 
> context.
> 
> The above steps ensures fd data context isn't freed when cb is using.
> 
> VM(s) should have been shutdown before rte_vhost_driver_unregister.
> 
> Signed-off-by: Huawei Xie 

Acked-by: Konstantin Ananyev 

> ---
>  lib/librte_vhost/vhost_user/fd_man.c | 63 
> +---
>  lib/librte_vhost/vhost_user/fd_man.h |  5 ++-
>  lib/librte_vhost/vhost_user/vhost-net-user.c | 34 +--
>  3 files changed, 82 insertions(+), 20 deletions(-)
>

[dpdk-dev] [PATCH v2 08/11] lib/librte_vhost: add select based event driven processing

2015-02-16 Thread Ananyev, Konstantin



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Huawei Xie
> Sent: Thursday, February 12, 2015 5:07 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v2 08/11] lib/librte_vhost: add select based event 
> driven processing
> 
> for more generic event driven processing, refer to:
>   http://libevent.org/
> 
> 
> Signed-off-by: Huawei Xie 

Acked-by: Konstantin Ananyev 

> ---
>  lib/librte_vhost/vhost_user/fd_man.c | 207 
> +++
>  lib/librte_vhost/vhost_user/fd_man.h |  64 +++
>  2 files changed, 271 insertions(+)
>  create mode 100644 lib/librte_vhost/vhost_user/fd_man.c
>  create mode 100644 lib/librte_vhost/vhost_user/fd_man.h
>

[dpdk-dev] [PATCH v2 14/15] examples/l3fwd: support of unified packet type

2015-02-16 Thread Ananyev, Konstantin

Hi Helin,

> -Original Message-
> From: Zhang, Helin
> Sent: Monday, February 09, 2015 6:41 AM
> To: dev at dpdk.org
> Cc: Cao, Waterman; Liang, Cunming; Liu, Jijiang; Ananyev, Konstantin; 
> Richardson, Bruce; Zhang, Helin
> Subject: [PATCH v2 14/15] examples/l3fwd: support of unified packet type
> 
> To unify packet types among all PMDs, bit masks and relevant macros
> of packet type for ol_flags are replaced by unified packet type and
> relevant macros.
> 
> Signed-off-by: Helin Zhang 
> ---
>  examples/l3fwd/main.c | 64 
> ---
>  1 file changed, 35 insertions(+), 29 deletions(-)
> 
> v2 changes:
> * Used redefined packet types and enlarged packet_type field in mbuf.
> 
> diff --git a/examples/l3fwd/main.c b/examples/l3fwd/main.c
> index 6f7d7d4..302322e 100644
> --- a/examples/l3fwd/main.c
> +++ b/examples/l3fwd/main.c
> @@ -958,7 +958,7 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid, 
> struct lcore_conf *qcon
> 
>   eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
> 
> - if (m->ol_flags & PKT_RX_IPV4_HDR) {
> + if (RTE_ETH_IS_IPV4_HDR(m->packet_type)) {
>   /* Handle IPv4 headers.*/
>   ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(m, unsigned 
> char *) +
>   sizeof(struct ether_hdr));
> @@ -993,7 +993,7 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid, 
> struct lcore_conf *qcon
> 
>   send_single_packet(m, dst_port);
> 
> - } else {
> + } else if (RTE_ETH_IS_IPV6_HDR(m->packet_type)) {

If you change 'else' to 'else if' here, then I suppose you'll need to add
another 'else' after it to handle the case where the input packet is neither
IPv4 nor IPv6.
Otherwise you might start 'leaking' such mbufs.
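
To make the point about the missing branch concrete, here is a minimal sketch using
stand-in types rather than the real DPDK structures. `struct pkt`, `PTYPE_IPV4`,
`PTYPE_IPV6`, `send_pkt()` and `free_pkt()` are hypothetical names for illustration only;
in l3fwd the final branch would presumably call rte_pktmbuf_free(m).

```c
#include <stdint.h>

/* Hypothetical stand-ins for the packet-type bits and the mbuf. */
#define PTYPE_IPV4 0x1u
#define PTYPE_IPV6 0x2u

struct pkt {
	uint32_t packet_type;
	int freed;     /* set when the buffer is returned to its pool */
	int forwarded; /* set when the packet is sent on */
};

static void free_pkt(struct pkt *p) { p->freed = 1; }
static void send_pkt(struct pkt *p) { p->forwarded = 1; }

/* Every packet must leave this function either forwarded or freed. */
static void simple_forward(struct pkt *p)
{
	if (p->packet_type & PTYPE_IPV4) {
		send_pkt(p);
	} else if (p->packet_type & PTYPE_IPV6) {
		send_pkt(p);
	} else {
		/* Neither IPv4 nor IPv6: without this branch the buffer leaks. */
		free_pkt(p);
	}
}
```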

>   /* Handle IPv6 headers.*/
>   struct ipv6_hdr *ipv6_hdr;
> 
> @@ -1039,11 +1039,11 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t 
> portid, struct lcore_conf *qcon
>   * to BAD_PORT value.
>   */
>  static inline __attribute__((always_inline)) void
> -rfc1812_process(struct ipv4_hdr *ipv4_hdr, uint16_t *dp, uint32_t flags)
> +rfc1812_process(struct ipv4_hdr *ipv4_hdr, uint16_t *dp, uint16_t ptype)

Shouldn't it be 'uint32_t ptype'?
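
The concern, sketched under the assumption that the enlarged packet_type field is
32 bits wide: passing it through a uint16_t parameter silently drops the upper 16 bits.
`PTYPE_HIGH_BIT` and both helper names are made up for this example and do not
correspond to any real packet type definition.

```c
#include <stdint.h>

/* Hypothetical flag living in the upper half of a 32-bit packet type. */
#define PTYPE_HIGH_BIT (UINT32_C(1) << 20)

/* The value was already truncated at the call site; bit 20 cannot survive. */
static int has_high_bit_u16(uint16_t ptype)
{
	return ((uint32_t)ptype & PTYPE_HIGH_BIT) != 0;
}

/* With a 32-bit parameter the full packet type is visible. */
static int has_high_bit_u32(uint32_t ptype)
{
	return (ptype & PTYPE_HIGH_BIT) != 0;
}
```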

>  {
>   uint8_t ihl;
> 
> - if ((flags & PKT_RX_IPV4_HDR) != 0) {
> + if (RTE_ETH_IS_IPV4_HDR(ptype)) {
> 
>   ihl = ipv4_hdr->version_ihl - IPV4_MIN_VER_IHL;
> 
> @@ -1074,11 +1074,11 @@ get_dst_port(const struct lcore_conf *qconf, struct 
> rte_mbuf *pkt,
>   struct ipv6_hdr *ipv6_hdr;
>   struct ether_hdr *eth_hdr;
> 
> - if (pkt->ol_flags & PKT_RX_IPV4_HDR) {
> + if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) {
>   if (rte_lpm_lookup(qconf->ipv4_lookup_struct, dst_ipv4,
>   &next_hop) != 0)
>   next_hop = portid;
> - } else if (pkt->ol_flags & PKT_RX_IPV6_HDR) {
> + } else if (RTE_ETH_IS_IPV6_HDR(pkt->packet_type)) {
>   eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
>   ipv6_hdr = (struct ipv6_hdr *)(eth_hdr + 1);
>   if (rte_lpm6_lookup(qconf->ipv6_lookup_struct,
> @@ -1112,17 +1112,19 @@ process_packet(struct lcore_conf *qconf, struct 
> rte_mbuf *pkt,
>   ve = val_eth[dp];
> 
>   dst_port[0] = dp;
> - rfc1812_process(ipv4_hdr, dst_port, pkt->ol_flags);
> + rfc1812_process(ipv4_hdr, dst_port, pkt->packet_type);
> 
>   te =  _mm_blend_epi16(te, ve, MASK_ETH);
>   _mm_store_si128((__m128i *)eth_hdr, te);
>  }
> 
>  /*
> - * Read ol_flags and destination IPV4 addresses from 4 mbufs.
> + * Read packet_type and destination IPV4 addresses from 4 mbufs.
>   */
>  static inline void
> -processx4_step1(struct rte_mbuf *pkt[FWDSTEP], __m128i *dip, uint32_t *flag)
> +processx4_step1(struct rte_mbuf *pkt[FWDSTEP],
> + __m128i *dip,
> + uint32_t *ipv4_flag)
>  {
>   struct ipv4_hdr *ipv4_hdr;
>   struct ether_hdr *eth_hdr;
> @@ -1131,22 +1133,20 @@ processx4_step1(struct rte_mbuf *pkt[FWDSTEP], 
> __m128i *dip, uint32_t *flag)
>   eth_hdr = rte_pktmbuf_mtod(pkt[0], struct ether_hdr *);
>   ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
>   x0 = ipv4_hdr->dst_addr;
> - flag[0] = pkt[0]->ol_flags & PKT_RX_IPV4_HDR;
> 
>   eth_hdr = rte_pktmbuf_mtod(pkt[1], struct ether_hdr *);
>   ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
>   x1 = ipv4_hdr->dst_addr;
> - flag[0] &= pkt[1]->ol_flags;
> 
>   eth_hdr = rte_pktmbuf_mtod(pkt[2], struct ether_hdr *);
>   ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
>   x2 = ipv4_hdr->dst_addr;
> - flag[0] &= pkt[2]->ol_flags;
> 
>   eth_hdr = rte_pktmbuf_mtod(pkt[3], struct ether_hdr *);
>   ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
>   x3 = ipv4_hdr->dst_addr;
> - flag[0] &= pkt[3]->ol_flags;
> + *ipv4_flag = pkt[0]->packet_type & pkt[1]->packet_type &
> + pkt[2]->pa

[dpdk-dev] kernel: BUG: soft lockup - CPU#1 stuck for 22s! [kni_single:1782]

2015-02-16 Thread Matthew Hall

On Mon, Feb 16, 2015 at 10:33:52AM -0600, Jay Rolette wrote:
> In kni_net_rx_normal(), it was calling netif_receive_skb() instead of
> netif_rx(). The source for netif_receive_skb() point out that it should
> only be called from soft-irq context, which isn't the case for KNI.

For the uninitiated among us, what was the practical effect of the coding 
error? Waiting forever for a lock which will never be available in IRQ 
context, or causing unintended re-entrancy, or what?

Thanks,
Matthew.

[dpdk-dev] Intel DPDK support for ntop DPI

2015-02-16 Thread Matthew Hall

I did some research on this previously before concluding NDPI wouldn't help me 
much with my own particular application.

Just for running NDPI, DPDK is not strictly needed: NDPI is normally 
read-only, so something like PF_RING would work with default ntop / ndpi. If 
you're trying to use NDPI in a read-write application then you'd need to do 
the integration yourself.

However, a much harder part than the integration itself would be tracking 
flow-starts for TCP and UDP sockets to feed to NDPI. From what I found, NDPI by 
itself does not appear to offer connection tracking. It has to be implemented 
separately.
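
As a rough idea of what "implemented separately" could look like, here is a minimal
flow-start tracker keyed on the 5-tuple. Everything in it (`struct flow_key`,
`flow_track()`, the table size, the hash) is a hypothetical sketch and not part of NDPI
or DPDK. It does not normalise direction or expire old flows, which a real tracker
would need.

```c
#include <stdint.h>

#define FLOW_TABLE_SIZE 1024u /* power of two, arbitrary for the sketch */

struct flow_key {
	uint32_t src_ip, dst_ip;
	uint16_t src_port, dst_port;
	uint8_t proto; /* 6 = TCP, 17 = UDP */
};

struct flow_table {
	struct flow_key keys[FLOW_TABLE_SIZE];
	uint8_t used[FLOW_TABLE_SIZE];
};

static int key_equal(const struct flow_key *a, const struct flow_key *b)
{
	return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
	       a->src_port == b->src_port && a->dst_port == b->dst_port &&
	       a->proto == b->proto;
}

/* Simple multiplicative mix; not a production-quality hash. */
static uint32_t key_hash(const struct flow_key *k)
{
	uint32_t h = k->src_ip * 2654435761u;

	h ^= k->dst_ip * 2246822519u;
	h ^= (((uint32_t)k->src_port << 16) | k->dst_port) * 3266489917u;
	h ^= k->proto;
	return h;
}

/* Returns 1 for the first packet of a flow, 0 if already seen, -1 if full. */
static int flow_track(struct flow_table *t, const struct flow_key *k)
{
	uint32_t i, slot = key_hash(k) & (FLOW_TABLE_SIZE - 1);

	for (i = 0; i < FLOW_TABLE_SIZE; i++) {
		uint32_t s = (slot + i) & (FLOW_TABLE_SIZE - 1);

		if (!t->used[s]) {
			t->keys[s] = *k; /* new flow: record it */
			t->used[s] = 1;
			return 1;
		}
		if (key_equal(&t->keys[s], k))
			return 0;
	}
	return -1;
}
```

A return value of 1 marks the point where the application would start feeding the
flow's packets to the classifier.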

My advice: carefully research how ntop calls ndpi in their code using cscope, 
because the ndpi documentation on how to use all the functions is not very 
clear. It took quite some hours before I understood it wasn't 
going to help with what I was coding in my application.

Matthew.

On Mon, Feb 16, 2015 at 04:22:25AM -0800, harshavardhan Reddy wrote:
> Hi All,
> 
> Is ntop DPI integration available for Intel DPDK..?
> 
> I could see only the proprietary Qosmos ixEngine integrated with DPDK, and
> Wind River with its own DPI.
> 
> But I have not found any info about nDPI integration with DPDK.
> 
> 
> Thank You,
> 
> Regards,
> HVR

[dpdk-dev] [PATCH v8 2/2] librte_pmd_null: Support port hotplug function

2015-02-16 Thread Iremonger, Bernard

> -Original Message-
> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Monday, February 16, 2015 4:19 AM
> To: dev at dpdk.org
> Cc: Qiu, Michael; Iremonger, Bernard; Tetsuya Mukawa
> Subject: [PATCH v8 2/2] librte_pmd_null: Support port hotplug function
> 
> This patch adds port hotplug support to Null PMD.
> 
> v7:
>  - Add parameter checkings.
>(Thanks to Iremonger, Bernard)
> v6:
>  - Fix a parameter of rte_eth_dev_free().
> v4:
>  - Fix commit title.
> 
> Signed-off-by: Tetsuya Mukawa 

Acked-by: Bernard Iremonger

[dpdk-dev] [PATCH v8 1/2] librte_pmd_null: Add Null PMD

2015-02-16 Thread Iremonger, Bernard

> -Original Message-
> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Monday, February 16, 2015 4:19 AM
> To: dev at dpdk.org
> Cc: Qiu, Michael; Iremonger, Bernard; Tetsuya Mukawa
> Subject: [PATCH v8 1/2] librte_pmd_null: Add Null PMD
> 
> Null PMD is a driver for a virtual device designed specifically to measure 
> the performance of DPDK PMDs.
> When an application calls rx, Null PMD just allocates mbufs and returns them. 
> On tx, the PMD just frees the mbufs.
> 
> The PMD has the following options.
> - size: specify the packet size allocated by RX. Default packet size is 64.
> - copy: specify 1 or 0 to enable or disable copy while RX and TX.
>   Default value is 0(disabled).
>   This option is used for emulating more realistic data transfer.
>   Copy size is equal to packet size.
> 
> To use the PMD, enable CONFIG_RTE_BUILD_SHARED_LIB in the config file. Then 
> compile the PMD as a shared library. The library can be loaded using the '-d'
> option when the application is invoked.
> 
> Here is an example.
> $ sudo ./testpmd -c f -n 4 -d librte_pmd_null.so \
>   --vdev 'eth_null0' --vdev 'eth_null1' -- -i --no-flush-rx
> 
> If testpmd is compiled with CONFIG_RTE_BUILD_SHARED_LIB, you may need to 
> specify more libraries using the '-d' option.
> 
> v8:
>  - Fix Makefile and add version map file.
>(Thanks to Qiu, Michael and Iremonger, Bernard)
> v7:
>  - Add parameter checkings.
>(Thanks to Iremonger, Bernard)
>  - Remove needless "__rte_unused".
> v4:
>  - Fix memory leak.
>(Thanks to Iremonger, Bernard)
> 
> Signed-off-by: Tetsuya Mukawa 

Acked-by: Bernard Iremonger

[dpdk-dev] [PATCH v8] librte_pmd_pcap: Add port hotplug support

2015-02-16 Thread Iremonger, Bernard

> -Original Message-
> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Monday, February 16, 2015 4:15 AM
> To: dev at dpdk.org
> Cc: Qiu, Michael; Iremonger, Bernard; Tetsuya Mukawa
> Subject: [PATCH v8] librte_pmd_pcap: Add port hotplug support
> 
> This patch adds finalization code to free resources allocated by the PMD.
> 
> v6:
>  - Fix a parameter of rte_eth_dev_free().
> v4:
>  - Change function name.
> 
> Signed-off-by: Tetsuya Mukawa 

Acked-by: Bernard Iremonger

[dpdk-dev] [PATCH v8] testpmd: Add port hotplug support

2015-02-16 Thread Iremonger, Bernard

> -Original Message-
> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Monday, February 16, 2015 4:16 AM
> To: dev at dpdk.org
> Cc: Qiu, Michael; Iremonger, Bernard; Tetsuya Mukawa
> Subject: [PATCH v8] testpmd: Add port hotplug support
> 
> The patch introduces following commands.
> - port attach [ident]
> - port detach [port_id]
>  - attach: attaching a port
>  - detach: detaching a port
>  - ident: pci address of physical device.
>   Or device name and parameters of virtual device.
>  (ex. :02:00.0, eth_pcap0,iface=eth0)
>  - port_id: port identifier
> 
> v7:
> - Fix doc.
>   (Thanks to Iremonger, Bernard)
> - Fix port checking implementation of start_port();
>   (Thanks to Qiu, Michael)
> v5:
> - Add testpmd documentation.
>   (Thanks to Iremonger, Bernard)
> v4:
>  - Fix strings of command help.
> 
> Signed-off-by: Tetsuya Mukawa 

Acked-by: Bernard Iremonger

[dpdk-dev] [PATCH v8 14/14] doc: Add port hotplug framework section to programmers guide

2015-02-16 Thread Iremonger, Bernard

> -Original Message-
> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Monday, February 16, 2015 4:15 AM
> To: dev at dpdk.org
> Cc: Qiu, Michael; Iremonger, Bernard; Tetsuya Mukawa
> Subject: [PATCH v8 14/14] doc: Add port hotplug framework section to 
> programmers guide
> 
> This patch adds a new section for describing port hotplug framework.
> 
> Signed-off-by: Tetsuya Mukawa 

Acked-by: Bernard Iremonger

[dpdk-dev] [PATCH v8 13/14] eal: Enable port hotplug framework in Linux

2015-02-16 Thread Iremonger, Bernard

> -Original Message-
> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Monday, February 16, 2015 4:15 AM
> To: dev at dpdk.org
> Cc: Qiu, Michael; Iremonger, Bernard; Tetsuya Mukawa
> Subject: [PATCH v8 13/14] eal: Enable port hotplug framework in Linux
> 
> The patch enables CONFIG_RTE_LIBRTE_EAL_HOTPLUG in Linux configuration.
> 
> Signed-off-by: Tetsuya Mukawa 

Acked-by: Bernard Iremonger

[dpdk-dev] [PATCH v8 12/14] eal/pci: Add rte_eal_dev_attach/detach() functions

2015-02-16 Thread Iremonger, Bernard

> -Original Message-
> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Monday, February 16, 2015 4:15 AM
> To: dev at dpdk.org
> Cc: Qiu, Michael; Iremonger, Bernard; Tetsuya Mukawa
> Subject: [PATCH v8 12/14] eal/pci: Add rte_eal_dev_attach/detach() functions
> 
> These functions are used for attaching or detaching a port.
> When rte_eal_dev_attach() is called, the function tries to interpret the
> device name as a PCI address. If this succeeds, rte_eal_dev_attach() attaches
> a physical device port. If not, it attaches a virtual device port.
> When rte_eal_dev_detach() is called, the function gets the device type of 
> the port to determine whether it is physical or virtual, and then calls
> the corresponding detach function.
> 
> v8:
> - Add missing symbol in version map.
>   (Thanks to Qiu, Michael and Iremonger, Bernard)
> v7:
> - Fix typo of warning messages.
>   (Thanks to Qiu, Michael)
> v5:
> - Change function names like below.
>   rte_eal_dev_find_and_invoke() to rte_eal_vdev_find_and_invoke().
>   rte_eal_dev_invoke() to rte_eal_vdev_invoke().
> - Add code to handle a return value of rte_eal_devargs_remove().
> - Fix pci address format in rte_eal_dev_detach().
> v4:
> - Fix comment.
> - Add error checking.
> - Fix indent of 'if' statement.
> - Change function name.
> 
> Signed-off-by: Tetsuya Mukawa 

Acked-by: Bernard Iremonger
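
For readers following the attach logic described in the commit message, here is a
minimal sketch of the "is this identifier a PCI address?" decision. It is not the
patch's code: `parse_pci_addr()`, `classify_ident()` and `enum dev_kind` are
hypothetical names, and the accepted format (domain:bus:device.function in hex) is an
assumption based on the usual Linux notation.

```c
#include <stdio.h>

enum dev_kind { DEV_PHYSICAL, DEV_VIRTUAL };

/* Returns 1 if ident is a full PCI address such as "0000:02:00.0". */
static int parse_pci_addr(const char *ident, unsigned *domain, unsigned *bus,
			  unsigned *devid, unsigned *func)
{
	int consumed = 0;

	if (sscanf(ident, "%4x:%2x:%2x.%1x%n",
		   domain, bus, devid, func, &consumed) != 4)
		return 0;
	/* Reject trailing characters, e.g. a vdev argument string. */
	return ident[consumed] == '\0';
}

/* Anything that does not parse as a PCI address is treated as a vdev. */
static enum dev_kind classify_ident(const char *ident)
{
	unsigned d, b, v, f;

	return parse_pci_addr(ident, &d, &b, &v, &f) ? DEV_PHYSICAL
						     : DEV_VIRTUAL;
}
```

The sketch does not range-check the device (0-31) or function (0-7) numbers.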

[dpdk-dev] [PATCH v2] doc: Add requirements for x32 ABI

2015-02-16 Thread De Lara Guarch, Pablo



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Daniel Mrzyglod
> Sent: Monday, February 16, 2015 4:27 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v2] doc: Add requirements for x32 ABI
> 
> This patch adds requirements for compiler and distribution support.
> 
> v2:
> spelling fixes
> 
> Signed-off-by: Daniel Mrzyglod 
> ---
>  doc/guides/linux_gsg/sys_reqs.rst | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/doc/guides/linux_gsg/sys_reqs.rst
> b/doc/guides/linux_gsg/sys_reqs.rst
> index 8e2307b..ef4196e 100644
> --- a/doc/guides/linux_gsg/sys_reqs.rst
> +++ b/doc/guides/linux_gsg/sys_reqs.rst
> @@ -62,7 +62,7 @@ Compilation of the DPDK
>  *   coreutils:  cmp, sed, grep, arch
> 
>  *   gcc: versions 4.5.x or later is recommended for i686/x86_64. versions 
> 4.8.x
> or later is recommanded
> -for ppc_64. On some distributions, some specific compiler flags and 
> linker
> flags are enabled by
> +for ppc_64 and x86_x32 ABI. On some distributions, some specific
> compiler flags and linker flags are enabled by
>  default and affect performance (- fstack-protector, for example). Please
> refer to the documentation
>  of your distribution and to gcc -dumpspecs.
> 
> @@ -78,7 +78,14 @@ Compilation of the DPDK
> 
>  glibc.ppc64, libgcc.ppc64, libstdc++.ppc64 and glibc-devel.ppc64 for IBM
> ppc_64;
> 
> -*   Python, version 2.6 or 2.7, to use various helper scripts included in the
> DPDK package
> +.. note::
> +
> +x86_x32 ABI is currently supported with distribution packages only on
> Ubuntu
> +higher than 13.10 or recent debian distribution. The only supported
> compiler is gcc 4.8+.
> +
> +.. note::
> +
> +Python, version 2.6 or 2.7, to use various helper scripts included in the
> DPDK package
> 
> 
>  **Optional Tools:**
> --
> 2.1.0

Acked-by: Pablo de Lara 

Thanks Daniel!

[dpdk-dev] [PATCH v8 11/14] ethdev: Add one dev_type parameter to rte_eth_dev_allocate

2015-02-16 Thread Iremonger, Bernard

> -Original Message-
> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Monday, February 16, 2015 4:15 AM
> To: dev at dpdk.org
> Cc: Qiu, Michael; Iremonger, Bernard; Tetsuya Mukawa
> Subject: [PATCH v8 11/14] ethdev: Add one dev_type parameter to 
> rte_eth_dev_allocate
> 
> This new parameter is needed to record the device type (physical or virtual).
> The port detach process differs between physical and virtual devices.
> This parameter tells the detach function the device type of the port.
> 
> v8:
> - NONE_TRACE is replaced by NO_TRACE.
> - Add missing symbol in version map.
>   (Thanks to Qiu, Michael and Iremonger, Bernard)
> v4:
> - Fix comments of rte_eth_dev_type.
> 
> Signed-off-by: Tetsuya Mukawa 

Acked-by: Bernard Iremonger

[dpdk-dev] [PATCH v8 10/14] eal/pci: Cleanup pci driver initialization code

2015-02-16 Thread Iremonger, Bernard

> -Original Message-
> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Monday, February 16, 2015 4:14 AM
> To: dev at dpdk.org
> Cc: Qiu, Michael; Iremonger, Bernard; Tetsuya Mukawa
> Subject: [PATCH v8 10/14] eal/pci: Cleanup pci driver initialization code
> 
> - Add rte_eal_pci_close_one_driver()
>   The function is used for closing the specified driver and device.
> - Add pci_invoke_all_drivers()
>   The function is based on pci_probe_all_drivers. But it can not only
>   probe but also close drivers.
> - Add pci_close_all_drivers()
>   The function tries to find a driver for the specified device, and
>   then close the driver.
> - Add rte_eal_pci_probe_one() and rte_eal_pci_close_one()
>   The functions are used for probing and closing a device.
>   First the function tries to find a device that has the specified
>   PCI address. Then, probe or close the device.
> 
> v5:
> - Remove RTE_EAL_INVOKE_TYPE_UNKNOWN, because it's unused.
> v4:
> - Fix parameter checking.
> - Fix indent of 'if' statement.
> 
> Signed-off-by: Tetsuya Mukawa 

Acked-by: Bernard Iremonger

[dpdk-dev] [PATCH v8 09/14] eal/pci: Add a function to remove the entry of devargs list

2015-02-16 Thread Iremonger, Bernard

> -Original Message-
> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Monday, February 16, 2015 4:14 AM
> To: dev at dpdk.org
> Cc: Qiu, Michael; Iremonger, Bernard; Tetsuya Mukawa
> Subject: [PATCH v8 09/14] eal/pci: Add a function to remove the entry of 
> devargs list
> 
> The function removes the specified devargs entry from devargs_list.
> Also, the patch adds sanity checking to rte_eal_devargs_add().
> 
> v5:
> - Change function definition of rte_eal_devargs_remove().
> v4:
> - Fix sanity check code.
> 
> Signed-off-by: Tetsuya Mukawa 

Acked-by: Bernard Iremonger

[dpdk-dev] [PATCH v8 08/14] eal/linux/pci: Add functions for unmapping igb_uio resources

2015-02-16 Thread Iremonger, Bernard

> -Original Message-
> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Monday, February 16, 2015 4:14 AM
> To: dev at dpdk.org
> Cc: Qiu, Michael; Iremonger, Bernard; Tetsuya Mukawa
> Subject: [PATCH v8 08/14] eal/linux/pci: Add functions for unmapping igb_uio 
> resources
> 
> The patch adds functions for unmapping igb_uio resources. The patch is only 
> for Linux and igb_uio
> environment. VFIO and BSD are not supported.
> 
> v8:
> - Fix typo.
>   (Thanks to Iremonger, Bernard)
> v5:
> - Fix pci_unmap_device() to check pt_driver.
> v4:
> - Add parameter checking.
> - Add header file to determine if hotplug can be enabled.
> 
> Signed-off-by: Tetsuya Mukawa 

Acked-by: Bernard Iremonger

[dpdk-dev] [PATCH v8 07/14] ethdev: Add functions that will be used by port hotplug functions

2015-02-16 Thread Iremonger, Bernard

> -Original Message-
> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Monday, February 16, 2015 4:14 AM
> To: dev at dpdk.org
> Cc: Qiu, Michael; Iremonger, Bernard; Tetsuya Mukawa
> Subject: [PATCH v8 07/14] ethdev: Add functions that will be used by port 
> hotplug functions
> 
> The patch adds following functions.
> 
> - rte_eth_dev_save()
>   The function is used for saving current rte_eth_dev structures.
> - rte_eth_dev_get_changed_port()
>   The function receives the rte_eth_dev structures, then compare
>   these with current values to know which port is actually
>   attached or detached.
> - rte_eth_dev_get_addr_by_port()
>   The function returns a pci address of an ethdev specified by port
>   identifier.
> - rte_eth_dev_get_port_by_addr()
>   The function returns a port identifier of an ethdev specified by
>   pci address.
> - rte_eth_dev_get_name_by_port()
>   The function returns a unique identifier name of an ethdev
>   specified by port identifier.
> - Add rte_eth_dev_check_detachable()
>   The function returns whether a PMD supports detach function.
> 
> Also, the patch changes the scope of rte_eth_dev_allocated() to global,
> because this function will be called by virtual PMDs to support port hotplug.
> 
> v8:
> - Add size parameter to rte_eth_dev_save().
> - Add missing symbol in version map.
>   (Thanks to Qiu, Michael and Iremonger, Bernard)
> v7:
> - Add pt_driver checking to rte_eth_dev_check_detachable().
>   (Thanks to Qiu, Michael)
> v5:
> - Fix return value of below functions.
>   rte_eth_dev_get_changed_port().
>   rte_eth_dev_get_port_by_addr().
> v4:
> - Add parameter checking.
> v3:
> - Fix if-condition bug while comparing pci addresses.
> - Add error checking codes.
> Reported-by: Mark Enright 
> 
> Signed-off-by: Tetsuya Mukawa 

Acked-by: Bernard Iremonger
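
The save-then-compare idea behind rte_eth_dev_save() and
rte_eth_dev_get_changed_port() can be sketched as below. This is only an illustration
under assumptions: `struct port_state`, `MAX_PORTS`, `ports_save()` and
`ports_get_changed()` are hypothetical stand-ins, and the real functions operate on
rte_eth_dev structures with signatures not shown in this thread.

```c
#include <stdint.h>
#include <string.h>

#define MAX_PORTS 32 /* arbitrary for the sketch */

struct port_state {
	uint8_t attached;
};

/* Copy the current table so it can be compared after attach/detach. */
static void ports_save(struct port_state *saved, const struct port_state *cur)
{
	memcpy(saved, cur, sizeof(*saved) * MAX_PORTS);
}

/* Returns the first port whose attached state differs, or -1 if none. */
static int ports_get_changed(const struct port_state *saved,
			     const struct port_state *cur)
{
	int i;

	for (i = 0; i < MAX_PORTS; i++)
		if (saved[i].attached != cur[i].attached)
			return i;
	return -1;
}
```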

[dpdk-dev] [PATCH v8 06/14] eal, ethdev: Add a function and function pointers to close ether device

2015-02-16 Thread Iremonger, Bernard

> -Original Message-
> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Monday, February 16, 2015 4:14 AM
> To: dev at dpdk.org
> Cc: Qiu, Michael; Iremonger, Bernard; Tetsuya Mukawa
> Subject: [PATCH v8 06/14] eal,ethdev: Add a function and function pointers to 
> close ether device
> 
> The patch adds function pointers to the rte_pci_driver and eth_driver
> structures. These function pointers are used when ports are detached.
> Also, the patch adds rte_eth_dev_uninit(). It is not called from anywhere
> yet, but it will be called when the port hotplug function is implemented.
> 
> v6:
> - Fix rte_eth_dev_uninit() to handle a return value of uninit
>   function of PMD.
> v4:
> - Add parameter checking.
> - Change function names.
> 
> Signed-off-by: Tetsuya Mukawa 

Acked-by: Bernard Iremonger

[dpdk-dev] [PATCH v8 05/14] ethdev: Add rte_eth_dev_free to free specified device

2015-02-16 Thread Iremonger, Bernard

> -Original Message-
> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Monday, February 16, 2015 4:14 AM
> To: dev at dpdk.org
> Cc: Qiu, Michael; Iremonger, Bernard; Tetsuya Mukawa
> Subject: [PATCH v8 05/14] ethdev: Add rte_eth_dev_free to free specified 
> device
> 
> This patch adds rte_eth_dev_free(). The function is used for changing the
> attached status of the device that has the specified name.
> 
> v6:
> - Use rte_eth_dev structure as the parameter of rte_eth_dev_free().
> v4:
> - Add parameter checking.
> 
> Signed-off-by: Tetsuya Mukawa 

Acked-by: Bernard Iremonger

[dpdk-dev] [PATCH v8 04/14] eal/pci: Consolidate pci address comparison APIs

2015-02-16 Thread Iremonger, Bernard

> -Original Message-
> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Monday, February 16, 2015 4:14 AM
> To: dev at dpdk.org
> Cc: Qiu, Michael; Iremonger, Bernard; Tetsuya Mukawa
> Subject: [PATCH v8 04/14] eal/pci: Consolidate pci address comparison APIs
> 
> This patch replaces pci_addr_comparison() and memcmp() of pci addresses by
> eal_compare_pci_addr().
> 
> v8:
> - Fix pci_scan_one() to update sysfs values.
>   (Thanks to Qiu, Michael and Iremonger, Bernard)
> v5:
> - Fix pci_scan_one to handle pt_driver correctly.
> v4:
> - Fix calculation method of eal_compare_pci_addr().
> - Add parameter checking.
> 
> Signed-off-by: Tetsuya Mukawa 

Acked-by: Bernard Iremonger
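
One way a single comparison helper can replace ad-hoc memcmp() calls is to pack the
address fields into one integer and compare those. The sketch below assumes a
16-bit domain and 8-bit bus, device and function fields; `struct pci_addr` and
`compare_pci_addr()` are hypothetical names, not the eal_compare_pci_addr()
implementation from the patch.

```c
#include <stdint.h>

struct pci_addr {
	uint16_t domain;
	uint8_t bus;
	uint8_t devid;
	uint8_t function;
};

/* Pack the fields so that ordering is by domain, bus, device, function. */
static uint64_t pci_addr_key(const struct pci_addr *a)
{
	return ((uint64_t)a->domain << 24) | ((uint64_t)a->bus << 16) |
	       ((uint64_t)a->devid << 8) | (uint64_t)a->function;
}

/* Returns a negative, zero or positive value, like memcmp(). */
static int compare_pci_addr(const struct pci_addr *a, const struct pci_addr *b)
{
	uint64_t ka = pci_addr_key(a);
	uint64_t kb = pci_addr_key(b);

	return (ka > kb) - (ka < kb);
}
```

Comparing packed keys rather than raw struct bytes avoids depending on padding
bytes or field layout.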

[dpdk-dev] [PATCH v8 03/14] eal/pci, ethdev: Remove assumption that port will not be detached

2015-02-16 Thread Iremonger, Bernard

> -Original Message-
> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Monday, February 16, 2015 4:14 AM
> To: dev at dpdk.org
> Cc: Qiu, Michael; Iremonger, Bernard; Tetsuya Mukawa
> Subject: [PATCH v8 03/14] eal/pci,ethdev: Remove assumption that port will 
> not be detached
> 
> This patch adds "RTE_PCI_DRV_DETACHABLE" to drv_flags of the rte_pci_driver 
> structure. The flag indicates that the driver can detach devices at runtime.
> The patch also removes the assumption that a port will not be detached.
> 
> To remove the assumption:
> - Add 'attached' member to rte_eth_dev structure.
>   This member is used for indicating the port is attached, or not.
> - Add rte_eth_dev_allocate_new_port().
>   This function is used for allocating new port.
> 
> v8:
> - NONE_TRACE is changed to NO_TRACE.
>   (Thanks to Iremonger, Bernard)
> v5:
> - Change parameters of rte_eth_dev_validate_port() to cleanup code.
> v4:
> - Use braces with 'for' loop.
> - Fix indent of 'if' statement.
> 
> Signed-off-by: Tetsuya Mukawa 

Acked-by: Bernard Iremonger

[dpdk-dev] [PATCH v8 02/14] eal_pci: pci memory map work with driver type

2015-02-16 Thread Iremonger, Bernard

> -Original Message-
> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Monday, February 16, 2015 4:14 AM
> To: dev at dpdk.org
> Cc: Qiu, Michael; Iremonger, Bernard; Tetsuya Mukawa
> Subject: [PATCH v8 02/14] eal_pci: pci memory map work with driver type
> 
> From: Michael Qiu 
> 
> With the driver type flag in struct rte_pci_dev, we no longer need to always
> map uio devices with the vfio related function when vfio is enabled.
> 
> Signed-off-by: Michael Qiu 
> Signed-off-by: Tetsuya Mukawa 

Acked-by: Bernard Iremonger

[dpdk-dev] [PATCH v8 01/14] eal_pci: Add flag to hold kernel driver type

2015-02-16 Thread Iremonger, Bernard

> -Original Message-
> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Monday, February 16, 2015 4:14 AM
> To: dev at dpdk.org
> Cc: Qiu, Michael; Iremonger, Bernard; Tetsuya Mukawa
> Subject: [PATCH v8 01/14] eal_pci: Add flag to hold kernel driver type
> 
> From: Michael Qiu 
> 
> Currently, dpdk has no way to know which type of driver
> (vfio-pci/igb_uio/uio_pci_generic) a device uses. It can only check
> statically whether vfio is enabled or not.
> 
> It is really useful to have the flag, because different driver types need to
> be handled differently at runtime. For example, pci memory map, port
> hotplug, and so on.
> 
> This patch adds a flag field for the pci device to solve the above issue.
> 
> Signed-off-by: Michael Qiu 
> Signed-off-by: Tetsuya Mukawa 

Acked-by: Bernard Iremonger

[dpdk-dev] [PATCH] doc: Add requirements for x32 ABI

2015-02-16 Thread De Lara Guarch, Pablo



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Daniel Mrzyglod
> Sent: Friday, February 13, 2015 3:58 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] doc: Add requirements for x32 ABI
> 
> This patch adds requirements for compiler and distribution support.
> 
> Signed-off-by: Daniel Mrzyglod 
> ---
>  doc/guides/linux_gsg/sys_reqs.rst | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/doc/guides/linux_gsg/sys_reqs.rst
> b/doc/guides/linux_gsg/sys_reqs.rst
> index 8e2307b..ef4196e 100644
> --- a/doc/guides/linux_gsg/sys_reqs.rst
> +++ b/doc/guides/linux_gsg/sys_reqs.rst
> @@ -62,7 +62,7 @@ Compilation of the DPDK
>  *   coreutils:  cmp, sed, grep, arch
> 
>  *   gcc: versions 4.5.x or later is recommended for i686/x86_64. versions 
> 4.8.x
> or later is recommanded
> -for ppc_64. On some distributions, some specific compiler flags and 
> linker
> flags are enabled by
> +for ppc_64 and x86_x32 ABI. On some distributions, some specific
> compiler flags and linker flags are enabled by
>  default and affect performance (- fstack-protector, for example). Please
> refer to the documentation
>  of your distribution and to gcc -dumpspecs.
> 
> @@ -78,7 +78,14 @@ Compilation of the DPDK
> 
>  glibc.ppc64, libgcc.ppc64, libstdc++.ppc64 and glibc-devel.ppc64 for IBM
> ppc_64;
> 
> -*   Python, version 2.6 or 2.7, to use various helper scripts included in the
> DPDK package
> +.. note::
> +
> +x86_x32 ABI is currently supported with distribution packages only on
> Ubuntu
> +higher then 13.10 or recent debian distribution. The only supported
> compiler is gcc 4.8+.

Typo here: "then" -> "than"

> +
> +.. note::
> +
> +Python, version 2.6 or 2.7, to use various helper scripts included in the
> DPDK package
> 
> 
>  **Optional Tools:**
> --
> 2.1.0

[dpdk-dev] ACL lookup doesn't work for some schemes

2015-02-16 Thread Ananyev, Konstantin



> -Original Message-
> From: Markovic, Stevan [mailto:smarkovi at akamai.com]
> Sent: Monday, February 16, 2015 3:22 PM
> To: Ananyev, Konstantin; yuzhichang_scl at hotmail.com; dev at dpdk.org
> Subject: Re: [dpdk-dev] ACL lookup doesn't work for some schemes
> 
> Hi,
> 
> 
> On 2/16/15, 4:56 AM, "Ananyev, Konstantin" 
> wrote:
> 
> >
> >Yes, right now, for librte_acl to work correctly, the first field has to be
> >1B long and all subsequent fields grouped into sets of 4 consecutive bytes.
> >I thought we had it documented in our PG, ACL section:
> >http://dpdk.org/doc/guides/prog_guide/packet_classif_access_ctrl.html
> >Though re-reading it again:
> >"For performance reasons, the inner loop of the search function is
> >unrolled to process four input bytes at a time. This requires the input
> >to be grouped into sets of 4 consecutive bytes. The loop processes the
> >first input byte as part of the setup and then subsequent bytes must be
> >in groups of 4 consecutive bytes."
> >It is probably not very clear and needs to be explained in more detail.
> >Will update the doc.
> >
> >Konstantin
> >
> 
> While improving API documentation would be great, enforcing these
> constraints on user defined fields in rte_acl_build(?) also (return an
> error if constraints are not met) would be even better.
> 
> Stevan

Good point. Will try to add some extra checks to rte_acl_build().
If time permits, in the 2.0 timeframe; if not, then in 2.1.
Konstantin
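
A check of the kind discussed above could look roughly like this. It is a sketch
under assumptions, not the rte_acl_build() implementation: `struct field_def` and
`check_layout()` are hypothetical, and the rule encoded is simply the one stated in
this thread (first field 1 byte long, every later group of fields sharing an input
index adding up to exactly 4 bytes).

```c
#include <stddef.h>
#include <stdint.h>

struct field_def {
	uint8_t size;        /* field width in bytes */
	uint8_t input_index; /* fields sharing an index form one 4-byte group */
};

/* Returns 0 if the layout meets the constraints, -1 otherwise. */
static int check_layout(const struct field_def *f, size_t n)
{
	size_t i = 1;

	/* The first field is consumed on its own and must be one byte. */
	if (n == 0 || f[0].size != 1)
		return -1;

	while (i < n) {
		uint8_t idx = f[i].input_index;
		unsigned bytes = 0;

		/* Sum the sizes of consecutive fields with the same index. */
		while (i < n && f[i].input_index == idx)
			bytes += f[i++].size;
		if (bytes != 4)
			return -1;
	}
	return 0;
}
```

With an IPv4 5-tuple (protocol, source, destination, two 16-bit ports sharing one
index) the check passes; putting the two ports in separate groups makes it fail.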

[dpdk-dev] [PATCH 2/2] Remove RTE_MBUF_REFCNT references

2015-02-16 Thread Sergio Gonzalez Monroy

This patch removes all references to RTE_MBUF_REFCNT, setting the refcnt
field in the mbuf struct permanently.

Signed-off-by: Sergio Gonzalez Monroy 
---
 app/test/test_link_bonding.c| 15 ---
 app/test/test_mbuf.c| 17 -
 config/common_bsdapp|  1 -
 config/common_linuxapp  |  1 -
 examples/Makefile   |  4 ++--
 examples/ip_fragmentation/Makefile  |  4 
 examples/ip_pipeline/Makefile   |  3 ---
 examples/ip_pipeline/main.c |  5 -
 examples/ipv4_multicast/Makefile|  4 
 examples/vhost/main.c   | 13 -
 lib/librte_ip_frag/Makefile |  4 
 lib/librte_ip_frag/rte_ip_frag.h|  4 
 lib/librte_mbuf/rte_mbuf.c  |  2 --
 lib/librte_mbuf/rte_mbuf.h  | 30 --
 lib/librte_pmd_bond/Makefile|  4 
 lib/librte_pmd_bond/rte_eth_bond.h  |  2 --
 lib/librte_pmd_bond/rte_eth_bond_args.c |  2 --
 lib/librte_pmd_bond/rte_eth_bond_pmd.c  | 10 --
 lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c   |  8 
 lib/librte_port/Makefile|  4 
 20 files changed, 6 insertions(+), 131 deletions(-)

diff --git a/app/test/test_link_bonding.c b/app/test/test_link_bonding.c
index 579ebbf..54895ab 100644
--- a/app/test/test_link_bonding.c
+++ b/app/test/test_link_bonding.c
@@ -708,9 +708,7 @@ test_set_bonding_mode(void)
int bonding_modes[] = { BONDING_MODE_ROUND_ROBIN,

BONDING_MODE_ACTIVE_BACKUP,
BONDING_MODE_BALANCE,
-#ifdef RTE_MBUF_REFCNT
BONDING_MODE_BROADCAST
-#endif
};

/* Test supported link bonding modes */
@@ -1425,7 +1423,6 @@ test_roundrobin_tx_burst(void)
return remove_slaves_and_stop_bonded_device();
 }

-#ifdef RTE_MBUF_REFCNT
 static int
 verify_mbufs_ref_count(struct rte_mbuf **mbufs, int nb_mbufs, int val)
 {
@@ -1439,8 +1436,6 @@ verify_mbufs_ref_count(struct rte_mbuf **mbufs, int 
nb_mbufs, int val)
}
return 0;
 }
-#endif
-

 static void
 free_mbufs(struct rte_mbuf **mbufs, int nb_mbufs)
@@ -1545,12 +1540,10 @@ test_roundrobin_tx_burst_slave_tx_fail(void)
(unsigned int)port_stats.opackets, 
slave_expected_tx_count);
}

-#ifdef RTE_MBUF_REFCNT
/* Verify that all mbufs have a ref value of zero */
TEST_ASSERT_SUCCESS(verify_mbufs_ref_count(&pkt_burst[tx_count],
TEST_RR_SLAVE_TX_FAIL_PACKETS_COUNT, 1),
"mbufs refcnts not as expected");
-#endif
free_mbufs(&pkt_burst[tx_count], TEST_RR_SLAVE_TX_FAIL_PACKETS_COUNT);

/* Clean up and remove slaves from bonded device */
@@ -3056,12 +3049,10 @@ test_balance_tx_burst_slave_tx_fail(void)
(unsigned int)port_stats.opackets,
TEST_BAL_SLAVE_TX_FAIL_BURST_SIZE_2);

-#ifdef RTE_MBUF_REFCNT
/* Verify that all mbufs have a ref value of zero */
TEST_ASSERT_SUCCESS(verify_mbufs_ref_count(&pkts_burst_1[tx_count_1],
TEST_BAL_SLAVE_TX_FAIL_PACKETS_COUNT, 1),
"mbufs refcnts not as expected");
-#endif

free_mbufs(&pkts_burst_1[tx_count_1],
TEST_BAL_SLAVE_TX_FAIL_PACKETS_COUNT);
@@ -3472,9 +3463,6 @@ 
test_balance_verify_slave_link_status_change_behaviour(void)
return remove_slaves_and_stop_bonded_device();
 }

-#ifdef RTE_MBUF_REFCNT
-/** Broadcast Mode Tests */
-
 static int
 test_broadcast_tx_burst(void)
 {
@@ -4001,7 +3989,6 @@ 
test_broadcast_verify_slave_link_status_change_behaviour(void)
/* Clean up and remove slaves from bonded device */
return remove_slaves_and_stop_bonded_device();
 }
-#endif

 static int
 test_reconfigure_bonded_device(void)
@@ -4592,14 +4579,12 @@ static struct unit_test_suite link_bonding_test_suite  
= {
TEST_CASE(test_tlb_verify_mac_assignment),
TEST_CASE(test_tlb_verify_promiscuous_enable_disable),
TEST_CASE(test_tlb_verify_slave_link_status_change_failover),
-#ifdef RTE_MBUF_REFCNT
TEST_CASE(test_broadcast_tx_burst),
TEST_CASE(test_broadcast_tx_burst_slave_tx_fail),
TEST_CASE(test_broadcast_rx_burst),
TEST_CASE(test_broadcast_verify_promiscuous_enable_disable),
TEST_CASE(test_broadcast_verify_mac_assignment),

TEST_CASE(test_broadcast_verify_slave_link_status_change_behaviour),
-#endif
TEST_CASE(test_reconfigure_bonded_device),
TEST_CASE(test_close_bonded_device),

diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
index e86ba22..9de

[dpdk-dev] [PATCH 1/2] mbuf: Introduce IND_ATTACHED_MBUF flag

2015-02-16 Thread Sergio Gonzalez Monroy

Currently for mbufs with refcnt, we cannot free mbufs with external memory
buffers (ie. vhost zero copy), as they are recognized as indirect
attached mbufs and therefore we free the direct mbuf it points to,
resulting in an error in the case of external memory buffers.

We solve the issue by introducing the IND_ATTACHED_MBUF flag, which indicates
that the mbuf is an indirect attached mbuf pointing to another mbuf.
When we free an mbuf, we only free the direct mbuf if the flag is set.
Freeing an mbuf with external buffer is the same as freeing a non attached mbuf.
The flag is set during attach and cleared on detach.

So in the case of vhost zero copy where we have mbufs with external
buffers, by default we just free the mbuf and it is up to the user to deal with
the external buffer.

This patch would allow the removal of the RTE_MBUF_REFCNT config option,
setting refcnt for all mbufs permanently.

The patch also modifies the vhost example as it was using the
RTE_MBUF_INDIRECT macro to detect if it was an mbuf with external buffer.

Signed-off-by: Sergio Gonzalez Monroy 
---
 examples/vhost/main.c  |  6 --
 lib/librte_mbuf/rte_mbuf.h | 15 +--
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 3a35359..5e341d6 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -139,6 +139,8 @@
 /* Number of descriptors per cacheline. */
 #define DESC_PER_CACHELINE (RTE_CACHE_LINE_SIZE / sizeof(struct vring_desc))

+#define MBUF_EXT_MEM(mb)   (RTE_MBUF_FROM_BADDR((mb)->buf_addr) != (mb))
+
 /* mask of enabled ports */
 static uint32_t enabled_port_mask = 0;

@@ -1567,7 +1569,7 @@ txmbuf_clean_zcp(struct virtio_net *dev, struct vpool 
*vpool)

for (index = 0; index < mbuf_count; index++) {
mbuf = __rte_mbuf_raw_alloc(vpool->pool);
-   if (likely(RTE_MBUF_INDIRECT(mbuf)))
+   if (likely(MBUF_EXT_MEM(mbuf)))
pktmbuf_detach_zcp(mbuf);
rte_ring_sp_enqueue(vpool->ring, mbuf);

@@ -1630,7 +1632,7 @@ static void mbuf_destroy_zcp(struct vpool *vpool)
for (index = 0; index < mbuf_count; index++) {
mbuf = __rte_mbuf_raw_alloc(vpool->pool);
if (likely(mbuf != NULL)) {
-   if (likely(RTE_MBUF_INDIRECT(mbuf)))
+   if (likely(MBUF_EXT_MEM(mbuf)))
pktmbuf_detach_zcp(mbuf);
rte_ring_sp_enqueue(vpool->ring, (void *)mbuf);
}
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 16059c6..12e7545 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -162,6 +162,8 @@ extern "C" {
 /** Tell the NIC it's an outer IPv6 packet for tunneling packet */
 #define PKT_TX_OUTER_IPV6    (1ULL << 60)

+#define IND_ATTACHED_MBUF    (1ULL << 62) /**< Indirect attached mbuf */
+
 /* Use final bit of flags to indicate a control mbuf */
 #define CTRL_MBUF_FLAG   (1ULL << 63) /**< Mbuf contains control data */

@@ -305,13 +307,12 @@ struct rte_mbuf {
 /**
  * Returns TRUE if given mbuf is indirect, or FALSE otherwise.
  */
-#define RTE_MBUF_INDIRECT(mb)   (RTE_MBUF_FROM_BADDR((mb)->buf_addr) != (mb))
+#define RTE_MBUF_INDIRECT(mb)   ((mb)->ol_flags & IND_ATTACHED_MBUF)

 /**
  * Returns TRUE if given mbuf is direct, or FALSE otherwise.
  */
-#define RTE_MBUF_DIRECT(mb) (RTE_MBUF_FROM_BADDR((mb)->buf_addr) == (mb))
-
+#define RTE_MBUF_DIRECT(mb) (!RTE_MBUF_INDIRECT(mb))

 /**
  * Private data in case of pktmbuf pool.
@@ -713,7 +714,7 @@ static inline void rte_pktmbuf_attach(struct rte_mbuf *mi, 
struct rte_mbuf *md)
mi->next = NULL;
mi->pkt_len = mi->data_len;
mi->nb_segs = 1;
-   mi->ol_flags = md->ol_flags;
+   mi->ol_flags = md->ol_flags | IND_ATTACHED_MBUF;
mi->packet_type = md->packet_type;

__rte_mbuf_sanity_check(mi, 1);
@@ -744,6 +745,8 @@ static inline void rte_pktmbuf_detach(struct rte_mbuf *m)
RTE_PKTMBUF_HEADROOM : m->buf_len;

m->data_len = 0;
+
+   m->ol_flags = 0;
 }

 #endif /* RTE_MBUF_REFCNT */
@@ -757,7 +760,6 @@ __rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
 #ifdef RTE_MBUF_REFCNT
if (likely (rte_mbuf_refcnt_read(m) == 1) ||
likely (rte_mbuf_refcnt_update(m, -1) == 0)) {
-   struct rte_mbuf *md = RTE_MBUF_FROM_BADDR(m->buf_addr);

rte_mbuf_refcnt_set(m, 0);

@@ -765,7 +767,8 @@ __rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
 *  - detach mbuf
 *  - free attached mbuf segment
 */
-   if (unlikely (md != m)) {
+   if (RTE_MBUF_INDIRECT(m)) {
+   struct rte_mbuf *md = RTE_MBUF_FROM_BADDR(m->buf_addr);
rte_pktmbuf_detach(m);
if (rte_mbuf_refcnt_update(md, -1) == 0)

[dpdk-dev] [PATCH 0/2] Removal of RTE_MBUF_REFCNT

2015-02-16 Thread Sergio Gonzalez Monroy

This patch tries to remove the RTE_MBUF_REFCNT config options and dependencies
by introducing a new mbuf flag IND_ATTACHED_MBUF that would indicate when the
mbuf is an indirect attached mbuf, to differentiate between indirect mbufs and
mbufs with external memory buffers (i.e. vhost zero copy).

Previous discussion:
http://dpdk.org/ml/archives/dev/2014-October/007127.html

Currently for mbufs with refcnt, we cannot free mbufs with external memory
buffers (ie. vhost zero copy), as they are recognized as indirect
attached mbufs and therefore we free the direct mbuf it points to,
resulting in an error in the case of external memory buffers.

We solve the issue by introducing the IND_ATTACHED_MBUF flag, which indicates
that the mbuf is an indirect attached mbuf pointing to another mbuf.
When we free an mbuf, we only free the direct mbuf if the flag is set.
Freeing an mbuf with external buffer is the same as freeing a non attached mbuf.
The flag is set during attach and cleared on detach.

So in the case of vhost zero copy where we have mbufs with external
buffers, by default we just free the mbuf and it is up to the user to deal with
the external buffer.
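
The free semantics described above can be sketched with a toy model. This is
not the DPDK code: struct toy_mbuf, TOY_IND_ATTACHED and the toy_* helpers are
hypothetical names used only for illustration, and a real mbuf has many more
fields and a mempool behind it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the proposed IND_ATTACHED_MBUF flag bit. */
#define TOY_IND_ATTACHED (1ULL << 62)

/* Toy mbuf: only the fields needed to show the free logic. */
struct toy_mbuf {
    uint64_t ol_flags;        /* offload flags, including the attach bit */
    struct toy_mbuf *direct;  /* direct mbuf this one is attached to, if any */
    void *ext_buf;            /* external buffer (e.g. zero copy), if any */
    int refcnt;               /* reference counter */
    int freed;                /* set when returned to its (imaginary) pool */
};

/* Attach: the direct mbuf gains a reference, the indirect one gets the flag. */
static void toy_attach(struct toy_mbuf *mi, struct toy_mbuf *md)
{
    md->refcnt++;
    mi->direct = md;
    mi->ol_flags = md->ol_flags | TOY_IND_ATTACHED;
}

/* Detach: the flag is cleared again. */
static void toy_detach(struct toy_mbuf *m)
{
    m->direct = NULL;
    m->ol_flags = 0;
}

/* Free: only follow the pointer to a direct mbuf when the flag is set. */
static void toy_free(struct toy_mbuf *m)
{
    if (--m->refcnt != 0)
        return;
    if (m->ol_flags & TOY_IND_ATTACHED) {
        struct toy_mbuf *md = m->direct;

        toy_detach(m);
        if (--md->refcnt == 0)
            md->freed = 1;
    }
    /* An mbuf with an external buffer is freed like a non-attached one;
     * the external buffer itself is left to the user. */
    m->freed = 1;
}
```

In this model an mbuf whose ext_buf is set but whose flag is clear goes
straight to the last step, so nothing ever tries to treat the external buffer
as a direct mbuf.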

Sergio Gonzalez Monroy (2):
  mbuf: Introduce IND_ATTACHED_MBUF flag
  Remove RTE_MBUF_REFCNT references

 app/test/test_link_bonding.c| 15 ---
 app/test/test_mbuf.c| 17 +++--
 config/common_bsdapp|  1 -
 config/common_linuxapp  |  1 -
 examples/Makefile   |  4 +--
 examples/ip_fragmentation/Makefile  |  4 ---
 examples/ip_pipeline/Makefile   |  3 ---
 examples/ip_pipeline/main.c |  5 
 examples/ipv4_multicast/Makefile|  4 ---
 examples/vhost/main.c   | 19 +++---
 lib/librte_ip_frag/Makefile |  4 ---
 lib/librte_ip_frag/rte_ip_frag.h|  4 ---
 lib/librte_mbuf/rte_mbuf.c  |  2 --
 lib/librte_mbuf/rte_mbuf.h  | 45 +++--
 lib/librte_pmd_bond/Makefile|  4 ---
 lib/librte_pmd_bond/rte_eth_bond.h  |  2 --
 lib/librte_pmd_bond/rte_eth_bond_args.c |  2 --
 lib/librte_pmd_bond/rte_eth_bond_pmd.c  | 10 
 lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c   |  8 --
 lib/librte_port/Makefile|  4 ---
 20 files changed, 19 insertions(+), 139 deletions(-)

-- 
1.9.3

[dpdk-dev] [PATCH v2 0/4] DPDK memcpy optimization

2015-02-16 Thread De Lara Guarch, Pablo



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Zhihong Wang
> Sent: Thursday, January 29, 2015 2:39 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v2 0/4] DPDK memcpy optimization
> 
> This patch set optimizes memcpy for DPDK for both SSE and AVX platforms.
> It also extends memcpy test coverage with unaligned cases and more test
> points.
> 
> Optimization techniques are summarized below:
> 
> 1. Utilize full cache bandwidth
> 
> 2. Enforce aligned stores
> 
> 3. Apply load address alignment based on architecture features
> 
> 4. Make load/store address available as early as possible
> 
> 5. General optimization techniques like inlining, branch reducing, prefetch
> pattern access
> 
> --
> Changes in v2:
> 
> 1. Reduced constant test cases in app/test/test_memcpy_perf.c for fast
> build
> 
> 2. Modified macro definition for better code readability & safety
> 
> Zhihong Wang (4):
>   app/test: Disabled VTA for memcpy test in app/test/Makefile
>   app/test: Removed unnecessary test cases in app/test/test_memcpy.c
>   app/test: Extended test coverage in app/test/test_memcpy_perf.c
>   lib/librte_eal: Optimized memcpy in arch/x86/rte_memcpy.h for both SSE
> and AVX platforms
> 
>  app/test/Makefile  |   6 +
>  app/test/test_memcpy.c |  52 +-
>  app/test/test_memcpy_perf.c| 220 ---
>  .../common/include/arch/x86/rte_memcpy.h   | 680
> +++--
>  4 files changed, 654 insertions(+), 304 deletions(-)
> 
> --
> 1.9.3

Acked-by: Pablo de Lara

[dpdk-dev] [PATCH v7 04/19] eal: fix wrong strnlen() return value in 32bit icc

2015-02-16 Thread Olivier MATZ

Hi,

On 02/15/2015 04:15 AM, Cunming Liang wrote:
> The problem is that strnlen() here may return an invalid value with 32-bit icc
> (actually it returns its second parameter, e.g. sysconf(_SC_ARG_MAX)).
> It starts to manifest when the max_len parameter is > 2M and using icc -m32 -O2
> (or above).
> 
> Suggested-by: Konstantin Ananyev 
> Signed-off-by: Cunming Liang 

Sorry but I don't think using strnlen() is appropriate here. See
http://dpdk.org/ml/archives/dev/2015-February/013309.html

Regards,
Olivier

[dpdk-dev] [PATCH v6 04/19] eal: fix wrong strnlen() return value in 32bit icc

2015-02-16 Thread Olivier MATZ

Hi,

On 02/15/2015 02:32 AM, Liang, Cunming wrote:
>>>>> --- a/lib/librte_eal/common/eal_common_options.c
>>>>> +++ b/lib/librte_eal/common/eal_common_options.c
>>>>> @@ -167,7 +167,7 @@ eal_parse_coremask(const char *coremask)
>>>>>   if (coremask[0] == '0' && ((coremask[1] == 'x')
>>>>>   || (coremask[1] == 'X')))
>>>>>   coremask += 2;
>>>>> - i = strnlen(coremask, PATH_MAX);
>>>>> + i = strlen(coremask);
>>>> This is crash prone.  If coremask is passed in without a trailing null
>>>> pointer,
>>>> strlen will return a huge value that can overrun the array.
>>>
>>> We discussed that in a previous thread:
>>> http://dpdk.org/ml/archives/dev/2015-February/012552.html
>>>
>>> coremask is always a valid nul-terminated string as it comes from
>>> argv[] table.
>>> It is not a memory fragment that is controlled by a user, so I don't
>>> think using strnlen() instead of strlen() would solve any issue.
>>>
>> Thats absolutely false,  you can't in any way make that assertion.
>> eal_parse_common_option is a public API call.  An application can construct 
>> its
>> own string to pass into the parser.  The test applications all use the 
>> command
>> line functions so its not a visible issue from the test apps, but you can't
>> assume what the test apps do is what everyone will do.  It would be one 
>> thing if
>> you could make the parse_common_option function private, but with the
>> current
>> layout you can't so you have to be ready for garbage input.
>>
>> Neil
> [LCM] It sounds reasonable to me. I'll rollback the code and use 
> strnlen(coremask, ARG_MAX) instead.

I still don't agree that we should use strnlen(coremask, ARG_MAX).

The API of eal_parse_coremask() requires that a valid string is passed
as an argument, so strlen() is perfectly fine. It's up to the caller to
ensure that the string is valid.

Using strnlen(coremask, ARG_MAX) in eal_parse_coremask() with an
arbitrary length does not protect from having a segfault in case the
string is invalid and the caller's buffer length is < ARG_MAX.

This would still be true even if eal_parse_coremask() is public.
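
To illustrate the point about arbitrary bounds, here is a minimal sketch. The
helper name len_in_buffer() is hypothetical and not part of EAL; the only real
call used is POSIX strnlen(). The bound is only protective when it is the size
of the caller's actual buffer, which eal_parse_coremask() has no way of knowing.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: returns the string length, or -1 if no NUL
 * terminator is found within the caller's real buffer size.
 * The bound passed to strnlen() is buf_size, never a global constant.
 */
static long len_in_buffer(const char *buf, size_t buf_size)
{
    size_t n = strnlen(buf, buf_size);

    return (n == buf_size) ? -1 : (long)n;
}
```

With an 8-byte buffer that holds no terminator, strnlen(buf, ARG_MAX) may read
past the end of the buffer, whereas len_in_buffer(buf, 8) stops at the buffer
boundary and reports the error.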

Regards,
Olivier

[dpdk-dev] [PATCH 0/2] Removal of RTE_MBUF_REFCNT

2015-02-16 Thread Stephen Hemminger

On Mon, 16 Feb 2015 16:08:31 +
Sergio Gonzalez Monroy  wrote:

> This patch tries to remove the RTE_MBUF_REFCNT config options and dependencies
> by introducing a new mbuf flag IND_ATTACHED_MBUF that would indicate when the 
> mbuf
> is an indirect attached mbuf, to differentiate between indirect mbufs and 
> mbufs
> with external memory buffers (ie. vhost zero copy).
> 
> Previous discussion:
> http://dpdk.org/ml/archives/dev/2014-October/007127.html
> 
> Currently for mbufs with refcnt, we cannot free mbufs with external memory
> buffers (ie. vhost zero copy), as they are recognized as indirect
> attached mbufs and therefore we free the direct mbuf it points to,
> resulting in an error in the case of external memory buffers.
> 
> We solve the issue by introducing the IND_ATTACHED_MBUF flag, which indicates
> that the mbuf is an indirect attached mbuf pointing to another mbuf.
> When we free an mbuf, we only free the direct mbuf if the flag is set.
> Freeing an mbuf with external buffer is the same as freeing a non attached 
> mbuf.
> The flag is set during attach and clear on detach.
> 
> So in the case of vhost zero copy where we have mbufs with external
> buffers, by default we just free the mbuf and it is up to the user to deal 
> with
> the external buffer.
> 
> Sergio Gonzalez Monroy (2):
>   mbuf: Introduce IND_ATTACHED_MBUF flag
>   Remove RTE_MBUF_REFCNT references
> 
>  app/test/test_link_bonding.c| 15 ---
>  app/test/test_mbuf.c| 17 +++--
>  config/common_bsdapp|  1 -
>  config/common_linuxapp  |  1 -
>  examples/Makefile   |  4 +--
>  examples/ip_fragmentation/Makefile  |  4 ---
>  examples/ip_pipeline/Makefile   |  3 ---
>  examples/ip_pipeline/main.c |  5 
>  examples/ipv4_multicast/Makefile|  4 ---
>  examples/vhost/main.c   | 19 +++---
>  lib/librte_ip_frag/Makefile |  4 ---
>  lib/librte_ip_frag/rte_ip_frag.h|  4 ---
>  lib/librte_mbuf/rte_mbuf.c  |  2 --
>  lib/librte_mbuf/rte_mbuf.h  | 45 
> +++--
>  lib/librte_pmd_bond/Makefile|  4 ---
>  lib/librte_pmd_bond/rte_eth_bond.h  |  2 --
>  lib/librte_pmd_bond/rte_eth_bond_args.c |  2 --
>  lib/librte_pmd_bond/rte_eth_bond_pmd.c  | 10 
>  lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c   |  8 --
>  lib/librte_port/Makefile|  4 ---
>  20 files changed, 19 insertions(+), 139 deletions(-)
> 

What about supporting a clone operation instead of, or in addition
to, attach?  The refcnt is also useful when there are two paths for
a packet (going into multiple rings).

[dpdk-dev] i40e and RSS woes

2015-02-16 Thread Gleb Natapov

I have an application that works reasonably well with the ixgbe driver, but
when I try to use it with i40e I encounter various RSS-related issues.

The first one is that, for some reason, i40e rounds the number of queues
down to a power of two when it builds the default reta table. Why is this? If I
configure the reta on my own using all of the queues, everything seems to be
working. To add insult to injury, I do not get any errors during
configuration; some queues just do not receive any traffic.
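
As a sketch of the difference, the following fills a redirection table in both
ways. RETA_SIZE, fill_reta_all() and fill_reta_pow2() are hypothetical names
for illustration only and are not the i40e driver code; the table size of 512
entries is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define RETA_SIZE 512 /* assumed table size, for illustration only */

/* Spread entries round-robin over all configured queues. */
static void fill_reta_all(uint16_t *reta, uint16_t nb_queues)
{
    for (unsigned int i = 0; i < RETA_SIZE; i++)
        reta[i] = (uint16_t)(i % nb_queues);
}

/* Largest power of two that is <= n (n must be non-zero). */
static uint16_t round_down_pow2(uint16_t n)
{
    uint16_t p = 1;

    while ((unsigned int)p * 2 <= n)
        p = (uint16_t)(p * 2);
    return p;
}

/* Mimic the reported behaviour: only a power-of-two subset gets traffic. */
static void fill_reta_pow2(uint16_t *reta, uint16_t nb_queues)
{
    fill_reta_all(reta, round_down_pow2(nb_queues));
}

/* Highest queue index that appears anywhere in the table. */
static uint16_t max_queue_in_reta(const uint16_t *reta)
{
    uint16_t max = 0;

    for (unsigned int i = 0; i < RETA_SIZE; i++)
        if (reta[i] > max)
            max = reta[i];
    return max;
}
```

With 6 queues, the power-of-two variant only ever references queues 0 to 3, so
queues 4 and 5 stay idle without any error being raised, while the round-robin
variant reaches all six.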

The second problem is that, for some reason, i40e does not use a 40-byte
Toeplitz hash key like any other driver, but expects the key to be 52
bytes. That would have been fine (if we ignore the fact that it
contradicts the MS spec), but how is my high-level code supposed to know that?
And again, device configuration does not fail when the wrong key length is
provided; it just uses some other key. Guys, this kind of error handling
is completely unacceptable.

The last one is more of a question. Why is the interface to change the RSS hash
function (XOR or Toeplitz) part of the filter configuration and not the RSS
config?

--
Gleb.

[dpdk-dev] [PATCH v1] doc: prog guide update for eal multi-pthread

2015-02-16 Thread Cunming Liang

The patch adds the multi-pthread section under the EAL chapter of prog_guide.

Signed-off-by: Cunming Liang 
---
 doc/guides/prog_guide/env_abstraction_layer.rst | 157 
 1 file changed, 157 insertions(+)

diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst 
b/doc/guides/prog_guide/env_abstraction_layer.rst
index 231e266..06bcfae 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -212,4 +212,161 @@ Memory zones can be reserved with specific start address 
alignment by supplying
 The alignment value should be a power of two and not less than the cache line 
size (64 bytes).
 Memory zones can also be reserved from either 2 MB or 1 GB hugepages, provided 
that both are available on the system.

+
+Multiple pthread
+----------------
+
+DPDK usually pins one pthread per core to avoid task-switching overhead. This
+improves performance considerably, but it is not flexible and not always efficient.
+
+Power management helps to improve CPU efficiency by limiting the CPU runtime frequency.
+However, there is also good reason to put otherwise idle cycles to use while the CPU runs at full capability.
+
+With OS scheduling and cgroups, a CPU quota can simply be assigned to each pthread on a given CPU.
+This gives another way to improve CPU efficiency, but the prerequisite is running DPDK execution contexts from multiple pthreads on one core.
+
+For flexibility, it is also useful to allow a pthread's affinity to be set not only to a single CPU but to a CPU set.
+
+
+EAL pthread and lcore Affinity
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The term lcore refers to an EAL execution unit, that is, an EAL pthread.
+EAL pthreads are the pthreads created and managed by EAL; they execute the tasks issued by *remote_launch*.
+Each EAL pthread has a TLS (Thread Local Storage) variable called *_lcore_id* for unique identification.
+As EAL pthreads are usually bound 1:1 to physical CPUs, *_lcore_id* is typically equal to the CPU id.
+
+In the multiple pthread case, an EAL pthread is no longer always bound to one specific physical CPU.
+It may instead have affinity to a cpuset, so *_lcore_id* will not always be the same as the CPU id.
+For this reason there is an EAL long option '--lcores' that assigns the CPU affinity of lcores.
+For a specified lcore id or id group, it allows the cpuset of that EAL pthread to be set.
+
+The format pattern:
+   --lcores='<lcore_set>[@cpu_set][,<lcore_set>[@cpu_set],...]'
+
+'lcore_set' and 'cpu_set' can be a single number, range or a group.
+
+A number is a "digit([0-9]+)"; a range is "<number>-<number>"; a group is "(<number|range>[,<number|range>,...])".
+
+If a '\@cpu_set' value is not supplied, the value of 'cpu_set' defaults to the value of 'lcore_set'.
+
+::
+
+   For example, "--lcores='1,2@(5-7),(3-5)@(0,2),(0,6),7-8'" means start 9 EAL threads:
+   lcore 0 runs on cpuset 0x41 (cpu 0,6);
+   lcore 1 runs on cpuset 0x2 (cpu 1);
+   lcore 2 runs on cpuset 0xe0 (cpu 5,6,7);
+   lcore 3,4,5 runs on cpuset 0x5 (cpu 0,2);
+   lcore 6 runs on cpuset 0x41 (cpu 0,6);
+   lcore 7 runs on cpuset 0x80 (cpu 7);
+   lcore 8 runs on cpuset 0x100 (cpu 8).
+
+Using this option, the associated CPUs can be assigned for each given lcore id.
+It is also compatible with the pattern of the corelist ('-l') option.
+
+non-EAL pthread support
+~~~~~~~~~~~~~~~~~~~~~~~
+
+A DPDK execution context can be used in any user pthread (a.k.a. non-EAL pthread).
+
+In a non-EAL pthread, *_lcore_id* is always LCORE_ID_ANY, which identifies it as not being an EAL thread with a valid, unique *_lcore_id*.
+Libraries therefore cannot take *_lcore_id* as a unique id. Instead, some libraries use an alternative unique id (e.g. tid);
+some are not impacted at all; and some work with limitations (e.g. timer, mempool).
+
+All these impacts are mentioned in the :ref:`known_issue_label` section.
+
+Public Thread API
+~~~~~~~~~~~~~~~~~
+
+Two public APIs, ``rte_thread_set_affinity()`` and ``rte_thread_get_affinity()``, are introduced for threads.
+When they are used in any pthread context, the Thread Local Storage (TLS) will be set/get.
+
+Those TLS variables include *_cpuset* and *_socket_id*:
+
+*  *_cpuset* stores the bitmap of the CPUs to which the pthread has affinity.
+
+*  *_socket_id* stores the NUMA node of the cpuset. If the CPUs in the cpuset belong to different NUMA nodes, *_socket_id* is set to SOCKET_ID_ANY.
+
+
+.. _known_issue_label:
+
+Known Issues
+------------
+
++ rte_mempool
+
+  The rte_mempool uses a per-lcore cache inside the mempool.
+  For non-EAL pthreads, ``rte_lcore_id()`` will not return a valid number.
+  So for now, when rte_mempool is used in a non-EAL pthread, the put/get operations will bypass the mempool cache.
+  Bypassing the mempool cache incurs a performance penalty. Work on non-EAL mempool cache support is in progress.
+
+  However, there is another problem: rte_mempool is not preemptable. This comes from rte_ring.
+
++ rte_ring
+
+  rte_ring supports multi-producer enqueue and multi-consumer dequeue. Bu
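
The cpuset values quoted in the '--lcores' example of the patch above can be
checked with a few lines of C. The helper names are hypothetical; the sketch
only covers CPU ids below 64 and is not how EAL stores cpusets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build a bitmask with one bit set per CPU id (ids must be < 64). */
static uint64_t cpus_to_mask(const unsigned int *cpus, size_t n)
{
    uint64_t mask = 0;

    for (size_t i = 0; i < n; i++)
        mask |= UINT64_C(1) << cpus[i];
    return mask;
}

/* Bitmask for an inclusive range such as "5-7" (first <= last < 64). */
static uint64_t cpu_range_to_mask(unsigned int first, unsigned int last)
{
    uint64_t mask = 0;

    for (unsigned int c = first; c <= last; c++)
        mask |= UINT64_C(1) << c;
    return mask;
}
```

For instance, the group (0,6) sets bits 0 and 6, which gives 0x41, and the
range 5-7 sets bits 5, 6 and 7, which gives 0xe0, matching the listing.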

[dpdk-dev] [PATCH v2 3/4] examples: example showing use of callbacks.

2015-02-16 Thread Olivier MATZ

Hi John,

On 02/13/2015 04:39 PM, John McNamara wrote:
> From: Richardson, Bruce 
> 
> Example showing how callbacks can be used to insert a timestamp
> into each packet on RX. On TX the timestamp is used to calculate
> the packet latency through the app, in cycles.
> 
> Signed-off-by: Bruce Richardson 


I'm looking at the example and I don't understand what is the advantage
of having callbacks in ethdev layer, knowing that the application can
do the same job by a standard function call.

What is the advantage of having callbacks compared to:


for (port = 0; port < nb_ports; port++) {
    struct rte_mbuf *bufs[BURST_SIZE];
    const uint16_t nb_rx = rte_eth_rx_burst(port, 0,
            bufs, BURST_SIZE);
    if (unlikely(nb_rx == 0))
        continue;
    add_timestamp(bufs, nb_rx);

    const uint16_t nb_tx = rte_eth_tx_burst(port ^ 1, 0,
            bufs, nb_rx);
    calc_latency(bufs, nb_tx);

    if (unlikely(nb_tx < nb_rx)) {
        uint16_t buf;
        for (buf = nb_tx; buf < nb_rx; buf++)
            rte_pktmbuf_free(bufs[buf]);
    }
}


To me, doing it as in the code above has several advantages:

- code is more readable: the callback is explicitly invoked, so there is
  no risk of forgetting it
- code is faster: the function calls can be inlined by the compiler
- easier to handle error cases in the callback function as the error
  code is accessible to the application
- there is no need to add code in ethdev api to do this
- if the application does not want to use callbacks (I suppose most
  applications), it won't have any performance impact

Regards,
Olivier

[dpdk-dev] [PATCH] Add support to read target/generic/rte.vars.mk file for external builds

2015-02-16 Thread Olivier MATZ

Hi Keith,

On 02/14/2015 07:52 PM, Keith Wiles wrote:
> The external build of applications does not import the 
> target/generic/rte.vars.mk
> file, which is needed for locating DPDK headers and libraries.
> 
> Signed-off-by: Keith Wiles 
> ---
>  mk/rte.extvars.mk | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/mk/rte.extvars.mk b/mk/rte.extvars.mk
> index 49fc9f2..2811ff9 100644
> --- a/mk/rte.extvars.mk
> +++ b/mk/rte.extvars.mk
> @@ -82,3 +82,10 @@ RTE_MACHINE := $(CONFIG_RTE_MACHINE:"%"=%)
>  RTE_EXEC_ENV := $(CONFIG_RTE_EXEC_ENV:"%"=%)
>  RTE_TOOLCHAIN := $(CONFIG_RTE_TOOLCHAIN:"%"=%)
>  RTE_MK_EXT := $(CONFIG_RTE_MK_EXT:"%"=%)
> +
> +ifneq ($(wildcard $(RTE_SDK)/mk/target/$(RTE_TARGET)/rte.vars.mk),)
> +include $(RTE_SDK)/mk/target/$(RTE_TARGET)/rte.vars.mk
> +else
> +include $(RTE_SDK)/mk/target/generic/rte.vars.mk
> +endif
> +
> 

Same comment as for the other patch.
The external apps should include "rte.vars.mk" as in examples.

[dpdk-dev] [PATCH v2 3/4] examples: example showing use of callbacks.

2015-02-16 Thread Bruce Richardson

On Mon, Feb 16, 2015 at 03:33:40PM +0100, Olivier MATZ wrote:
> Hi John,
> 
> On 02/13/2015 04:39 PM, John McNamara wrote:
> > From: Richardson, Bruce 
> > 
> > Example showing how callbacks can be used to insert a timestamp
> > into each packet on RX. On TX the timestamp is used to calculate
> > the packet latency through the app, in cycles.
> > 
> > Signed-off-by: Bruce Richardson 
> 
> 
> I'm looking at the example and I don't understand what is the advantage
> of having callbacks in ethdev layer, knowing that the application can
> do the same job by a standard function call.
> 
> What is the advantage of having callbacks compared to:
> 
> 
> for (port = 0; port < nb_ports; port++) {
>     struct rte_mbuf *bufs[BURST_SIZE];
>     const uint16_t nb_rx = rte_eth_rx_burst(port, 0,
>             bufs, BURST_SIZE);
>     if (unlikely(nb_rx == 0))
>         continue;
>     add_timestamp(bufs, nb_rx);
> 
>     const uint16_t nb_tx = rte_eth_tx_burst(port ^ 1, 0,
>             bufs, nb_rx);
>     calc_latency(bufs, nb_tx);
> 
>     if (unlikely(nb_tx < nb_rx)) {
>         uint16_t buf;
>         for (buf = nb_tx; buf < nb_rx; buf++)
>             rte_pktmbuf_free(bufs[buf]);
>     }
> }
> 
> 
> To me, doing like the code above has several advantages:
> 
> - code is more readable: the callback is explicitly invoked, so there is
>   no risk to forget it
> - code is faster: the functions calls can be inlined by the compiler
> - easier to handle error cases in the callback function as the error
>   code is accessible to the application
> - there is no need to add code in ethdev api to do this
> - if the application does not want to use callbacks (I suppose most
>   applications), it won't have any performance impact
> 
> Regards,
> Olivier

In this specific instance, given that the application does little else, there
is no real advantage to using the callbacks - it's just to have a simple example
of how they can be used.

Where callbacks are really designed to be useful is for extending or augmenting
hardware capabilities. Taking the example of sequence numbers - to use the most
trivial example - an application could be written to take advantage of sequence
numbers written to packets by the hardware which received them. However, if such
an application was to be used with a NIC which does not provide sequence
numbering capability, for example, anything using the ixgbe driver, the
application writer has two choices - either modify his application code to check
each packet for a sequence number in the data path, and add it there post-rx, or
alternatively, check the NIC capabilities at initialization time, and add a
callback there at initialization, if the hardware does not support it. In the
latter case, the main packet processing body of the application can be written
as though hardware always has sequence numbering capability, safe in the
knowledge that any hardware not supporting it will be back-filled by a software
fallback at initialization-time.
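
The capability check with a software fallback can be sketched as follows. This
is a toy model only: struct toy_port, struct toy_pkt and the toy_* functions
are hypothetical and are not the ethdev callback API from the patch set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_pkt {
    uint32_t seq; /* sequence number, from hardware or from the fallback */
};

struct toy_port;
typedef void (*toy_rx_cb)(struct toy_port *port, struct toy_pkt *pkts,
        size_t n);

struct toy_port {
    int hw_has_seqnum;  /* capability reported by the (imaginary) NIC */
    uint32_t next_seq;  /* software counter used by the fallback */
    toy_rx_cb rx_cb;    /* optional post-RX callback, NULL if unused */
};

/* Software fallback: number the packets after reception. */
static void sw_add_seqnum(struct toy_port *port, struct toy_pkt *pkts,
        size_t n)
{
    for (size_t i = 0; i < n; i++)
        pkts[i].seq = port->next_seq++;
}

/* Initialization: install the fallback only when hardware lacks the feature. */
static void toy_port_init(struct toy_port *port, int hw_has_seqnum)
{
    port->hw_has_seqnum = hw_has_seqnum;
    port->next_seq = 0;
    port->rx_cb = hw_has_seqnum ? NULL : sw_add_seqnum;
}

/* RX path: runs the callback, if any, so callers always see seq filled in. */
static void toy_rx_burst(struct toy_port *port, struct toy_pkt *pkts,
        size_t n)
{
    /* A real driver would fill pkts here, including seq when supported. */
    if (port->rx_cb != NULL)
        port->rx_cb(port, pkts, n);
}
```

The data path only ever calls toy_rx_burst() and reads seq; whether the value
came from hardware or from the fallback is decided once, at initialization.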

By the same token, we could also look to extend hardware capabilities. For
different filtering or hashing capabilities, there can be limits in hardware
which are far less than what we need to use in software. Again, callbacks will
allow the data path to be written in a way that is oblivious to the underlying
hardware limits, because software will transparently fill in the gaps.

Hope this makes the use case clear.

Regards,
/Bruce

[dpdk-dev] [PATCH] Add Q variable to external builds to be quiet

2015-02-16 Thread Olivier MATZ

Hi Keith,

On 02/14/2015 07:13 PM, Keith Wiles wrote:
> Signed-off-by: Keith Wiles 
> ---
>  mk/rte.extvars.mk | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/mk/rte.extvars.mk b/mk/rte.extvars.mk
> index 3e5a990..83a5721 100644
> --- a/mk/rte.extvars.mk
> +++ b/mk/rte.extvars.mk
> @@ -66,6 +66,10 @@ endif
>  RTE_OUTPUT ?= $(RTE_SRCDIR)/build
>  export RTE_OUTPUT
>  
> +# define Q to '@' or not. $(Q) is used to prefix all shell commands to
> +# be executed silently.
> +Q=@
> +
>  # if we are building an external application, include SDK
>  # configuration and include project configuration if any
>  include $(RTE_SDK_BIN)/.config
> 

In the examples/ directory, rte.extvars.mk is never included directly.
They use rte.vars.mk, which already properly defines the $(Q) variable
(its value depends on V= argument).

So I think we don't need this patch.

Regards,
Olivier

[dpdk-dev] [PATCH v4 5/5] doc: Convert image extensions to wildcard

2015-02-16 Thread Iremonger, Bernard

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of John McNamara
> Sent: Tuesday, February 3, 2015 2:11 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v4 5/5] doc: Convert image extensions to wildcard
> 
> Changed all image.svg and image.png extensions to image.* This allows Sphinx 
> to decide the
> appropriate image type from the available image options.
> 
> Signed-off-by: John McNamara 

Acked-by: Bernard Iremonger

[dpdk-dev] [PATCH v4 4/5] doc: Refactored split cell formatting in one table

2015-02-16 Thread Iremonger, Bernard

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of John McNamara
> Sent: Tuesday, February 3, 2015 2:11 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v4 4/5] doc: Refactored split cell formatting in 
> one table
> 
> Refactored split cell in test_pipeline table to allow it to convert cleanly 
> to PDF.
> 
> The Sphinx/Latex converter doesn't handle split cells like the
> following:
> 
>   +-+--+
>   | Header 1| Header 2 |
>   +=+==+
>   | |  |
>   | |  |
>   +-+  |
>   | |  |
>   | |  |
>   +-+--+
> 
> Instead the table was refactored to a simpler format:
> 
>   +-+--+
>   | Header 1| Header 2 |
>   +=+==+
>   | |  |
>   | |  |
>   +-+--+
>   | |  |
>   | |  |
>   +-+--+
> 
> The same information was retained in the table.
> 
> Signed-off-by: John McNamara 

Acked-by: Bernard Iremonger

[dpdk-dev] [PATCH v4 3/5] doc: Fix encoding of (r) character

2015-02-16 Thread Iremonger, Bernard

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of John McNamara
> Sent: Tuesday, February 3, 2015 2:11 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v4 3/5] doc: Fix encoding of (r) character
> 
> Change encoding of (r) from Latin-1 to UTF8 to match the other symbols in the 
> doc and to allow it to
> convert cleanly to PDF.
> 
> Signed-off-by: John McNamara 

Acked-by: Bernard Iremonger

[dpdk-dev] [PATCH v4 2/5] doc: Add Sphinx config to build pdf version of guides

2015-02-16 Thread Iremonger, Bernard

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of John McNamara
> Sent: Tuesday, February 3, 2015 2:11 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v4 2/5] doc: Add Sphinx config to build pdf 
> version of guides
> 
> Add Python Sphinx config to allow conversion of guides to Latex and then PDF 
> format.
> 
> This mainly adds metadata but also includes an override to the Latex 
> formatter to control the font size
> in code blocks.
> 
> Signed-off-by: John McNamara 

Acked-by: Bernard Iremonger

[dpdk-dev] [PATCH v4 1/5] mk: Add 'make doc-pdf' target to convert guide docs to pdf

2015-02-16 Thread Iremonger, Bernard

> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of John McNamara
> > Sent: Tuesday, February 3, 2015 2:11 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH v4 1/5] mk: Add 'make doc-pdf' target to
> > convert guide docs to pdf
> >
> > Added make system support for building PDF versions of the guides.
> > Requires Python Sphinx and TexLive Full.
> >
> > Signed-off-by: John McNamara 

Acked-by: Bernard Iremonger

[dpdk-dev] [PATCH] mk: fix missing link of librte_vhost in shared, non-combined config

2015-02-16 Thread Panu Matilainen

On 02/16/2015 01:17 PM, Thomas Monjalon wrote:
> 2015-02-16 12:01, Panu Matilainen:
>> On 02/13/2015 03:18 PM, Thomas Monjalon wrote:
>>> 2015-02-13 12:33, Panu Matilainen:
>>>> On 02/13/2015 11:28 AM, Thomas Monjalon wrote:
>>>>> 2015-02-13 09:27, Panu Matilainen:
>>>>>> On 02/12/2015 05:44 PM, Thomas Monjalon wrote:
>>>>>>> A library is considered as a plugin if there is no public API and it
>>>>>>> registers itself. That's the case of normal PMD.
>>>>>>> But bonding and Xen have some library parts with public API.
>>>>>>> It has been discussed and agreed for bonding but I'm not aware of the
>>>>>>> Xen case.
>>>>>>
>>>>>> Fair enough, thanks for the explanation.
>>>>>>
>>>>>> Just wondering about versioning of these things - currently all the PMDs
>>>>>> are versioned as well, which is slightly at odds with their expected
>>>>>> usage, dlopen()'ed items usually are not versioned because it makes the
>>>>>> files moving targets. But if a plugin can be an library too then it
>>>>>> clearly needs to be versioned as well.
>>>>>
>>>>> Not sure to understand your considerations.
>>>>> Plugins must be versioned because there can be some incompatibilities
>>>>> like mbuf rework.
>>>>
>>>> Plugins are version-dependent obviously, but the issue is somewhat
>>>> different from library versioning. Plugins are generally consumers of
>>>> the versioned ABIs, whereas libraries are the providers.
>>>>
>>>>>> I'm just thinking of typical packaging where the unversioned *.so
>>>>>> symlinks are in a -devel subpackage and the versioned libraries are in
>>>>>> the main runtime package. Plugins should be loadable by a stable
>>>>>> unversioned name always, for libraries the linker handles it behind the
>>>>>> scenes. So in packaging these things, plugin *.so links need to be
>>>>>> handled differently (placed into the main package) from others. Not
>>>>>> rocket science to filter by 'pmd' in the name, but a new twist anyway
>>>>>> and easy to get wrong.
>>>>>>
>>>>>> One possibility to make it all more obvious might be having a separate
>>>>>> directory for plugins, the mixed case ccould be handled by symlinks.
>>>>>
>>>>> I think I don't understand which use case you are trying to solve.
>>>>
>>>> Its a usability/documentation issue more than a technical one. If plugin
>>>> DSO's are versioned (like they currently are), then loading them via eg
>>>> -d becomes cumbersome since you need to hunt down and provide the
>>>> versioned name, eg "testpmd -d librte_pmd_pcap.so.1 [...]"
>>>>
>>>> Like said above, it can be worked around by leaving the unversioned
>>>> symlinks in place for plugins in runtime (library) packages, but that
>>>> sort of voids the point of versioning. One possibility would be
>>>> introducing a per-version plugin directory that would be used as the
>>>> default path for dlopen() unless an absolute path is used.
>>>
>>> It makes me think that instead of using a -d option per plugin, why not
>>> adding a -D option to load all plugins from a directory?
>>
>> Are you thinking of "-D " or just -D (to use a build-time
>> hardwired directory)?
>
> I'm thinking of "-D ".
> I understand you would like a "hardwired" default directory which would be
> properly packaged by a distribution. Maybe that it could be a build-time
> default to load all the plugins of a directory (without option). Then the
> -d and -D options would overwrite the build-time default behaviour.

Hmm, indeed. What I generally want is software to just DTRT when at all 
possible. For plugins, that typically means "load all installed/enabled 
plugins automatically unless manually overridden".

This becomes even more of an issue if/when the "combine everything" 
libintel_dpdk library in its current form is eliminated (I am fully in 
favor of that) since that has practically hidden the plugins from its 
users like openvswitch.
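
As an illustration of the "-D <directory>" idea discussed above, here is a minimal sketch of scanning a directory and handing every shared-object-looking file to a loader. All names (looks_like_plugin, load_plugins_from_dir, dry_run_loader) are hypothetical and are not part of DPDK; a real loader callback would presumably wrap dlopen(), which is left out here so the sketch needs no extra link flags.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback: returns 0 when the plugin at 'path' was loaded. */
typedef int (*plugin_load_fn)(const char *path);

/* Return 1 if 'name' looks like a DSO: "x.so" or a versioned "x.so.N". */
int
looks_like_plugin(const char *name)
{
	const char *p;

	assert(name != NULL);
	for (p = strstr(name, ".so"); p != NULL; p = strstr(p + 1, ".so")) {
		if (p != name && (p[3] == '\0' || p[3] == '.'))
			return 1;
	}
	return 0;
}

/* Example loader that only reports what it would load. */
int
dry_run_loader(const char *path)
{
	printf("would load %s\n", path);
	return 0;
}

/*
 * Call 'load' for every plugin-like entry in 'dir'.
 * Returns the number of successful loads, or -1 if 'dir' cannot be opened.
 */
int
load_plugins_from_dir(const char *dir, plugin_load_fn load)
{
	char path[4096];
	struct dirent *ent;
	int loaded = 0;
	DIR *d;

	if (dir == NULL || load == NULL)
		return -1;

	d = opendir(dir);
	if (d == NULL)
		return -1;

	while ((ent = readdir(d)) != NULL) {
		int n;

		if (!looks_like_plugin(ent->d_name))
			continue;
		n = snprintf(path, sizeof(path), "%s/%s", dir, ent->d_name);
		if (n < 0 || (size_t)n >= sizeof(path))
			continue; /* path too long, skip this entry */
		if (load(path) == 0)
			loaded++;
	}
	closedir(d);
	return loaded;
}
```

The name filter accepts both unversioned and versioned file names, so a per-version plugin directory would work without unversioned symlinks. It is deliberately loose (it would also accept "foo.so.txt"), so a stricter check may be wanted in practice.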

- Panu -

[dpdk-dev] [PULL REQUEST] fm10k: new polling mode driver for PF/VF.

2015-02-16 Thread Thomas Monjalon

Hi,

2015-02-16 18:18, Chen Jing D:
> These changes add poll mode driver for the host interface of Intel
> Ethernet Switch FM10000 Series of silicons, which integrate NIC and
> switch functionalities. The patch set includes the features below:
> 
> 1. Basic RX/TX functions for PF/VF.
> 2. Interrupt handling mechanism for PF/VF.
> 3. per queue start/stop functions for PF/VF.
> 4. Mailbox handling between PF/VF and PF/Switch Manager.
> 5. Receive Side Scaling (RSS) for PF/VF.
> 6. Scatter receive function for PF/VF.
> 7. reta update/query for PF/VF.
> 8. VLAN filter set for PF.
> 9. Link status query for PF/VF.
> 
> The following changes since commit f2c5125a686ab64034925dabafea0877d1e5857e:
> 
>   app/testpmd: use default Rx/Tx port configuration (2015-02-14 11:35:25 
> +0100)
> 
> are available in the git repository at:
> 
>   jing at dpdk.org:dpdk-fm10k-next.git master
> 
> for you to fetch changes up to 1b073a75d5e809f10c0a71cbc755b02045bf8783:
> 
>   fm10k: Add ABI version of librte_pmd_fm10k (2015-02-16 03:46:00 -0500)

It seems you are requesting to pull the v5, right?
I think there were some comments from David which are not addressed.

Thanks for checking them

[dpdk-dev] [PATCH v8 2/2] librte_pmd_null: Support port hotplug function

2015-02-16 Thread Tetsuya Mukawa

This patch adds port hotplug support to Null PMD.

v7:
 - Add parameter checks.
   (Thanks to Iremonger, Bernard)
v6:
 - Fix a parameter of rte_eth_dev_free().
v4:
 - Fix commit title.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_pmd_null/rte_eth_null.c | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/lib/librte_pmd_null/rte_eth_null.c b/lib/librte_pmd_null/rte_eth_null.c
index 779db63..1bc4ef3 100644
--- a/lib/librte_pmd_null/rte_eth_null.c
+++ b/lib/librte_pmd_null/rte_eth_null.c
@@ -336,6 +336,13 @@ eth_stats_reset(struct rte_eth_dev *dev)
}
 }

+static struct eth_driver rte_null_pmd = {
+   .pci_drv = {
+   .name = "rte_null_pmd",
+   .drv_flags = RTE_PCI_DRV_DETACHABLE,
+   },
+};
+
 static void
 eth_queue_release(void *q)
 {
@@ -429,10 +436,12 @@ eth_dev_null_create(const char *name,
data->nb_tx_queues = (uint16_t)nb_tx_queues;
data->dev_link = pmd_link;
data->mac_addrs = &eth_addr;
+   strncpy(data->name, eth_dev->data->name, strlen(eth_dev->data->name));

eth_dev->data = data;
eth_dev->dev_ops = &ops;
eth_dev->pci_dev = pci_dev;
+   eth_dev->driver = &rte_null_pmd;

/* finally assign rx and tx ops */
if (packet_copy) {
@@ -532,10 +541,36 @@ rte_pmd_null_devinit(const char *name, const char *params)
return eth_dev_null_create(name, numa_node, packet_size, packet_copy);
 }

+static int
+rte_pmd_null_devuninit(const char *name, const char *params __rte_unused)
+{
+   struct rte_eth_dev *eth_dev = NULL;
+
+   if (name == NULL)
+   return -EINVAL;
+
+   RTE_LOG(INFO, PMD, "Closing null ethdev on numa socket %u\n",
+   rte_socket_id());
+
+   /* find the ethdev entry */
+   eth_dev = rte_eth_dev_allocated(name);
+   if (eth_dev == NULL)
+   return -1;
+
+   rte_free(eth_dev->data->dev_private);
+   rte_free(eth_dev->data);
+   rte_free(eth_dev->pci_dev);
+
+   rte_eth_dev_free(eth_dev);
+
+   return 0;
+}
+
 static struct rte_driver pmd_null_drv = {
.name = "eth_null",
.type = PMD_VDEV,
.init = rte_pmd_null_devinit,
+   .uninit = rte_pmd_null_devuninit,
 };

 PMD_REGISTER_DRIVER(pmd_null_drv);
-- 
1.9.1
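
A side note on the name copy in the hunk above: strncpy() bounded by strlen() of the source copies the characters without a terminating NUL and without regard to the destination size, so it relies on the destination being zeroed and large enough. The following is a minimal sketch of a bounded alternative; copy_dev_name() is a hypothetical helper, not something the patch or DPDK provides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch.
 * Copy 'src' into 'dst' only if it fits including the terminating NUL.
 * Returns 0 on success, -1 if an argument is invalid or the name is too long.
 */
int
copy_dev_name(char *dst, size_t dst_size, const char *src)
{
	size_t len;

	if (dst == NULL || src == NULL || dst_size == 0)
		return -1;

	len = strlen(src);
	if (len + 1 > dst_size)
		return -1;

	memcpy(dst, src, len + 1);	/* copies the NUL as well */
	assert(dst[len] == '\0');
	return 0;
}
```

Rejecting an over-long name instead of truncating it keeps two devices from silently ending up with the same shortened name.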

[dpdk-dev] [PATCH v8 1/2] librte_pmd_null: Add Null PMD

2015-02-16 Thread Tetsuya Mukawa

Null PMD is a driver for a virtual device designed for measuring the
performance of DPDK PMDs. When an application calls rx, Null PMD just
allocates mbufs and returns them. On tx, the PMD just frees the mbufs.

The PMD has the following options.
- size: specify the packet size allocated by RX. Default packet size is 64.
- copy: specify 1 or 0 to enable or disable copy while RX and TX.
Default value is 0(disabled).
This option is used for emulating more realistic data transfer.
Copy size is equal to packet size.

To use the PMD, enable CONFIG_RTE_BUILD_SHARED_LIB in the config file. Then
compile the PMD as a shared library. The library can be loaded with the '-d'
option when the application is invoked.

Here is an example.
$ sudo ./testpmd -c f -n 4 -d librte_pmd_null.so \
--vdev 'eth_null0' --vdev 'eth_null1' -- -i --no-flush-rx

If testpmd is compiled with CONFIG_RTE_BUILD_SHARED_LIB, it may need to
specify more libraries using '-d' option.
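
To make the rx/tx behaviour described above concrete, here is a toy model in plain C. It does not use DPDK: struct toy_pkt stands in for an mbuf and calloc()/free() stand in for mempool operations, so all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf; not a DPDK type. */
struct toy_pkt {
	uint16_t len;
	uint8_t data[];
};

/* "Receive": allocate up to nb_bufs zero-filled packets of pkt_size bytes. */
uint16_t
toy_null_rx(struct toy_pkt **bufs, uint16_t nb_bufs, uint16_t pkt_size)
{
	uint16_t i;

	assert(bufs != NULL);
	for (i = 0; i < nb_bufs; i++) {
		bufs[i] = calloc(1, sizeof(struct toy_pkt) + pkt_size);
		if (bufs[i] == NULL)
			break;	/* return what was allocated so far */
		bufs[i]->len = pkt_size;
	}
	return i;
}

/* "Transmit": drop every packet by freeing it. */
uint16_t
toy_null_tx(struct toy_pkt **bufs, uint16_t nb_bufs)
{
	uint16_t i;

	assert(bufs != NULL);
	for (i = 0; i < nb_bufs; i++) {
		free(bufs[i]);
		bufs[i] = NULL;
	}
	return i;
}
```

Because neither direction touches hardware, whatever time is measured around such calls is buffer management and framework overhead, which is the point of a null driver.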

v8:
 - Fix Makefile and add version map file.
   (Thanks to Qiu, Michael and Iremonger, Bernard)
v7:
 - Add parameter checks.
   (Thanks to Iremonger, Bernard)
 - Remove needless "__rte_unused".
v4:
 - Fix memory leak.
   (Thanks to Iremonger, Bernard)

Signed-off-by: Tetsuya Mukawa 
---
 config/common_bsdapp |   5 +
 config/common_linuxapp   |   5 +
 lib/Makefile |   1 +
 lib/librte_pmd_null/Makefile |  62 +++
 lib/librte_pmd_null/rte_eth_null.c   | 541 +++
 lib/librte_pmd_null/rte_pmd_null_version.map |   4 +
 6 files changed, 618 insertions(+)
 create mode 100644 lib/librte_pmd_null/Makefile
 create mode 100644 lib/librte_pmd_null/rte_eth_null.c
 create mode 100644 lib/librte_pmd_null/rte_pmd_null_version.map

diff --git a/config/common_bsdapp b/config/common_bsdapp
index 57bacb8..8b4a684 100644
--- a/config/common_bsdapp
+++ b/config/common_bsdapp
@@ -224,6 +224,11 @@ CONFIG_RTE_LIBRTE_PMD_PCAP=y
 CONFIG_RTE_LIBRTE_PMD_BOND=y

 #
+# Compile null PMD
+#
+CONFIG_RTE_LIBRTE_PMD_NULL=y
+
+#
 # Do prefetch of packet data within PMD driver receive function
 #
 CONFIG_RTE_PMD_PACKET_PREFETCH=y
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 81055f8..4ab31e8 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -237,6 +237,11 @@ CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
 CONFIG_RTE_LIBRTE_PMD_XENVIRT=n

 #
+# Compile null PMD
+#
+CONFIG_RTE_LIBRTE_PMD_NULL=y
+
+#
 # Do prefetch of packet data within PMD driver receive function
 #
 CONFIG_RTE_PMD_PACKET_PREFETCH=y
diff --git a/lib/Makefile b/lib/Makefile
index d617d81..2fc098b 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -53,6 +53,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += librte_pmd_virtio
 DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += librte_pmd_vmxnet3
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += librte_pmd_xenvirt
 DIRS-$(CONFIG_RTE_LIBRTE_VHOST) += librte_vhost
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_NULL) += librte_pmd_null
 DIRS-$(CONFIG_RTE_LIBRTE_HASH) += librte_hash
 DIRS-$(CONFIG_RTE_LIBRTE_LPM) += librte_lpm
 DIRS-$(CONFIG_RTE_LIBRTE_ACL) += librte_acl
diff --git a/lib/librte_pmd_null/Makefile b/lib/librte_pmd_null/Makefile
new file mode 100644
index 0000000..6472015
--- /dev/null
+++ b/lib/librte_pmd_null/Makefile
@@ -0,0 +1,62 @@
+#   BSD LICENSE
+#
+#   Copyright (C) IGEL Co.,Ltd.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of IGEL Co.,Ltd. nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(R

[dpdk-dev] [PATCH v8] testpmd: Add port hotplug support

2015-02-16 Thread Tetsuya Mukawa

The patch introduces following commands.
- port attach [ident]
- port detach [port_id]
 - attach: attaching a port
 - detach: detaching a port
 - ident: pci address of physical device.
  Or device name and parameters of virtual device.
 (ex. 0000:02:00.0, eth_pcap0,iface=eth0)
 - port_id: port identifier

v7:
- Fix doc.
  (Thanks to Iremonger, Bernard)
- Fix port checking implementation of start_port();
  (Thanks to Qiu, Michael)
v5:
- Add testpmd documentation.
  (Thanks to Iremonger, Bernard)
v4:
 - Fix strings of command help.

Signed-off-by: Tetsuya Mukawa 
---
 app/test-pmd/cmdline.c  | 133 +++
 app/test-pmd/config.c   | 116 +---
 app/test-pmd/parameters.c   |  22 ++-
 app/test-pmd/testpmd.c  | 199 +---
 app/test-pmd/testpmd.h  |  18 ++-
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  57 
 6 files changed, 415 insertions(+), 130 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index b2aab40..69cf522 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -573,6 +573,12 @@ static void cmd_help_long_parsed(void *parsed_result,
"port close (port_id|all)\n"
"Close all ports or port_id.\n\n"

+   "port attach (ident)\n"
+   "Attach physical or virtual dev by pci address or virtual device name\n\n"
+
+   "port detach (port_id)\n"
+   "Detach physical or virtual dev by port_id\n\n"
+
"port config (port_id|all)"
" speed (10|100|1000|10000|40000|auto)"
" duplex (half|full|auto)\n"
@@ -864,6 +870,89 @@ cmdline_parse_inst_t cmd_operate_specific_port = {
},
 };

+/* *** attach a specified port *** */
+struct cmd_operate_attach_port_result {
+   cmdline_fixed_string_t port;
+   cmdline_fixed_string_t keyword;
+   cmdline_fixed_string_t identifier;
+};
+
+static void cmd_operate_attach_port_parsed(void *parsed_result,
+   __attribute__((unused)) struct cmdline *cl,
+   __attribute__((unused)) void *data)
+{
+   struct cmd_operate_attach_port_result *res = parsed_result;
+
+   if (!strcmp(res->keyword, "attach"))
+   attach_port(res->identifier);
+   else
+   printf("Unknown parameter\n");
+}
+
+cmdline_parse_token_string_t cmd_operate_attach_port_port =
+   TOKEN_STRING_INITIALIZER(struct cmd_operate_attach_port_result,
+   port, "port");
+cmdline_parse_token_string_t cmd_operate_attach_port_keyword =
+   TOKEN_STRING_INITIALIZER(struct cmd_operate_attach_port_result,
+   keyword, "attach");
+cmdline_parse_token_string_t cmd_operate_attach_port_identifier =
+   TOKEN_STRING_INITIALIZER(struct cmd_operate_attach_port_result,
+   identifier, NULL);
+
+cmdline_parse_inst_t cmd_operate_attach_port = {
+   .f = cmd_operate_attach_port_parsed,
+   .data = NULL,
+   .help_str = "port attach identifier, "
+   "identifier: pci address or virtual dev name",
+   .tokens = {
+   (void *)&cmd_operate_attach_port_port,
+   (void *)&cmd_operate_attach_port_keyword,
+   (void *)&cmd_operate_attach_port_identifier,
+   NULL,
+   },
+};
+
+/* *** detach a specified port *** */
+struct cmd_operate_detach_port_result {
+   cmdline_fixed_string_t port;
+   cmdline_fixed_string_t keyword;
+   uint8_t port_id;
+};
+
+static void cmd_operate_detach_port_parsed(void *parsed_result,
+   __attribute__((unused)) struct cmdline *cl,
+   __attribute__((unused)) void *data)
+{
+   struct cmd_operate_detach_port_result *res = parsed_result;
+
+   if (!strcmp(res->keyword, "detach"))
+   detach_port(res->port_id);
+   else
+   printf("Unknown parameter\n");
+}
+
+cmdline_parse_token_string_t cmd_operate_detach_port_port =
+   TOKEN_STRING_INITIALIZER(struct cmd_operate_detach_port_result,
+   port, "port");
+cmdline_parse_token_string_t cmd_operate_detach_port_keyword =
+   TOKEN_STRING_INITIALIZER(struct cmd_operate_detach_port_result,
+   keyword, "detach");
+cmdline_parse_token_num_t cmd_operate_detach_port_port_id =
+   TOKEN_NUM_INITIALIZER(struct cmd_operate_detach_port_result,
+   port_id, UINT8);
+
+cmdline_parse_inst_t cmd_operate_detach_port = {
+   .f = cmd_operate_detach_port_parsed,
+   .data = NULL,
+   .help_str = "port detach port_id",
+   .tokens = {
+   (void *)&cmd_operate_detach_port_port,
+   (void *)&cmd_operate_detach_port_keyword,
+

[dpdk-dev] [PATCH v8] librte_pmd_pcap: Add port hotplug support

2015-02-16 Thread Tetsuya Mukawa

This patch adds finalization code to free resources allocated by the
PMD.

v6:
 - Fix a parameter of rte_eth_dev_free().
v4:
 - Change function name.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_pmd_pcap/rte_eth_pcap.c | 40 ++
 1 file changed, 40 insertions(+)

diff --git a/lib/librte_pmd_pcap/rte_eth_pcap.c b/lib/librte_pmd_pcap/rte_eth_pcap.c
index af7fae8..5f88efd 100644
--- a/lib/librte_pmd_pcap/rte_eth_pcap.c
+++ b/lib/librte_pmd_pcap/rte_eth_pcap.c
@@ -498,6 +498,13 @@ static struct eth_dev_ops ops = {
.stats_reset = eth_stats_reset,
 };

+static struct eth_driver rte_pcap_pmd = {
+   .pci_drv = {
+   .name = "rte_pcap_pmd",
+   .drv_flags = RTE_PCI_DRV_DETACHABLE,
+   },
+};
+
 /*
  * Function handler that opens the pcap file for reading a stores a
  * reference of it for use it later on.
@@ -713,6 +720,10 @@ rte_pmd_init_internals(const char *name, const unsigned nb_rx_queues,
if (*eth_dev == NULL)
goto error;

+   /* check length of device name */
+   if ((strlen((*eth_dev)->data->name) + 1) > sizeof(data->name))
+   goto error;
+
/* now put it all together
 * - store queue data in internals,
 * - store numa_node info in pci_driver
@@ -739,10 +750,13 @@ rte_pmd_init_internals(const char *name, const unsigned nb_rx_queues,
data->nb_tx_queues = (uint16_t)nb_tx_queues;
data->dev_link = pmd_link;
data->mac_addrs = &eth_addr;
+   strncpy(data->name,
+   (*eth_dev)->data->name, strlen((*eth_dev)->data->name));

(*eth_dev)->data = data;
(*eth_dev)->dev_ops = &ops;
(*eth_dev)->pci_dev = pci_dev;
+   (*eth_dev)->driver = &rte_pcap_pmd;

return 0;

@@ -927,10 +941,36 @@ rte_pmd_pcap_devinit(const char *name, const char *params)

 }

+static int
+rte_pmd_pcap_devuninit(const char *name, const char *params __rte_unused)
+{
+   struct rte_eth_dev *eth_dev = NULL;
+
+   RTE_LOG(INFO, PMD, "Closing pcap ethdev on numa socket %u\n",
+   rte_socket_id());
+
+   if (name == NULL)
+   return -1;
+
+   /* find the ethdev entry */
+   eth_dev = rte_eth_dev_allocated(name);
+   if (eth_dev == NULL)
+   return -1;
+
+   rte_free(eth_dev->data->dev_private);
+   rte_free(eth_dev->data);
+   rte_free(eth_dev->pci_dev);
+
+   rte_eth_dev_free(eth_dev);
+
+   return 0;
+}
+
 static struct rte_driver pmd_pcap_drv = {
.name = "eth_pcap",
.type = PMD_VDEV,
.init = rte_pmd_pcap_devinit,
+   .uninit = rte_pmd_pcap_devuninit,
 };

 PMD_REGISTER_DRIVER(pmd_pcap_drv);
-- 
1.9.1

[dpdk-dev] [PATCH v8 14/14] doc: Add port hotplug framework section to programmers guide

2015-02-16 Thread Tetsuya Mukawa

This patch adds a new section for describing port hotplug framework.

Signed-off-by: Tetsuya Mukawa 
---
 doc/guides/prog_guide/index.rst  |   1 +
 doc/guides/prog_guide/port_hotplug_framework.rst | 110 +++
 2 files changed, 111 insertions(+)
 create mode 100644 doc/guides/prog_guide/port_hotplug_framework.rst

diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 8d86dd4..428b76b 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -70,6 +70,7 @@ Programmer's Guide
 packet_classif_access_ctrl
 packet_framework
 vhost_lib
+port_hotplug_framework
 source_org
 dev_kit_build_system
 dev_kit_root_make_help
diff --git a/doc/guides/prog_guide/port_hotplug_framework.rst b/doc/guides/prog_guide/port_hotplug_framework.rst
new file mode 100644
index 0000000..355ae28
--- /dev/null
+++ b/doc/guides/prog_guide/port_hotplug_framework.rst
@@ -0,0 +1,110 @@
+..  BSD LICENSE
+Copyright(c) 2015 IGEL Co.,Ltd. All rights reserved.
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+
+* Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in
+the documentation and/or other materials provided with the
+distribution.
+* Neither the name of IGEL Co.,Ltd. nor the names of its
+contributors may be used to endorse or promote products derived
+from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+Port Hotplug Framework
+======================
+
+The Port Hotplug Framework provides DPDK applications with the ability to
+attach and detach ports at runtime. Because the framework depends on PMD
+implementation, the ports that PMDs cannot handle are out of scope of this
+framework. Furthermore, after detaching a port from a DPDK application, the
+framework doesn't provide a way for removing the devices from the system.
+For the ports backed by a physical NIC, the kernel will need to support PCI
+Hotplug feature.
+
+Overview
+--------
+
+The basic requirements of the Port Hotplug Framework are:
+
+*   DPDK applications that use the Port Hotplug Framework must manage their
+own ports.
+
+The Port Hotplug Framework is implemented to allow DPDK applications to
+manage ports. For example, when DPDK applications call the port attach
+function, the attached port number is returned. DPDK applications can
+also detach the port by port number.
+
+*   Kernel support is needed for attaching or detaching physical device
+ports.
+
+To attach new physical device ports, the device must first be recognized
+by a userspace driver I/O framework in the kernel. Then DPDK
+applications can call the Port Hotplug functions to attach the ports.
+For detaching, the steps are performed in reverse order.
+
+*   Before detaching, ports must be stopped and closed.
+
+DPDK applications must call "rte_eth_dev_stop()" and
+"rte_eth_dev_close()" APIs before detaching ports. These functions will
+start the finalization sequence of the PMDs.
+
+*   The framework doesn't affect the behavior of legacy DPDK applications.
+
+If the Port Hotplug functions aren't called, all legacy DPDK apps can
+still work without modifications.
+
+Port Hotplug API overview
+-------------------------
+
+*   Attaching a port
+
+"rte_eal_dev_attach()" API attaches a port to DPDK application, and
+returns the attached port number. Before calling the API, the device
+should be recognized by a userspace driver I/O framework. The API
+receives a pci address like "0000:01:00.0" or a virtual device name
+like "eth_pcap0,iface=eth0". In the case of virtual device name, the
+format is the same as the general "--vdev" option of DPDK.
+
+*   Detac

[dpdk-dev] [PATCH v8 13/14] eal: Enable port hotplug framework in Linux

2015-02-16 Thread Tetsuya Mukawa

The patch enables CONFIG_RTE_LIBRTE_EAL_HOTPLUG in Linux configuration.

Signed-off-by: Tetsuya Mukawa 
---
 config/common_linuxapp | 5 +
 1 file changed, 5 insertions(+)

diff --git a/config/common_linuxapp b/config/common_linuxapp
index d428f84..81055f8 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -114,6 +114,11 @@ CONFIG_RTE_PCI_MAX_READ_REQUEST_SIZE=0
 CONFIG_RTE_LIBRTE_EAL_LINUXAPP=y

 #
+# Compile Environment Abstraction Layer to support hotplug
+#
+CONFIG_RTE_LIBRTE_EAL_HOTPLUG=y
+
+#
 # Compile Environment Abstraction Layer to support Vmware TSC map
 #
 CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=y
-- 
1.9.1

[dpdk-dev] [PATCH v8 12/14] eal/pci: Add rte_eal_dev_attach/detach() functions

2015-02-16 Thread Tetsuya Mukawa

These functions are used for attaching or detaching a port.
When rte_eal_dev_attach() is called, the function tries to interpret the
device name as a PCI address. If this succeeds,
rte_eal_dev_attach() attaches a physical device port. If not, it attaches
a virtual device port.
When rte_eal_dev_detach() is called, the function gets the device type
of the port to determine whether the port is physical or virtual,
and then calls the matching detach function.
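
The attach-side decision described above can be sketched as follows: first try to read the identifier as a PCI address, and fall back to treating it as a virtual device name whose driver is found by prefix match. This is a standalone illustration under my own assumptions; struct toy_pci_addr, toy_parse_pci_addr() and toy_find_vdev_driver() are hypothetical names and do not reflect the actual EAL parsing code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a PCI address; not the DPDK struct. */
struct toy_pci_addr {
	uint16_t domain;
	uint8_t bus;
	uint8_t devid;
	uint8_t function;
};

/* Parse "DDDD:BB:DD.F" (hex fields). Returns 0 on success, -1 otherwise. */
int
toy_parse_pci_addr(const char *s, struct toy_pci_addr *out)
{
	unsigned int dom, bus, dev, fn;
	int used = 0;

	if (s == NULL || out == NULL)
		return -1;
	if (sscanf(s, "%4x:%2x:%2x.%1x%n", &dom, &bus, &dev, &fn, &used) != 4)
		return -1;
	/* reject trailing text and out-of-range device/function numbers */
	if (s[used] != '\0' || dev > 0x1f || fn > 7)
		return -1;

	out->domain = (uint16_t)dom;
	out->bus = (uint8_t)bus;
	out->devid = (uint8_t)dev;
	out->function = (uint8_t)fn;
	return 0;
}

/* Return the index of the first driver name that prefixes 'devname', or -1. */
int
toy_find_vdev_driver(const char *devname, const char *const *drivers, size_t n)
{
	size_t i;

	assert(devname != NULL && drivers != NULL);
	for (i = 0; i < n; i++) {
		if (strncmp(drivers[i], devname, strlen(drivers[i])) == 0)
			return (int)i;
	}
	return -1;
}
```

With this split, "0000:02:00.0" takes the physical path, while "eth_pcap0,iface=eth0" fails PCI parsing and is matched against registered virtual driver names such as "eth_pcap".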

v8:
- Add missing symbol in version map.
  (Thanks to Qiu, Michael and Iremonger, Bernard)
v7:
- Fix typo of warning messages.
  (Thanks to Qiu, Michael)
v5:
- Change function names like below.
  rte_eal_dev_find_and_invoke() to rte_eal_vdev_find_and_invoke().
  rte_eal_dev_invoke() to rte_eal_vdev_invoke().
- Add code to handle a return value of rte_eal_devargs_remove().
- Fix pci address format in rte_eal_dev_detach().
v4:
- Fix comment.
- Add error checking.
- Fix indent of 'if' statement.
- Change function name.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/eal_common_dev.c  | 276 
 lib/librte_eal/common/eal_private.h |  11 +
 lib/librte_eal/common/include/rte_dev.h |  33 +++
 lib/librte_eal/linuxapp/eal/Makefile|   1 +
 lib/librte_eal/linuxapp/eal/eal_pci.c   |   6 +-
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |   2 +
 6 files changed, 326 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index eae5656..3d169a4 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -32,10 +32,13 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */

+#include 
+#include 
 #include 
 #include 
 #include 

+#include 
 #include 
 #include 
 #include 
@@ -107,3 +110,276 @@ rte_eal_dev_init(void)
}
return 0;
 }
+
+/* So far, DPDK hotplug function only supports linux */
+#ifdef ENABLE_HOTPLUG
+static void
+rte_eal_vdev_invoke(struct rte_driver *driver,
+   struct rte_devargs *devargs, enum rte_eal_invoke_type type)
+{
+   if ((driver == NULL) || (devargs == NULL))
+   return;
+
+   switch (type) {
+   case RTE_EAL_INVOKE_TYPE_PROBE:
+   driver->init(devargs->virtual.drv_name, devargs->args);
+   break;
+   case RTE_EAL_INVOKE_TYPE_CLOSE:
+   driver->uninit(devargs->virtual.drv_name, devargs->args);
+   break;
+   default:
+   break;
+   }
+}
+
+static int
+rte_eal_vdev_find_and_invoke(const char *name, int type)
+{
+   struct rte_devargs *devargs;
+   struct rte_driver *driver;
+
+   if (name == NULL)
+   return -EINVAL;
+
+   /* call the init function for each virtual device */
+   TAILQ_FOREACH(devargs, &devargs_list, next) {
+
+   if (devargs->type != RTE_DEVTYPE_VIRTUAL)
+   continue;
+
+   if (strncmp(name, devargs->virtual.drv_name, strlen(name)))
+   continue;
+
+   TAILQ_FOREACH(driver, &dev_driver_list, next) {
+   if (driver->type != PMD_VDEV)
+   continue;
+
+   /* search a driver prefix in virtual device name */
+   if (!strncmp(driver->name, devargs->virtual.drv_name,
+   strlen(driver->name))) {
+   rte_eal_vdev_invoke(driver, devargs, type);
+   break;
+   }
+   }
+
+   if (driver == NULL) {
+   RTE_LOG(WARNING, EAL, "no driver found for %s\n",
+ devargs->virtual.drv_name);
+   }
+   return 0;
+   }
+   return 1;
+}
+
+/* attach the new physical device, then store port_id of the device */
+static int
+rte_eal_dev_attach_pdev(struct rte_pci_addr *addr, uint8_t *port_id)
+{
+   uint8_t new_port_id;
+   struct rte_eth_dev devs[RTE_MAX_ETHPORTS];
+
+   if ((addr == NULL) || (port_id == NULL))
+   goto err;
+
+   /* save current port status */
+   if (rte_eth_dev_save(devs, sizeof(devs)))
+   goto err;
+   /* re-construct pci_device_list */
+   if (rte_eal_pci_scan())
+   goto err;
+   /* invoke probe func of the driver can handle the new device */
+   if (rte_eal_pci_probe_one(addr))
+   goto err;
+   /* get port_id enabled by above procedures */
+   if (rte_eth_dev_get_changed_port(devs, &new_port_id))
+   goto err;
+
+   *port_id = new_port_id;
+   return 0;
+err:
+   RTE_LOG(ERR, EAL, "Driver, cannot attach the device\n");
+   return -1;
+}
+
+/* detach the new physical device, then store pci_addr of the device */
+static int
+rte_eal_dev_detach_pdev(uint8_t port_id, st

[dpdk-dev] [PATCH v8 11/14] ethdev: Add one dev_type parameter to rte_eth_dev_allocate

2015-02-16 Thread Tetsuya Mukawa

This new parameter records whether a device is physical or virtual.
The port detach procedure differs between physical and virtual devices,
and this parameter lets the detach function know the device type of the port.
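
As a rough model of the bookkeeping this patch describes, the sketch below records a device type when a port slot is allocated, resets it when the slot is freed, and reports "unknown" for invalid ports. It is a toy with hypothetical names (toy_port_allocate() and friends) and a fixed four-entry table, not the ethdev implementation.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PORTS 4

enum toy_dev_type {
	TOY_DEV_UNKNOWN,	/* slot unused or port invalid */
	TOY_DEV_PHYSICAL,
	TOY_DEV_VIRTUAL
};

struct toy_port {
	int attached;
	enum toy_dev_type type;
};

/* Zero-initialized: every slot starts detached with an unknown type. */
static struct toy_port toy_ports[TOY_MAX_PORTS];

/* Claim the first free slot and remember its type. Returns port id or -1. */
int
toy_port_allocate(enum toy_dev_type type)
{
	int i;

	assert(type != TOY_DEV_UNKNOWN);
	for (i = 0; i < TOY_MAX_PORTS; i++) {
		if (!toy_ports[i].attached) {
			toy_ports[i].attached = 1;
			toy_ports[i].type = type;
			return i;
		}
	}
	return -1;
}

/* Release a slot and forget its type. Returns 0 on success, -1 otherwise. */
int
toy_port_free(int port_id)
{
	if (port_id < 0 || port_id >= TOY_MAX_PORTS ||
			!toy_ports[port_id].attached)
		return -1;
	toy_ports[port_id].attached = 0;
	toy_ports[port_id].type = TOY_DEV_UNKNOWN;
	return 0;
}

/* Report the recorded type; invalid or free ports yield TOY_DEV_UNKNOWN. */
enum toy_dev_type
toy_port_get_type(int port_id)
{
	if (port_id < 0 || port_id >= TOY_MAX_PORTS ||
			!toy_ports[port_id].attached)
		return TOY_DEV_UNKNOWN;
	return toy_ports[port_id].type;
}
```

Returning the "unknown" enumerator for an invalid port keeps the getter's return value inside the enum's range, so callers can switch on it without special-casing a negative value.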

v8:
- NONE_TRACE is replaced by NO_TRACE.
- Add missing symbol in version map.
  (Thanks to Qiu, Michael and Iremonger, Bernard)
v4:
- Fix comments of rte_eth_dev_type.

Signed-off-by: Tetsuya Mukawa 
---
 app/test/virtual_pmd.c   |  2 +-
 lib/librte_ether/rte_ethdev.c| 25 +++--
 lib/librte_ether/rte_ethdev.h| 25 -
 lib/librte_ether/rte_ether_version.map   |  1 +
 lib/librte_pmd_af_packet/rte_eth_af_packet.c |  2 +-
 lib/librte_pmd_bond/rte_eth_bond_api.c   |  2 +-
 lib/librte_pmd_pcap/rte_eth_pcap.c   |  2 +-
 lib/librte_pmd_ring/rte_eth_ring.c   |  2 +-
 lib/librte_pmd_xenvirt/rte_eth_xenvirt.c |  2 +-
 9 files changed, 54 insertions(+), 9 deletions(-)

diff --git a/app/test/virtual_pmd.c b/app/test/virtual_pmd.c
index 9fac95d..8d3a5ff 100644
--- a/app/test/virtual_pmd.c
+++ b/app/test/virtual_pmd.c
@@ -556,7 +556,7 @@ virtual_ethdev_create(const char *name, struct ether_addr *mac_addr,
goto err;

/* reserve an ethdev entry */
-   eth_dev = rte_eth_dev_allocate(name);
+   eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_PHYSICAL);
if (eth_dev == NULL)
goto err;

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 3869a96..70c4589 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -232,7 +232,7 @@ rte_eth_dev_allocate_new_port(void)
 }

 struct rte_eth_dev *
-rte_eth_dev_allocate(const char *name)
+rte_eth_dev_allocate(const char *name, enum rte_eth_dev_type type)
 {
uint8_t port_id;
struct rte_eth_dev *eth_dev;
@@ -256,6 +256,7 @@ rte_eth_dev_allocate(const char *name)
snprintf(eth_dev->data->name, sizeof(eth_dev->data->name), "%s", name);
eth_dev->data->port_id = port_id;
eth_dev->attached = DEV_CONNECTED;
+   eth_dev->dev_type = type;
nb_ports++;
return eth_dev;
 }
@@ -267,6 +268,7 @@ rte_eth_dev_free(struct rte_eth_dev *eth_dev)
return -EINVAL;

eth_dev->attached = 0;
+   eth_dev->dev_type = RTE_ETH_DEV_UNKNOWN;
nb_ports--;
return 0;
 }
@@ -287,7 +289,7 @@ rte_eth_dev_init(struct rte_pci_driver *pci_drv,
snprintf(ethdev_name, RTE_ETH_NAME_MAX_LEN, "%d:%d.%d",
pci_dev->addr.bus, pci_dev->addr.devid, pci_dev->addr.function);

-   eth_dev = rte_eth_dev_allocate(ethdev_name);
+   eth_dev = rte_eth_dev_allocate(ethdev_name, RTE_ETH_DEV_PHYSICAL);
if (eth_dev == NULL)
return -ENOMEM;

@@ -426,6 +428,14 @@ rte_eth_dev_count(void)
return (nb_ports);
 }

+enum rte_eth_dev_type
+rte_eth_dev_get_device_type(uint8_t port_id)
+{
+   if (rte_eth_dev_validate_port(port_id, NO_TRACE) == DEV_INVALID)
+   return RTE_ETH_DEV_UNKNOWN;
+   return rte_eth_devices[port_id].dev_type;
+}
+
 int
 rte_eth_dev_save(struct rte_eth_dev *devs, size_t size)
 {
@@ -519,6 +529,17 @@ rte_eth_dev_check_detachable(uint8_t port_id)
return -EINVAL;
}

+   if (rte_eth_devices[port_id].dev_type == RTE_ETH_DEV_PHYSICAL) {
+   switch (rte_eth_devices[port_id].pci_dev->pt_driver) {
+   case RTE_PT_IGB_UIO:
+   case RTE_PT_UIO_GENERIC:
+   break;
+   case RTE_PT_VFIO:
+   default:
+   return -ENOTSUP;
+   }
+   }
+
drv_flags = rte_eth_devices[port_id].driver->pci_drv.drv_flags;
return !(drv_flags & RTE_PCI_DRV_DETACHABLE);
 }
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 6651890..caf0e8a 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1522,6 +1522,17 @@ struct eth_dev_ops {
 };

 /**
+ * The eth device type
+ */
+enum rte_eth_dev_type {
+   RTE_ETH_DEV_UNKNOWN,/**< unknown device type */
+   RTE_ETH_DEV_PHYSICAL,
+   /**< Physical function and Virtual function devices of NIC */
+   RTE_ETH_DEV_VIRTUAL,/**< non hardware device */
+   RTE_ETH_DEV_MAX /**< max value of this enum */
+};
+
+/**
  * @internal
  * The generic data structure associated with each ethernet device.
  *
@@ -1540,6 +1551,7 @@ struct rte_eth_dev {
struct rte_pci_device *pci_dev; /**< PCI info. supplied by probing */
struct rte_eth_dev_cb_list callbacks; /**< User application callbacks */
uint8_t attached; /**< Flag indicating the port is attached */
+   enum rte_eth_dev_type dev_type; /**< Flag indicating the device type */
 };

 struct rte_eth_dev_sriov {
@@ -1617,6 +1629,15 @@ extern uint8_t rte_eth_dev_count(void);

 /**
  * Function for internal use b

[dpdk-dev] [PATCH v8 10/14] eal/pci: Cleanup pci driver initialization code

2015-02-16 Thread Tetsuya Mukawa

- Add rte_eal_pci_close_one_dirver()
  The function is used for closing the specified driver and device.
- Add pci_invoke_all_drivers()
  The function is based on pci_probe_all_drivers(), but it can close
  drivers as well as probe them.
- Add pci_close_all_drivers()
  The function tries to find a driver for the specified device, and
  then closes the driver.
- Add rte_eal_pci_probe_one() and rte_eal_pci_close_one()
  The functions are used for probing and closing a device.
  Each function first looks up the device that has the specified
  PCI address, then probes or closes it.
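
As a rough illustration of the dispatch described above, here is a minimal standalone sketch. All names (demo_driver, invoke_all_drivers, the INVOKE_* values, the sample callbacks) are hypothetical stand-ins, not the DPDK definitions, and the return convention (0 = handled, 1 = no driver claimed the device, -1 = error) is an assumption based on the comments in the patch.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the DPDK types; not the real definitions. */
enum invoke_type { INVOKE_PROBE = 0, INVOKE_CLOSE, INVOKE_MAX };

struct demo_driver {
	/* Assumed convention: 0 = handled, >0 = not my device, <0 = error. */
	int (*probe)(int dev_id);
	int (*close)(int dev_id);
};

/*
 * Walk the driver table and apply the requested operation.
 * Returns 0 when a driver handled the device, -1 on error,
 * and 1 when no driver claimed the device.
 */
static int
invoke_all_drivers(const struct demo_driver *drv, size_t n, int dev_id,
		   enum invoke_type type)
{
	size_t i;
	int rc = 0;

	if (drv == NULL || type >= INVOKE_MAX)
		return -1;

	for (i = 0; i < n; i++) {
		switch (type) {
		case INVOKE_PROBE:
			rc = drv[i].probe(dev_id);
			break;
		case INVOKE_CLOSE:
			rc = drv[i].close(dev_id);
			break;
		default:
			return -1;
		}
		if (rc < 0)
			return -1;	/* negative value is an error */
		if (rc > 0)
			continue;	/* this driver does not match */
		return 0;
	}
	return 1;
}

/* Sample callbacks: this made-up driver only recognises device 7. */
static int probe_only_7(int dev_id) { return dev_id == 7 ? 0 : 1; }
static int close_only_7(int dev_id) { return dev_id == 7 ? 0 : 1; }

static const struct demo_driver demo_table[] = {
	{ probe_only_7, close_only_7 },
};
```

Keeping one loop for both operations means the error handling and the "no driver found" path only have to be written once.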

v5:
- Remove RTE_EAL_INVOKE_TYPE_UNKNOWN, because it's unused.
v4:
- Fix parameter checking.
- Fix indent of 'if' statement.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/eal_common_pci.c  | 90 +
 lib/librte_eal/common/eal_private.h | 24 +
 lib/librte_eal/common/include/rte_pci.h | 33 
 lib/librte_eal/linuxapp/eal/eal_pci.c   | 69 +
 4 files changed, 206 insertions(+), 10 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_pci.c 
b/lib/librte_eal/common/eal_common_pci.c
index a89f5c3..7c9b8c5 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -99,19 +99,27 @@ static struct rte_devargs *pci_devargs_lookup(struct 
rte_pci_device *dev)
return NULL;
 }

-/*
- * If vendor/device ID match, call the devinit() function of all
- * registered driver for the given device. Return -1 if initialization
- * failed, return 1 if no driver is found for this device.
- */
 static int
-pci_probe_all_drivers(struct rte_pci_device *dev)
+pci_invoke_all_drivers(struct rte_pci_device *dev,
+   enum rte_eal_invoke_type type)
 {
struct rte_pci_driver *dr = NULL;
-   int rc;
+   int rc = 0;
+
+   if ((dev == NULL) || (type >= RTE_EAL_INVOKE_TYPE_MAX))
+   return -1;

TAILQ_FOREACH(dr, &pci_driver_list, next) {
-   rc = rte_eal_pci_probe_one_driver(dr, dev);
+   switch (type) {
+   case RTE_EAL_INVOKE_TYPE_PROBE:
+   rc = rte_eal_pci_probe_one_driver(dr, dev);
+   break;
+   case RTE_EAL_INVOKE_TYPE_CLOSE:
+   rc = rte_eal_pci_close_one_driver(dr, dev);
+   break;
+   default:
+   return -1;
+   }
if (rc < 0)
/* negative value is an error */
return -1;
@@ -123,6 +131,66 @@ pci_probe_all_drivers(struct rte_pci_device *dev)
return 1;
 }

+#ifdef ENABLE_HOTPLUG
+static int
+rte_eal_pci_invoke_one(struct rte_pci_addr *addr,
+   enum rte_eal_invoke_type type)
+{
+   struct rte_pci_device *dev = NULL;
+   int ret = 0;
+
+   if ((addr == NULL) || (type >= RTE_EAL_INVOKE_TYPE_MAX))
+   return -1;
+
+   TAILQ_FOREACH(dev, &pci_device_list, next) {
+   if (eal_compare_pci_addr(&dev->addr, addr))
+   continue;
+
+   ret = pci_invoke_all_drivers(dev, type);
+   if (ret < 0)
+   goto invoke_err_return;
+
+   if (type == RTE_EAL_INVOKE_TYPE_CLOSE)
+   goto remove_dev;
+
+   return 0;
+   }
+
+   return -1;
+
+invoke_err_return:
+   RTE_LOG(WARNING, EAL, "Requested device " PCI_PRI_FMT
+   " cannot be used\n", dev->addr.domain, dev->addr.bus,
+   dev->addr.devid, dev->addr.function);
+   return -1;
+
+remove_dev:
+   TAILQ_REMOVE(&pci_device_list, dev, next);
+   return 0;
+}
+
+
+/*
+ * Find the pci device specified by pci address, then invoke probe function of
+ * the driver of the device.
+ */
+int
+rte_eal_pci_probe_one(struct rte_pci_addr *addr)
+{
+   return rte_eal_pci_invoke_one(addr, RTE_EAL_INVOKE_TYPE_PROBE);
+}
+
+/*
+ * Find the pci device specified by pci address, then invoke close function of
+ * the driver of the device.
+ */
+int
+rte_eal_pci_close_one(struct rte_pci_addr *addr)
+{
+   return rte_eal_pci_invoke_one(addr, RTE_EAL_INVOKE_TYPE_CLOSE);
+}
+#endif /* ENABLE_HOTPLUG */
+
 /*
  * Scan the content of the PCI bus, and call the devinit() function for
  * all registered drivers that have a matching entry in its id_table
@@ -148,10 +216,12 @@ rte_eal_pci_probe(void)

/* probe all or only whitelisted devices */
if (probe_all)
-   ret = pci_probe_all_drivers(dev);
+   ret = pci_invoke_all_drivers(dev,
+   RTE_EAL_INVOKE_TYPE_PROBE);
else if (devargs != NULL &&
devargs->type == RTE_DEVTYPE_WHITELISTED_PCI)
-   ret = pci_probe_all_drivers(dev);
+   ret = pci_invoke_all_drivers(dev,
+

[dpdk-dev] [PATCH v8 09/14] eal/pci: Add a function to remove the entry of devargs list

2015-02-16 Thread Tetsuya Mukawa

The function removes the specified devargs entry from devargs_list.
Also, the patch adds sanity checking to rte_eal_devargs_add().
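
For readers unfamiliar with the list handling involved, the following is a small sketch of find, duplicate-checked add, and remove on a TAILQ from <sys/queue.h>. The names (demo_devargs, demo_devargs_add, and so on) are hypothetical, and unlike the patch this sketch checks for duplicates before allocating and frees the entry on removal; those are choices made for the example only.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical list entry, keyed by a short name. */
struct demo_devargs {
	TAILQ_ENTRY(demo_devargs) next;
	char name[32];
};
TAILQ_HEAD(demo_devargs_list, demo_devargs);

static struct demo_devargs_list demo_list =
	TAILQ_HEAD_INITIALIZER(demo_list);

/* Return the entry with the given name, or NULL if there is none. */
static struct demo_devargs *
demo_devargs_find(const char *name)
{
	struct demo_devargs *d;

	if (name == NULL)
		return NULL;
	TAILQ_FOREACH(d, &demo_list, next) {
		if (strcmp(d->name, name) == 0)
			return d;
	}
	return NULL;
}

/* Add an entry; refuse duplicates before anything is allocated. */
static int
demo_devargs_add(const char *name)
{
	struct demo_devargs *d;

	if (name == NULL || strlen(name) >= sizeof(d->name))
		return -EINVAL;
	if (demo_devargs_find(name) != NULL)
		return -EEXIST;

	d = calloc(1, sizeof(*d));
	if (d == NULL)
		return -ENOMEM;
	strcpy(d->name, name);	/* length was checked above */
	TAILQ_INSERT_TAIL(&demo_list, d, next);
	return 0;
}

/* Unlink and free the entry with the given name. */
static int
demo_devargs_remove(const char *name)
{
	struct demo_devargs *d;

	if (name == NULL)
		return -EINVAL;
	d = demo_devargs_find(name);
	if (d == NULL)
		return -ENODEV;
	TAILQ_REMOVE(&demo_list, d, next);
	free(d);
	return 0;
}
```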

v5:
- Change function definition of rte_eal_devargs_remove().
v4:
- Fix sanity check code.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/eal_common_devargs.c  | 60 +
 lib/librte_eal/common/include/rte_devargs.h | 21 ++
 2 files changed, 81 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_devargs.c 
b/lib/librte_eal/common/eal_common_devargs.c
index 4c7d11a..5b1ac8e 100644
--- a/lib/librte_eal/common/eal_common_devargs.c
+++ b/lib/librte_eal/common/eal_common_devargs.c
@@ -44,6 +44,35 @@
 struct rte_devargs_list devargs_list =
TAILQ_HEAD_INITIALIZER(devargs_list);

+
+/* find an entry specified by pci address or device name */
+static struct rte_devargs *
+rte_eal_devargs_find(enum rte_devtype devtype, void *args)
+{
+   struct rte_devargs *devargs;
+
+   if (args == NULL)
+   return NULL;
+
+   TAILQ_FOREACH(devargs, &devargs_list, next) {
+   switch (devtype) {
+   case RTE_DEVTYPE_WHITELISTED_PCI:
+   case RTE_DEVTYPE_BLACKLISTED_PCI:
+   if (eal_compare_pci_addr(&devargs->pci.addr, args) == 0)
+   goto found;
+   break;
+   case RTE_DEVTYPE_VIRTUAL:
+   if (memcmp(&devargs->virtual.drv_name, args,
+   strlen((char *)args)) == 0)
+   goto found;
+   break;
+   }
+   }
+   return NULL;
+found:
+   return devargs;
+}
+
 /* store a whitelist parameter for later parsing */
 int
 rte_eal_devargs_add(enum rte_devtype devtype, const char *devargs_str)
@@ -87,6 +116,12 @@ rte_eal_devargs_add(enum rte_devtype devtype, const char 
*devargs_str)
free(devargs);
return -1;
}
+   /* make sure there is no same entry */
+   if (rte_eal_devargs_find(devtype, &devargs->pci.addr)) {
+   RTE_LOG(ERR, EAL,
+   "device already registered: <%s>\n", buf);
+   return -1;
+   }
break;
case RTE_DEVTYPE_VIRTUAL:
/* save driver name */
@@ -98,6 +133,12 @@ rte_eal_devargs_add(enum rte_devtype devtype, const char 
*devargs_str)
free(devargs);
return -1;
}
+   /* make sure there is no same entry */
+   if (rte_eal_devargs_find(devtype, &devargs->virtual.drv_name)) {
+   RTE_LOG(ERR, EAL,
+   "device already registered: <%s>\n", buf);
+   return -1;
+   }
break;
}

@@ -105,6 +146,25 @@ rte_eal_devargs_add(enum rte_devtype devtype, const char 
*devargs_str)
return 0;
 }

+/* remove it from the devargs_list */
+int
+rte_eal_devargs_remove(enum rte_devtype devtype, void *args)
+{
+   struct rte_devargs *devargs;
+
+   if (args == NULL)
+   return -EINVAL;
+
+   devargs = rte_eal_devargs_find(devtype, args);
+   if (devargs == NULL) {
+   RTE_LOG(ERR, EAL, "device not found\n");
+   return -ENODEV;
+   }
+
+   TAILQ_REMOVE(&devargs_list, devargs, next);
+   return 0;
+}
+
 /* count the number of devices of a specified type */
 unsigned int
 rte_eal_devargs_type_count(enum rte_devtype devtype)
diff --git a/lib/librte_eal/common/include/rte_devargs.h 
b/lib/librte_eal/common/include/rte_devargs.h
index 9f9c98f..6d9763b 100644
--- a/lib/librte_eal/common/include/rte_devargs.h
+++ b/lib/librte_eal/common/include/rte_devargs.h
@@ -123,6 +123,27 @@ extern struct rte_devargs_list devargs_list;
 int rte_eal_devargs_add(enum rte_devtype devtype, const char *devargs_str);

 /**
+ * Remove a device from the user device list
+ *
+ * For PCI devices, the format of arguments string is "PCI_ADDR". It shouldn't
+ * involve parameters for the device. Example: "08:00.1".
+ *
+ * For virtual devices, the format of arguments string is "DRIVER_NAME*". It
+ * shouldn't involve parameters for the device. Example: "eth_ring". The
+ * validity of the driver name is not checked by this function, it is done
+ * when closing the drivers.
+ *
+ * @param devtype
+ *   The type of the device.
+ * @param args
+ *   The PCI address or driver name identifying the device.
+ *
+ * @return
+ *   - 0 on success, negative on error
+ */
+int rte_eal_devargs_remove(enum rte_devtype devtype, void *args);
+
+/**
  * Count the number of user devices of a specified type
  *
  * @param devtype
-- 
1.9.1

[dpdk-dev] [PATCH v8 08/14] eal/linux/pci: Add functions for unmapping igb_uio resources

2015-02-16 Thread Tetsuya Mukawa

The patch adds functions for unmapping igb_uio resources. The patch applies
only to Linux with igb_uio. VFIO and BSD are not supported.
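
The unmapping itself comes down to munmap(). A minimal sketch, assuming an anonymous mapping instead of a real UIO resource file and using hypothetical helper names (demo_map, demo_unmap):

```c
#define _DEFAULT_SOURCE	/* for MAP_ANONYMOUS under strict C modes */
#include <stddef.h>
#include <sys/mman.h>

/* Map 'size' bytes of anonymous memory; returns NULL on failure. */
static void *
demo_map(size_t size)
{
	void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/* Unmap a region obtained from demo_map(); returns 0 on success. */
static int
demo_unmap(void *addr, size_t size)
{
	if (addr == NULL)
		return -1;
	return munmap(addr, size) == 0 ? 0 : -1;
}
```

The same address and size that were used for the mapping have to be kept around until detach time, which is why the unmap functions in the patch take both.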

v8:
- Fix typo.
  (Thanks to Iremonger, Bernard)
v5:
- Fix pci_unmap_device() to check pt_driver.
v4:
- Add parameter checking.
- Add header file to determine if hotplug can be enabled.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/Makefile  |  1 +
 lib/librte_eal/common/include/rte_dev_hotplug.h | 44 +
 lib/librte_eal/linuxapp/eal/eal_pci.c   | 44 +
 lib/librte_eal/linuxapp/eal/eal_pci_init.h  |  8 +++
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c   | 65 +
 5 files changed, 162 insertions(+)
 create mode 100644 lib/librte_eal/common/include/rte_dev_hotplug.h

diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
index 52c1a5f..db7cc93 100644
--- a/lib/librte_eal/common/Makefile
+++ b/lib/librte_eal/common/Makefile
@@ -41,6 +41,7 @@ INC += rte_eal_memconfig.h rte_malloc_heap.h
 INC += rte_hexdump.h rte_devargs.h rte_dev.h
 INC += rte_common_vect.h
 INC += rte_pci_dev_feature_defs.h rte_pci_dev_features.h
+INC += rte_dev_hotplug.h

 ifeq ($(CONFIG_RTE_INSECURE_FUNCTION_WARNING),y)
 INC += rte_warnings.h
diff --git a/lib/librte_eal/common/include/rte_dev_hotplug.h 
b/lib/librte_eal/common/include/rte_dev_hotplug.h
new file mode 100644
index 000..b333e0f
--- /dev/null
+++ b/lib/librte_eal/common/include/rte_dev_hotplug.h
@@ -0,0 +1,44 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 IGEL Co.,LTd.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of IGEL Co.,Ltd. nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DEV_HOTPLUG_H_
+#define _RTE_DEV_HOTPLUG_H_
+
+/*
+ * determine if hotplug can be enabled on the system
+ */
+#if defined(RTE_LIBRTE_EAL_HOTPLUG) && defined(RTE_LIBRTE_EAL_LINUXAPP)
+#define ENABLE_HOTPLUG
+#endif /* RTE_LIBRTE_EAL_HOTPLUG & RTE_LIBRTE_EAL_LINUXAPP */
+
+#endif /* _RTE_DEV_HOTPLUG_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 2d5f6a6..72a1362 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -167,6 +167,25 @@ pci_map_resource(void *requested_addr, int fd, off_t 
offset, size_t size)
return mapaddr;
 }

+#ifdef ENABLE_HOTPLUG
+/* unmap a particular resource */
+void
+pci_unmap_resource(void *requested_addr, size_t size)
+{
+   if (requested_addr == NULL)
+   return;
+
+   /* Unmap the PCI memory resource of device */
+   if (munmap(requested_addr, size)) {
+   RTE_LOG(ERR, EAL, "%s(): cannot munmap(%p, 0x%lx): %s\n",
+   __func__, requested_addr, (unsigned long)size,
+   strerror(errno));
+   } else
+   RTE_LOG(DEBUG, EAL, "  PCI memory unmapped at %p\n",
+   requested_addr);
+}
+#endif /* ENABLE_HOTPLUG */
+
 /* parse the "resource" sysfs file */
 #define IORESOURCE_MEM  0x0200

@@ -579,6 +598,31 @@ pci_map_device(struct rte_pci_device *dev)
return ret;
 }

+#ifdef ENABLE_HOTPLUG
+static void
+pci_unmap_device(struct rte_pci_device *dev)
+{
+   if (dev == NULL)
+   return;
+
+   /* try unmapping the NIC resources using VFIO if it exists */
+   switch (dev->pt_driver) {
+   case RTE_PT_VFIO:
+   RTE_LOG(ERR, EAL, "Hotplug doesn't support vfio

[dpdk-dev] [PATCH v8 07/14] ethdev: Add functions that will be used by port hotplug functions

2015-02-16 Thread Tetsuya Mukawa

The patch adds the following functions.

- rte_eth_dev_save()
  The function is used for saving current rte_eth_dev structures.
- rte_eth_dev_get_changed_port()
  The function receives the saved rte_eth_dev structures, then compares
  them with the current values to find which port was actually
  attached or detached.
- rte_eth_dev_get_addr_by_port()
  The function returns a pci address of an ethdev specified by port
  identifier.
- rte_eth_dev_get_port_by_addr()
  The function returns a port identifier of an ethdev specified by
  pci address.
- rte_eth_dev_get_name_by_port()
  The function returns a unique identifier name of an ethdev
  specified by port identifier.
- rte_eth_dev_check_detachable()
  The function returns whether a PMD supports detaching a port.

Also, the patch changes the scope of rte_eth_dev_allocated() to global,
because virtual PMDs will call this function to support port hotplug.
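
The save-then-compare idea can be sketched in isolation as below. The port table, its size, and every snap_* name are hypothetical; only the approach (copy the table, perform the hotplug operation, then look for the first entry whose 'attached' flag differs) follows the description above.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SNAP_MAX_PORTS 4	/* hypothetical table size */

struct snap_port {
	uint8_t attached;
};

static struct snap_port snap_ports[SNAP_MAX_PORTS];

/* Copy the current port table into a caller-provided buffer. */
static int
snap_save(struct snap_port *snap, size_t size)
{
	if (snap == NULL || size != sizeof(snap_ports))
		return -EINVAL;
	memcpy(snap, snap_ports, size);
	return 0;
}

/* Find the first port whose attached state differs from the snapshot. */
static int
snap_get_changed_port(const struct snap_port *snap, uint8_t *port_id)
{
	uint8_t i;

	if (snap == NULL || port_id == NULL)
		return -EINVAL;
	for (i = 0; i < SNAP_MAX_PORTS; i++) {
		if (snap_ports[i].attached != snap[i].attached) {
			*port_id = i;
			return 0;
		}
	}
	return -ENODEV;
}

/*
 * Usage example: snapshot, toggle one port, report which port changed.
 * Returns the changed port number, or a negative errno value.
 */
static int
snap_toggle_and_detect(uint8_t port)
{
	struct snap_port snap[SNAP_MAX_PORTS];
	uint8_t changed;
	int ret;

	if (port >= SNAP_MAX_PORTS)
		return -EINVAL;
	ret = snap_save(snap, sizeof(snap));
	if (ret < 0)
		return ret;
	snap_ports[port].attached ^= 1;	/* stands in for attach/detach */
	ret = snap_get_changed_port(snap, &changed);
	return ret < 0 ? ret : changed;
}
```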

v8:
- Add size parameter to rte_eth_dev_save().
- Add missing symbol in version map.
  (Thanks to Qiu, Michael and Iremonger, Bernard)
v7:
- Add pt_driver checking to rte_eth_dev_check_detachable().
  (Thanks to Qiu, Michael)
v5:
- Fix return value of below functions.
  rte_eth_dev_get_changed_port().
  rte_eth_dev_get_port_by_addr().
v4:
- Add parameter checking.
v3:
- Fix if-condition bug while comparing pci addresses.
- Add error checking codes.
Reported-by: Mark Enright 

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_ether/rte_ethdev.c  | 99 +-
 lib/librte_ether/rte_ethdev.h  | 83 
 lib/librte_ether/rte_ether_version.map |  6 +++
 3 files changed, 187 insertions(+), 1 deletion(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 58d8072..3869a96 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -206,7 +206,7 @@ rte_eth_dev_data_alloc(void)
RTE_MAX_ETHPORTS * sizeof(*rte_eth_dev_data));
 }

-static struct rte_eth_dev *
+struct rte_eth_dev *
 rte_eth_dev_allocated(const char *name)
 {
unsigned i;
@@ -426,6 +426,103 @@ rte_eth_dev_count(void)
return (nb_ports);
 }

+int
+rte_eth_dev_save(struct rte_eth_dev *devs, size_t size)
+{
+   if ((devs == NULL) ||
+   (size != sizeof(struct rte_eth_dev) * RTE_MAX_ETHPORTS))
+   return -EINVAL;
+
+   /* save current rte_eth_devices */
+   memcpy(devs, rte_eth_devices, size);
+   return 0;
+}
+
+int
+rte_eth_dev_get_changed_port(struct rte_eth_dev *devs, uint8_t *port_id)
+{
+   if ((devs == NULL) || (port_id == NULL))
+   return -EINVAL;
+
+   /* check which port was attached or detached */
+   for (*port_id = 0; *port_id < RTE_MAX_ETHPORTS; (*port_id)++, devs++) {
+   if (rte_eth_devices[*port_id].attached ^ devs->attached)
+   return 0;
+   }
+   return -ENODEV;
+}
+
+int
+rte_eth_dev_get_addr_by_port(uint8_t port_id, struct rte_pci_addr *addr)
+{
+   if (rte_eth_dev_validate_port(port_id, TRACE) == DEV_INVALID)
+   return -EINVAL;
+
+   if (addr == NULL) {
+   PMD_DEBUG_TRACE("Null pointer is specified\n");
+   return -EINVAL;
+   }
+
+   *addr = rte_eth_devices[port_id].pci_dev->addr;
+   return 0;
+}
+
+int
+rte_eth_dev_get_port_by_addr(struct rte_pci_addr *addr, uint8_t *port_id)
+{
+   struct rte_pci_addr *tmp;
+
+   if ((addr == NULL) || (port_id == NULL)) {
+   PMD_DEBUG_TRACE("Null pointer is specified\n");
+   return -EINVAL;
+   }
+
+   for (*port_id = 0; *port_id < RTE_MAX_ETHPORTS; (*port_id)++) {
+   if (!rte_eth_devices[*port_id].attached)
+   continue;
+   if (!rte_eth_devices[*port_id].pci_dev)
+   continue;
+   tmp = &rte_eth_devices[*port_id].pci_dev->addr;
+   if (eal_compare_pci_addr(tmp, addr) == 0)
+   return 0;
+   }
+   return -ENODEV;
+}
+
+int
+rte_eth_dev_get_name_by_port(uint8_t port_id, char *name)
+{
+   char *tmp;
+
+   if (rte_eth_dev_validate_port(port_id, TRACE) == DEV_INVALID)
+   return -EINVAL;
+
+   if (name == NULL) {
+   PMD_DEBUG_TRACE("Null pointer is specified\n");
+   return -EINVAL;
+   }
+
+   /* shouldn't check 'rte_eth_devices[i].data',
+* because it might be overwritten by VDEV PMD */
+   tmp = rte_eth_dev_data[port_id].name;
+   strncpy(name, tmp, strlen(tmp) + 1);
+   return 0;
+}
+
+int
+rte_eth_dev_check_detachable(uint8_t port_id)
+{
+   uint32_t drv_flags;
+
+   if (port_id >= RTE_MAX_ETHPORTS) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return -EINVAL;
+   }
+
+   drv_flags = rte_eth_devices[port_id].driver->pci_drv.drv_flags;
+   return !(drv_flags & RTE_PCI

[dpdk-dev] [PATCH v8 06/14] eal, ethdev: Add a function and function pointers to close ether device

2015-02-16 Thread Tetsuya Mukawa

The patch adds a function pointer to each of the rte_pci_driver and
eth_driver structures. These function pointers are used when ports are
detached. Also, the patch adds rte_eth_dev_uninit(). Nothing calls it
yet, but it will be called once the port hotplug function is
implemented.
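
A minimal sketch of the init/uninit function pointer pairing, with hypothetical types (cb_drv, cb_dev) standing in for the real driver and device structures. Returning -ENOTSUP when a driver has no uninit callback is an assumption of this sketch, not something taken from the patch.

```c
#include <errno.h>
#include <stddef.h>

struct cb_dev {
	int initialized;
};

struct cb_drv;
typedef int (cb_devinit_t)(struct cb_drv *, struct cb_dev *);
typedef int (cb_devuninit_t)(struct cb_drv *, struct cb_dev *);

struct cb_drv {
	cb_devinit_t *devinit;		/* device init function */
	cb_devuninit_t *devuninit;	/* device uninit function, may be NULL */
};

static int
cb_init(struct cb_drv *drv, struct cb_dev *dev)
{
	(void)drv;
	dev->initialized = 1;
	return 0;
}

static int
cb_uninit(struct cb_drv *drv, struct cb_dev *dev)
{
	(void)drv;
	dev->initialized = 0;
	return 0;
}

/* Detach path: call the driver's uninit callback if it provides one. */
static int
cb_close(struct cb_drv *drv, struct cb_dev *dev)
{
	if (drv == NULL || dev == NULL)
		return -EINVAL;
	if (drv->devuninit == NULL)
		return -ENOTSUP;
	return drv->devuninit(drv, dev);
}

static struct cb_drv cb_hotplug_drv = { cb_init, cb_uninit };
static struct cb_drv cb_static_drv = { cb_init, NULL };
static struct cb_dev cb_device;
```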

v6:
- Fix rte_eth_dev_uninit() to handle a return value of uninit
  function of PMD.
v4:
- Add parameter checking.
- Change function names.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/include/rte_pci.h |  7 +
 lib/librte_ether/rte_ethdev.c   | 47 +
 lib/librte_ether/rte_ethdev.h   | 24 +
 3 files changed, 78 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_pci.h 
b/lib/librte_eal/common/include/rte_pci.h
index 4814cd7..87ca4cf 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -189,12 +189,19 @@ struct rte_pci_driver;
 typedef int (pci_devinit_t)(struct rte_pci_driver *, struct rte_pci_device *);

 /**
+ * Uninitialisation function for the driver called during hotplugging.
+ */
+typedef int (pci_devuninit_t)(
+   struct rte_pci_driver *, struct rte_pci_device *);
+
+/**
  * A structure describing a PCI driver.
  */
 struct rte_pci_driver {
TAILQ_ENTRY(rte_pci_driver) next;   /**< Next in list. */
const char *name;   /**< Driver name. */
pci_devinit_t *devinit; /**< Device init. function. */
+   pci_devuninit_t *devuninit; /**< Device uninit function. */
struct rte_pci_id *id_table;/**< ID table, NULL terminated. 
*/
uint32_t drv_flags; /**< Flags contolling handling 
of device. */
 };
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 2463d18..58d8072 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -326,6 +326,52 @@ rte_eth_dev_init(struct rte_pci_driver *pci_drv,
return diag;
 }

+static int
+rte_eth_dev_uninit(struct rte_pci_driver *pci_drv,
+struct rte_pci_device *pci_dev)
+{
+   struct eth_driver *eth_drv;
+   struct rte_eth_dev *eth_dev;
+   char ethdev_name[RTE_ETH_NAME_MAX_LEN];
+   int ret;
+
+   if ((pci_drv == NULL) || (pci_dev == NULL))
+   return -EINVAL;
+
+   /* Create unique Ethernet device name using PCI address */
+   snprintf(ethdev_name, RTE_ETH_NAME_MAX_LEN, "%d:%d.%d",
+   pci_dev->addr.bus, pci_dev->addr.devid,
+   pci_dev->addr.function);
+
+   eth_dev = rte_eth_dev_allocated(ethdev_name);
+   if (eth_dev == NULL)
+   return -ENODEV;
+
+   eth_drv = (struct eth_driver *)pci_drv;
+
+   /* Invoke PMD device uninit function */
+   if (*eth_drv->eth_dev_uninit) {
+   ret = (*eth_drv->eth_dev_uninit)(eth_drv, eth_dev);
+   if (ret)
+   return ret;
+   }
+
+   /* free ether device */
+   rte_eth_dev_free(eth_dev);
+
+   /* init user callbacks */
+   TAILQ_INIT(&(eth_dev->callbacks));
+
+   if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+   rte_free(eth_dev->data->dev_private);
+
+   eth_dev->pci_dev = NULL;
+   eth_dev->driver = NULL;
+   eth_dev->data = NULL;
+
+   return 0;
+}
+
 /**
  * Register an Ethernet [Poll Mode] driver.
  *
@@ -344,6 +390,7 @@ void
 rte_eth_driver_register(struct eth_driver *eth_drv)
 {
eth_drv->pci_drv.devinit = rte_eth_dev_init;
+   eth_drv->pci_drv.devuninit = rte_eth_dev_uninit;
rte_eal_pci_register(&eth_drv->pci_drv);
 }

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index fbe7ac1..91d9e86 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1678,6 +1678,27 @@ typedef int (*eth_dev_init_t)(struct eth_driver  
*eth_drv,

 /**
  * @internal
+ * Finalization function of an Ethernet driver invoked for each matching
+ * Ethernet PCI device detected during the PCI closing phase.
+ *
+ * @param eth_drv
+ *   The pointer to the [matching] Ethernet driver structure supplied by
+ *   the PMD when it registered itself.
+ * @param eth_dev
+ *   The *eth_dev* pointer is the address of the *rte_eth_dev* structure
+ *   associated with the matching device and which have been [automatically]
+ *   allocated in the *rte_eth_devices* array.
+ * @return
+ *   - 0: Success, the device is properly finalized by the driver.
+ *In particular, the driver MUST free the *dev_ops* pointer
+ *of the *eth_dev* structure.
+ *   - <0: Error code of the device finalization failure.
+ */
+typedef int (*eth_dev_uninit_t)(struct eth_driver  *eth_drv,
+ struct rte_eth_dev *eth_dev);
+
+/**
+ * @internal
  * The structure associated with a PMD Ethernet driver.
  *
  * Each Ethernet driver acts as a PCI driver and is represen

[dpdk-dev] [PATCH v8 05/14] ethdev: Add rte_eth_dev_free to free specified device

2015-02-16 Thread Tetsuya Mukawa

This patch adds rte_eth_dev_free(). The function marks the specified
device as detached and decrements the port count.

v6:
- Use rte_eth_dev structure as the parameter of rte_eth_dev_free().
v4:
- Add parameter checking.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_ether/rte_ethdev.c | 11 +++
 lib/librte_ether/rte_ethdev.h | 14 ++
 2 files changed, 25 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index a79fa5b..2463d18 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -260,6 +260,17 @@ rte_eth_dev_allocate(const char *name)
return eth_dev;
 }

+int
+rte_eth_dev_free(struct rte_eth_dev *eth_dev)
+{
+   if (eth_dev == NULL)
+   return -EINVAL;
+
+   eth_dev->attached = 0;
+   nb_ports--;
+   return 0;
+}
+
 static int
 rte_eth_dev_init(struct rte_pci_driver *pci_drv,
 struct rte_pci_device *pci_dev)
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index ca101f5..fbe7ac1 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1627,6 +1627,20 @@ extern uint8_t rte_eth_dev_count(void);
  */
 struct rte_eth_dev *rte_eth_dev_allocate(const char *name);

+/**
+ * Function for internal use by dummy drivers primarily, e.g. ring-based
+ * driver.
+ * Free the specified ethdev.
+ *
+ * @param eth_dev
+ * The *eth_dev* pointer is the address of the *rte_eth_dev* structure
+ *   associated with the matching device and which have been [automatically]
+ *   allocated in the *rte_eth_devices* array.
+ * @return
+ *   - 0 on success, negative on error
+ */
+int rte_eth_dev_free(struct rte_eth_dev *eth_dev);
+
 struct eth_driver;
 /**
  * @internal
-- 
1.9.1

[dpdk-dev] [PATCH v8 04/14] eal/pci: Consolidate pci address comparison APIs

2015-02-16 Thread Tetsuya Mukawa

This patch replaces pci_addr_comparison() and memcmp() calls on PCI addresses
with eal_compare_pci_addr().
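
To show what a three-way comparison buys over the old greater-than-only helper, here is a standalone sketch. The struct layout and the cmp_* names are hypothetical, and the explicit uint64_t casts before shifting are a choice made for this example so that no field can overflow an int during the shift.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address layout for the example. */
struct cmp_pci_addr {
	uint16_t domain;
	uint8_t bus;
	uint8_t devid;
	uint8_t function;
};

/* Pack the fields into one integer so that ordering is a plain compare. */
static uint64_t
cmp_pci_key(const struct cmp_pci_addr *a)
{
	return ((uint64_t)a->domain << 24) | ((uint64_t)a->bus << 16) |
	       ((uint64_t)a->devid << 8) | a->function;
}

/*
 * Returns 0 when equal, 1 when a > b, -1 when a < b.
 * A NULL argument also yields -1, so callers that need to tell
 * "less than" from "invalid" must check their pointers first.
 */
static int
cmp_pci_addr(const struct cmp_pci_addr *a, const struct cmp_pci_addr *b)
{
	uint64_t ka, kb;

	if (a == NULL || b == NULL)
		return -1;

	ka = cmp_pci_key(a);
	kb = cmp_pci_key(b);
	if (ka > kb)
		return 1;
	if (ka < kb)
		return -1;
	return 0;
}

/* Sample addresses 0000:08:00.0 and 0000:08:00.1. */
static const struct cmp_pci_addr cmp_a = { 0, 8, 0, 0 };
static const struct cmp_pci_addr cmp_b = { 0, 8, 0, 1 };
```

With a result that distinguishes "equal" from "less than", a sorted insert can detect an already registered device instead of inserting a duplicate.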

v8:
- Fix pci_scan_one() to update sysfs values.
  (Thanks to Qiu, Michael and Iremonger, Bernard)
v5:
- Fix pci_scan_one to handle pt_driver correctly.
v4:
- Fix calculation method of eal_compare_pci_addr().
- Add parameter checking.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/bsdapp/eal/eal_pci.c   | 29 --
 lib/librte_eal/common/eal_common_pci.c|  2 +-
 lib/librte_eal/common/include/rte_pci.h   | 34 +++
 lib/librte_eal/linuxapp/eal/eal_pci.c | 30 +--
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c |  2 +-
 5 files changed, 63 insertions(+), 34 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 74ecce7..7dbdccd 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -270,20 +270,6 @@ pci_uio_map_resource(struct rte_pci_device *dev)
return (0);
 }

-/* Compare two PCI device addresses. */
-static int
-pci_addr_comparison(struct rte_pci_addr *addr, struct rte_pci_addr *addr2)
-{
-   uint64_t dev_addr = (addr->domain << 24) + (addr->bus << 16) + 
(addr->devid << 8) + addr->function;
-   uint64_t dev_addr2 = (addr2->domain << 24) + (addr2->bus << 16) + 
(addr2->devid << 8) + addr2->function;
-
-   if (dev_addr > dev_addr2)
-   return 1;
-   else
-   return 0;
-}
-
-
 /* Scan one pci sysfs entry, and fill the devices list from it. */
 static int
 pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
@@ -356,13 +342,24 @@ pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
}
else {
struct rte_pci_device *dev2 = NULL;
+   int ret;

TAILQ_FOREACH(dev2, &pci_device_list, next) {
-   if (pci_addr_comparison(&dev->addr, &dev2->addr))
+   ret = eal_compare_pci_addr(&dev->addr, &dev2->addr);
+   if (ret > 0)
continue;
-   else {
+   else if (ret < 0) {
TAILQ_INSERT_BEFORE(dev2, dev, next);
return 0;
+   } else { /* already registered */
+   /* update pt_driver */
+   dev2->pt_driver = dev->pt_driver;
+   dev2->max_vfs = dev->max_vfs;
+   memmove(dev2->mem_resource,
+   dev->mem_resource,
+   sizeof(dev->mem_resource));
+   free(dev);
+   return 0;
}
}
TAILQ_INSERT_TAIL(&pci_device_list, dev, next);
diff --git a/lib/librte_eal/common/eal_common_pci.c 
b/lib/librte_eal/common/eal_common_pci.c
index f3c7f71..a89f5c3 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -93,7 +93,7 @@ static struct rte_devargs *pci_devargs_lookup(struct 
rte_pci_device *dev)
if (devargs->type != RTE_DEVTYPE_BLACKLISTED_PCI &&
devargs->type != RTE_DEVTYPE_WHITELISTED_PCI)
continue;
-   if (!memcmp(&dev->addr, &devargs->pci.addr, sizeof(dev->addr)))
+   if (!eal_compare_pci_addr(&dev->addr, &devargs->pci.addr))
return devargs;
}
return NULL;
diff --git a/lib/librte_eal/common/include/rte_pci.h 
b/lib/librte_eal/common/include/rte_pci.h
index 7f2d699..4814cd7 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -269,6 +269,40 @@ eal_parse_pci_DomBDF(const char *input, struct 
rte_pci_addr *dev_addr)
 }
 #undef GET_PCIADDR_FIELD

+/* Compare two PCI device addresses. */
+/**
+ * Utility function to compare two PCI device addresses.
+ *
+ * @param addr
+ * The PCI Bus-Device-Function address to compare
+ * @param addr2
+ * The PCI Bus-Device-Function address to compare
+ * @return
+ * 0 on equal PCI address.
+ * Positive on addr is greater than addr2.
+ * Negative on addr is less than addr2, or error.
+ */
+static inline int
+eal_compare_pci_addr(struct rte_pci_addr *addr, struct rte_pci_addr *addr2)
+{
+   uint64_t dev_addr, dev_addr2;
+
+   if ((addr == NULL) || (addr2 == NULL))
+   return -1;
+
+   dev_addr = (addr->domain << 24) | (addr->bus << 16) |
+   (addr->devid << 8) | addr->function;
+   dev_addr2 = (addr2->domain << 24) | (addr2->bus << 16) |
+   (addr2->devid << 8) | addr2->function;
+
+   if (dev_addr > dev_addr2)
+   return 1;
+   else if (dev_addr < dev_addr2)
+   return -1;
+   else
+

[dpdk-dev] [PATCH v8 03/14] eal/pci, ethdev: Remove assumption that port will not be detached

2015-02-16 Thread Tetsuya Mukawa

This patch removes the assumption that a port will never be detached.

It adds "RTE_PCI_DRV_DETACHABLE" to drv_flags of the rte_pci_driver
structure. The flag indicates that the driver can detach devices at runtime.

To remove the assumption:
- Add 'attached' member to rte_eth_dev structure.
  This member indicates whether the port is attached or not.
- Add rte_eth_dev_allocate_new_port().
  This function is used for allocating a new port.
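
The reason a per-port flag is needed can be seen in a small sketch: once ports can be freed, the next free slot is no longer simply the port count. The table size and all slot_* names below are hypothetical.

```c
#include <stdint.h>

#define SLOT_MAX_PORTS 4	/* hypothetical table size */

enum { SLOT_DISCONNECTED = 0, SLOT_CONNECTED };

static uint8_t slot_attached[SLOT_MAX_PORTS];
static unsigned int slot_nb_ports;

/* Return the first free slot, or SLOT_MAX_PORTS when the table is full. */
static uint8_t
slot_find_free(void)
{
	uint8_t i;

	for (i = 0; i < SLOT_MAX_PORTS; i++) {
		if (slot_attached[i] == SLOT_DISCONNECTED)
			return i;
	}
	return SLOT_MAX_PORTS;
}

/* Claim a slot; returns the port number or -1 when no slot is free. */
static int
slot_allocate(void)
{
	uint8_t port = slot_find_free();

	if (port == SLOT_MAX_PORTS)
		return -1;
	slot_attached[port] = SLOT_CONNECTED;
	slot_nb_ports++;
	return port;
}

/* Release a slot so that a later allocation can reuse it. */
static int
slot_free(uint8_t port)
{
	if (port >= SLOT_MAX_PORTS || slot_attached[port] != SLOT_CONNECTED)
		return -1;
	slot_attached[port] = SLOT_DISCONNECTED;
	slot_nb_ports--;
	return 0;
}
```

After a detach, the set of valid port numbers may have holes, so validity checks have to look at the flag instead of comparing against the port count.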

v8:
- NONE_TRACE is changed to NO_TRACE.
  (Thanks to Iremonger, Bernard)
v5:
- Change parameters of rte_eth_dev_validate_port() to cleanup code.
v4:
- Use braces with 'for' loop.
- Fix indent of 'if' statement.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/include/rte_pci.h |   2 +
 lib/librte_ether/rte_ethdev.c   | 454 +---
 lib/librte_ether/rte_ethdev.h   |   5 +
 3 files changed, 186 insertions(+), 275 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_pci.h 
b/lib/librte_eal/common/include/rte_pci.h
index 7b48b55..7f2d699 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -207,6 +207,8 @@ struct rte_pci_driver {
 #define RTE_PCI_DRV_FORCE_UNBIND 0x0004
 /** Device driver supports link state interrupt */
 #define RTE_PCI_DRV_INTR_LSC   0x0008
+/** Device driver supports detaching capability */
+#define RTE_PCI_DRV_DETACHABLE 0x0010

 /**< Internal use only - Macro used by pci addr parsing functions **/
 #define GET_PCIADDR_FIELD(in, fd, lim, dlm)   \
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index ea3a1fb..a79fa5b 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -175,6 +175,16 @@ enum {
STAT_QMAP_RX
 };

+enum {
+   DEV_INVALID = 0,
+   DEV_VALID,
+};
+
+enum {
+   DEV_DISCONNECTED = 0,
+   DEV_CONNECTED
+};
+
 static inline void
 rte_eth_dev_data_alloc(void)
 {
@@ -201,19 +211,34 @@ rte_eth_dev_allocated(const char *name)
 {
unsigned i;

-   for (i = 0; i < nb_ports; i++) {
-   if (strcmp(rte_eth_devices[i].data->name, name) == 0)
+   for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
+   if ((rte_eth_devices[i].attached == DEV_CONNECTED) &&
+   strcmp(rte_eth_devices[i].data->name, name) == 0)
return &rte_eth_devices[i];
}
return NULL;
 }

+static uint8_t
+rte_eth_dev_allocate_new_port(void)
+{
+   unsigned i;
+
+   for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
+   if (rte_eth_devices[i].attached == DEV_DISCONNECTED)
+   return i;
+   }
+   return RTE_MAX_ETHPORTS;
+}
+
 struct rte_eth_dev *
 rte_eth_dev_allocate(const char *name)
 {
+   uint8_t port_id;
struct rte_eth_dev *eth_dev;

-   if (nb_ports == RTE_MAX_ETHPORTS) {
+   port_id = rte_eth_dev_allocate_new_port();
+   if (port_id == RTE_MAX_ETHPORTS) {
PMD_DEBUG_TRACE("Reached maximum number of Ethernet ports\n");
return NULL;
}
@@ -226,10 +251,12 @@ rte_eth_dev_allocate(const char *name)
return NULL;
}

-   eth_dev = &rte_eth_devices[nb_ports];
-   eth_dev->data = &rte_eth_dev_data[nb_ports];
+   eth_dev = &rte_eth_devices[port_id];
+   eth_dev->data = &rte_eth_dev_data[port_id];
snprintf(eth_dev->data->name, sizeof(eth_dev->data->name), "%s", name);
-   eth_dev->data->port_id = nb_ports++;
+   eth_dev->data->port_id = port_id;
+   eth_dev->attached = DEV_CONNECTED;
+   nb_ports++;
return eth_dev;
 }

@@ -283,6 +310,7 @@ rte_eth_dev_init(struct rte_pci_driver *pci_drv,
(unsigned) pci_dev->id.device_id);
if (rte_eal_process_type() == RTE_PROC_PRIMARY)
rte_free(eth_dev->data->dev_private);
+   eth_dev->attached = DEV_DISCONNECTED;
nb_ports--;
return diag;
 }
@@ -308,10 +336,28 @@ rte_eth_driver_register(struct eth_driver *eth_drv)
rte_eal_pci_register(&eth_drv->pci_drv);
 }

+enum {
+   NO_TRACE = 0,
+   TRACE
+};
+
+static int
+rte_eth_dev_validate_port(uint8_t port_id, int trace)
+{
+   if (port_id >= RTE_MAX_ETHPORTS ||
+   rte_eth_devices[port_id].attached != DEV_CONNECTED) {
+   if (trace) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   }
+   return DEV_INVALID;
+   } else
+   return DEV_VALID;
+}
+
 int
 rte_eth_dev_socket_id(uint8_t port_id)
 {
-   if (port_id >= nb_ports)
+   if (rte_eth_dev_validate_port(port_id, NO_TRACE) == DEV_INVALID)
return -1;
return rte_eth_devices[port_id].pci_dev->numa_node;
 }
@@ -369,10 +415,8 @@ rte_eth_dev_rx_queue_start(uint8_t port_id, uint16_t 
rx_queue_id)
 * in a multi-process setup*/
PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);

-

[dpdk-dev] [PATCH v8 02/14] eal_pci: pci memory map work with driver type

2015-02-16 Thread Tetsuya Mukawa

From: Michael Qiu 

With the driver type flag in struct rte_pci_device, we no longer need
to map uio devices with the vfio-related functions whenever
vfio is enabled.

Signed-off-by: Michael Qiu 
Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/linuxapp/eal/eal_pci.c | 30 +-
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
b/lib/librte_eal/linuxapp/eal/eal_pci.c
index e760452..3c463b2 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -556,25 +556,29 @@ pci_config_space_set(struct rte_pci_device *dev)
 static int
 pci_map_device(struct rte_pci_device *dev)
 {
-   int ret, mapped = 0;
+   int ret = -1;

/* try mapping the NIC resources using VFIO if it exists */
+   switch (dev->pt_driver) {
+   case RTE_PT_VFIO:
 #ifdef VFIO_PRESENT
-   if (pci_vfio_is_enabled()) {
-   ret = pci_vfio_map_resource(dev);
-   if (ret == 0)
-   mapped = 1;
-   else if (ret < 0)
-   return ret;
-   }
+   if (pci_vfio_is_enabled())
+   ret = pci_vfio_map_resource(dev);
 #endif
-   /* map resources for devices that use igb_uio */
-   if (!mapped) {
+   break;
+   case RTE_PT_IGB_UIO:
+   case RTE_PT_UIO_GENERIC:
+   /* map resources for devices that use uio */
ret = pci_uio_map_resource(dev);
-   if (ret != 0)
-   return ret;
+   break;
+   default:
+   RTE_LOG(DEBUG, EAL, "  Not managed by known pt driver,"
+   " skipped\n");
+   ret = 1;
+   break;
}
-   return 0;
+
+   return ret;
 }

 /*
-- 
1.9.1

[dpdk-dev] [PATCH v8 01/14] eal_pci: Add flag to hold kernel driver type

2015-02-16 Thread Tetsuya Mukawa

From: Michael Qiu 

Currently, DPDK cannot tell which kernel driver
(vfio-pci/igb_uio/uio_pci_generic) a device is bound to. It can only
check statically whether vfio is enabled.

Such a flag is useful because each driver type needs to be handled
differently at runtime, for example in PCI memory mapping and
port hotplug.

This patch adds a driver type field to the PCI device structure to
solve the above issue.

Signed-off-by: Michael Qiu 
Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/include/rte_pci.h |  8 +
 lib/librte_eal/linuxapp/eal/eal_pci.c   | 53 +++--
 2 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_pci.h 
b/lib/librte_eal/common/include/rte_pci.h
index 66ed793..7b48b55 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -139,6 +139,13 @@ struct rte_pci_addr {

 struct rte_devargs;

+enum rte_pt_driver {
+   RTE_PT_UNKNOWN  = 0,
+   RTE_PT_IGB_UIO  = 1,
+   RTE_PT_VFIO = 2,
+   RTE_PT_UIO_GENERIC  = 3,
+};
+
 /**
  * A structure describing a PCI device.
  */
@@ -152,6 +159,7 @@ struct rte_pci_device {
uint16_t max_vfs;   /**< sriov enable if not zero */
int numa_node;  /**< NUMA node connection */
struct rte_devargs *devargs;/**< Device user arguments */
+   enum rte_pt_driver pt_driver;   /**< Driver of passthrough */
 };

 /** Any PCI device identifier (vendor, device, ...) */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 15db9c4..e760452 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -97,6 +97,35 @@ error:
return -1;
 }

+static int
+pci_get_kernel_driver_by_path(const char *filename, char *dri_name)
+{
+   int count;
+   char path[PATH_MAX];
+   char *name;
+
+   if (!filename || !dri_name)
+   return -1;
+
+   count = readlink(filename, path, PATH_MAX);
+   if (count >= PATH_MAX)
+   return -1;
+
+   /* For device does not have a driver */
+   if (count < 0)
+   return 1;
+
+   path[count] = '\0';
+
+   name = strrchr(path, '/');
+   if (name) {
+   strncpy(dri_name, name + 1, strlen(name + 1) + 1);
+   return 0;
+   }
+
+   return -1;
+}
+
 void *
 pci_find_max_end_va(void)
 {
@@ -222,11 +251,12 @@ pci_scan_one(const char *dirname, uint16_t domain, 
uint8_t bus,
char filename[PATH_MAX];
unsigned long tmp;
struct rte_pci_device *dev;
+   char driver[PATH_MAX];
+   int ret;

dev = malloc(sizeof(*dev));
-   if (dev == NULL) {
+   if (dev == NULL)
return -1;
-   }

memset(dev, 0, sizeof(*dev));
dev->addr.domain = domain;
@@ -305,6 +335,25 @@ pci_scan_one(const char *dirname, uint16_t domain, uint8_t 
bus,
return -1;
}

+   /* parse driver */
+   snprintf(filename, sizeof(filename), "%s/driver", dirname);
+   ret = pci_get_kernel_driver_by_path(filename, driver);
+   if (!ret) {
+   if (!strcmp(driver, "vfio-pci"))
+   dev->pt_driver = RTE_PT_VFIO;
+   else if (!strcmp(driver, "igb_uio"))
+   dev->pt_driver = RTE_PT_IGB_UIO;
+   else if (!strcmp(driver, "uio_pci_generic"))
+   dev->pt_driver = RTE_PT_UIO_GENERIC;
+   else
+   dev->pt_driver = RTE_PT_UNKNOWN;
+   } else if (ret < 0) {
+   RTE_LOG(ERR, EAL, "Fail to get kernel driver\n");
+   free(dev);
+   return -1;
+   } else
+   dev->pt_driver = RTE_PT_UNKNOWN;
+
/* device is valid, add in list (sorted) */
if (TAILQ_EMPTY(&pci_device_list)) {
TAILQ_INSERT_TAIL(&pci_device_list, dev, next);
-- 
1.9.1
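
The driver detection added to pci_scan_one() can be tried outside DPDK with
a sketch like the one below. It is not the patch's code: classify_driver()
and enum pt_driver are hypothetical names, and the function takes the target
of the sysfs "driver" symlink as a string instead of calling readlink()
itself, so it can be exercised without a real device.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the driver types named in the patch. */
enum pt_driver {
	PT_UNKNOWN = 0,
	PT_IGB_UIO,
	PT_VFIO,
	PT_UIO_GENERIC,
};

/*
 * Classify a device from the target of its sysfs "driver" symlink,
 * e.g. "../../../bus/pci/drivers/igb_uio". A NULL target stands for
 * a device that has no kernel driver bound.
 */
static enum pt_driver
classify_driver(const char *link_target)
{
	const char *name;

	if (link_target == NULL)
		return PT_UNKNOWN;

	/* The driver name is the last path component of the link target. */
	name = strrchr(link_target, '/');
	name = (name != NULL) ? name + 1 : link_target;

	if (strcmp(name, "vfio-pci") == 0)
		return PT_VFIO;
	if (strcmp(name, "igb_uio") == 0)
		return PT_IGB_UIO;
	if (strcmp(name, "uio_pci_generic") == 0)
		return PT_UIO_GENERIC;

	/* Bound to some other kernel driver, e.g. the native NIC driver. */
	return PT_UNKNOWN;
}
```

Keeping the string comparison separate from the readlink() call is only a
choice made for this sketch; the patch does both inside the scan path.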

[dpdk-dev] [PATCH v8 00/14] Port Hotplug Framework

2015-02-16 Thread Tetsuya Mukawa

This patch series adds a dynamic port hotplug framework to DPDK.
With the patches, DPDK apps can attach or detach ports at runtime.

The basic concept of the port hotplug is as follows.
- DPDK apps are responsible for managing ports.
  Only the DPDK apps know which ports are attached or detached at a given moment.
  The port hotplug framework is implemented to allow DPDK apps to manage ports.
  For example, when a DPDK app calls the port attach function, the attached
  port number is returned. A DPDK app can also detach a port by port number.
- Kernel support is needed for attaching or detaching physical device ports.
  To attach a new physical device port, the device must first be recognized
  by a userspace I/O framework in the kernel. Then DPDK apps can
  call the port hotplug functions to attach ports.
  For detaching, the steps are reversed.
- Before detaching ports, the ports must be stopped and closed.
  A DPDK application must call rte_eth_dev_stop() and rte_eth_dev_close() before
  detaching ports. These functions call the finalization code of PMDs.
  But so far, no PMD frees all resources allocated by initialization.
  This means PMDs need to be fixed to support port hotplug.
  'RTE_PCI_DRV_DETACHABLE' is a new flag indicating a PMD supports detaching.
  Without this flag, detaching fails.
- Legacy DPDK apps must not be affected.
  No DPDK EAL behavior changes if the port hotplug functions aren't called,
  so all legacy DPDK apps still work without modification.

There are a few limitations.
- The port hotplug functions are not thread safe.
  DPDK apps should handle it.
- Only Linux and igb_uio are supported so far.
  BSD and VFIO are not supported. I will send VFIO patches at least, but I don't
  have a plan to submit a BSD patch so far.


Here are the port hotplug APIs.
---
/**
 * Attach a new device.
 *
 * @param devargs
 *   A pointer to a string describing the new device
 *   to be attached. The strings should be a pci address like
 *   ':01:00.0' or virtual device name like 'eth_pcap0'.
 * @param port_id
 *  A pointer to a port identifier actually attached.
 * @return
 *  0 on success and port_id is filled, negative on error
 */
int rte_eal_dev_attach(const char *devargs, uint8_t *port_id);

/**
 * Detach a device.
 *
 * @param port_id
 *   The port identifier of the device to detach.
 * @param devname
 *  A pointer to a buffer that receives the name of the detached device.
 * @return
 *  0 on success and devname is filled, negative on error
 */
int rte_eal_dev_detach(uint8_t port_id, char *devname);
---
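
To make the attach/detach semantics concrete, here is a minimal model of the
port table, assuming nothing beyond what the cover letter describes. The
names demo_attach(), demo_detach() and MAX_PORTS are hypothetical stand-ins
(MAX_PORTS plays the role of RTE_MAX_ETHPORTS); no device is probed and this
is not the EAL implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_PORTS 4 /* hypothetical stand-in for RTE_MAX_ETHPORTS */

enum { SLOT_FREE = 0, SLOT_ATTACHED };

struct port_slot {
	int state;
	char name[32];
};

static struct port_slot slots[MAX_PORTS];

/* Attach: claim the first free slot and report its index as the port id. */
static int
demo_attach(const char *devargs, uint8_t *port_id)
{
	unsigned i;

	if (devargs == NULL || port_id == NULL)
		return -1;

	for (i = 0; i < MAX_PORTS; i++) {
		if (slots[i].state == SLOT_FREE) {
			snprintf(slots[i].name, sizeof(slots[i].name),
				 "%s", devargs);
			slots[i].state = SLOT_ATTACHED;
			*port_id = (uint8_t)i;
			return 0;
		}
	}
	return -1; /* every slot is in use */
}

/* Detach: validate the port id, hand back the device name, free the slot. */
static int
demo_detach(uint8_t port_id, char *devname, size_t len)
{
	if (port_id >= MAX_PORTS || slots[port_id].state != SLOT_ATTACHED)
		return -1;

	if (devname != NULL && len > 0)
		snprintf(devname, len, "%s", slots[port_id].name);
	slots[port_id].state = SLOT_FREE;
	return 0;
}
```

After a detach, the attached port ids are no longer a dense range starting
at 0, so a check of the form "port_id < nb_ports" stops being valid. That is
why the series validates each port by its per-slot attached state instead.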

This patch series is for the DPDK EAL. For DPDK apps to use the port hotplug
functions, each PMD must be fixed to support the 'RTE_PCI_DRV_DETACHABLE' flag.
Please check the patch for the pcap PMD.

Also, please check the testpmd patch. It shows how to update legacy
applications to support the port hotplug feature.

PATCH v8 changes
 - Fix Makefile and add version map file.
 - Add missing symbol in version map.
 - Fix pci_scan_one() to update sysfs values.
   (Thanks to Qiu, Michael and Iremonger, Bernard)
 - NONE_TRACE is replaced by NO_TRACE.
 - Fix typo.
 - Add size parameter to rte_eth_dev_save().
   (Thanks to Iremonger, Bernard)

PATCH v7 changes
 - Add a new section to programmer's guide.
   (Thanks to Iremonger, Bernard)
 - Fix port checking implementation of start_port().
 - Fix typo of warning messages.
 - Add pt_driver checking to rte_eth_dev_check_detachable().
   (Thanks to Qiu, Michael)

PATCH v6 changes
 - Fix rte_eth_dev_uninit() to handle the return value of the uninit
   function of a PMD. To do this, the changes below are also applied.
   - Fix a parameter of rte_eth_dev_free().
   - Use rte_eth_dev structure as the parameter of rte_eth_dev_free().

PATCH v5 changes
 - Add runtime check of passthrough driver type, like vfio-pci, igb_uio
   and uio_pci_generic.
   This was done by Qiu, Michael. Thanks a lot.
 - Change function names like below.
   - rte_eal_dev_find_and_invoke() to rte_eal_vdev_find_and_invoke().
   - rte_eal_dev_invoke() to rte_eal_vdev_invoke().
 - Add code to handle a return value of rte_eal_devargs_remove().
 - Fix pci address format in rte_eal_dev_detach().
 - Remove RTE_EAL_INVOKE_TYPE_UNKNOWN, because it's unused.
 - Change function definition of rte_eal_devargs_remove().
 - Fix pci_unmap_device() to check pt_driver.
 - Fix return value of below functions.
   - rte_eth_dev_get_changed_port().
   - rte_eth_dev_get_port_by_addr().
 - Change parameters of rte_eth_dev_validate_port() to clean up code.
 - Fix pci_scan_one to handle pt_driver correctly.
   (Thanks to Qiu, Michael for above suggestions)

PATCH v4 changes
 - Merge patches to make review easier.
 - Fix indent of 'if' statement.
 - Fix calculation method of eal_compare_pci_addr().
 - Fix header file declaration.
 - Add header file to determine if hotplug can be enabled.
   (Thanks to Qiu, Michael)
 - Use

[dpdk-dev] [PATCH v4 1/5] mk: Add 'make doc-pdf' target to convert guide docs to pdf

2015-02-16 Thread Mcnamara, John

> -Original Message-
> From: Iremonger, Bernard
> Sent: Monday, February 16, 2015 12:20 PM
> To: Mcnamara, John; dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v4 1/5] mk: Add 'make doc-pdf' target to
> convert guide docs to pdf
> 
> >
> >  .PHONY: clean
> > -clean: api-html-clean guides-html-clean
> > +clean: api-html-clean guides-html-clean guides-latex-clean
> 
> Hi John,
> 
> Would it be clearer to have a  guides-pdf-clean target  instead of guides-
> latex-clean given that there is a doc-pdf target?

Hi Bernard,

Thomas made the same comment on an earlier patch so clearly it isn't obvious 
where the rule comes from:

http://dpdk.org/ml/archives/dev/2015-January/012099.html

The reason is that it is re-using the existing clean rule. My reply is here:

http://dpdk.org/ml/archives/dev/2015-February/012241.html

> Sphinx creates build/doc/html and build/doc/latex directories (not 
/build/doc/pdf) so this just reuses the existing guides-%-clean rule.

John

[dpdk-dev] [PATCH v4 5/5] doc: Convert image extensions to wildcard

2015-02-16 Thread Mcnamara, John

> -Original Message-
> From: Iremonger, Bernard
> Sent: Monday, February 16, 2015 12:09 PM
> To: Mcnamara, John; dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v4 5/5] doc: Convert image extensions to
> wildcard
> >
> > -.. |vmdq_dcb_example| image:: img/vmdq_dcb_example.svg
> > +.. |vmdq_dcb_example| image:: img/vmdq_dcb_example.*
> > --
> > 1.7.4.1
> 
> Hi John,
> 
> Not sure why this is  necessary.
> The rte.sdkdoc.mk  file  seems to be looking for *.svg files.

Hi Bernard,

This change is in response to Thomas's comment on the V3 version of the 
patch.

http://dpdk.org/ml/archives/dev/2015-February/012249.html

Personally I preferred the V3 solution which left the RST sources untouched and 
fixed the SVG extensions in the Tex docs via sed.

This part of the patch has a lot of churn as a result.

John

[dpdk-dev] [PATCH v4 1/5] mk: Add 'make doc-pdf' target to convert guide docs to pdf

2015-02-16 Thread Iremonger, Bernard



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of John McNamara
> Sent: Tuesday, February 3, 2015 2:11 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v4 1/5] mk: Add 'make doc-pdf' target to convert 
> guide docs to pdf
> 
> Added make system support for building PDF versions of the guides. Requires 
> Python Sphinx and
> TexLive Full.
> 
> Signed-off-by: John McNamara 
> ---
>  mk/rte.sdkdoc.mk |   43 ++-
>  1 files changed, 42 insertions(+), 1 deletions(-)
> 
> diff --git a/mk/rte.sdkdoc.mk b/mk/rte.sdkdoc.mk
> index dabc0d6..e09628f 100644
> --- a/mk/rte.sdkdoc.mk
> +++ b/mk/rte.sdkdoc.mk
> @@ -37,13 +37,24 @@ endif
>  endif
> 
>  RTE_SPHINX_BUILD = sphinx-build
> +RTE_PDFLATEX_VERBOSE := --interaction=nonstopmode
> +
>  ifndef V
>  RTE_SPHINX_VERBOSE := -q
> +RTE_PDFLATEX_VERBOSE := --interaction=batchmode
> +RTE_INKSCAPE_VERBOSE := > /dev/null 2>&1
>  endif
>  ifeq '$V' '0'
>  RTE_SPHINX_VERBOSE := -q
> +RTE_PDFLATEX_VERBOSE := --interaction=batchmode
> +RTE_INKSCAPE_VERBOSE := > /dev/null 2>&1
>  endif
> 
> +RTE_GUIDE_PDFS := $(filter %/, $(wildcard $(RTE_SDK)/doc/guides/*/))
> +RTE_GUIDE_PDFS := $(RTE_GUIDE_PDFS:$(RTE_SDK)/doc/guides%=$(RTE_OUTPUT)/doc/latex/guides%)
> +RTE_GUIDE_PDFS := $(RTE_GUIDE_PDFS:%/=%.pdf)
> +RTE_DEFAULT_DPI ?= 300
> +
>  .PHONY: help
>  help:
>   @cat $(RTE_SDK)/doc/build-sdk-quick.txt
> @@ -53,7 +64,7 @@ help:
>  all: api-html guides-html
> 
>  .PHONY: clean
> -clean: api-html-clean guides-html-clean
> +clean: api-html-clean guides-html-clean guides-latex-clean

Hi John,

Would it be clearer to have a  guides-pdf-clean target  instead of 
guides-latex-clean given that there is a doc-pdf target?

Regards,

Bernard.



> 
>  .PHONY: api-html
>  api-html: api-html-clean
> @@ -83,3 +94,33 @@ guides-%:
>   @echo 'sphinx for guides...'
>   $(Q)$(RTE_SPHINX_BUILD) -b $* $(RTE_SPHINX_VERBOSE) \
>   -c $(RTE_SDK)/doc/guides $(RTE_SDK)/doc/guides 
> $(RTE_OUTPUT)/doc/$*/guides
> +
> +
> +pdf: $(RTE_GUIDE_PDFS)
> +
> +.SECONDEXPANSION:
> +# Use wildcard expansion to avoid * expansion issue with make 3.82.
> +$(RTE_OUTPUT)/doc/latex/guides/%.pdf: $$(wildcard 
> $(RTE_SDK)/doc/guides/%/*.rst)
> + @echo 'creating' $* 'pdf ...'
> +
> + @# Convert the svg files to png for pdflatex.
> + $(eval tmp_images = $(wildcard $(RTE_SDK)/doc/guides/$*/img/*.svg))
> + $(Q)for image in $(tmp_images:.svg=); do \
> + inkscape -d $(RTE_DEFAULT_DPI) -D -b ff \
> + -f $$image.svg -e $$image.png $(RTE_INKSCAPE_VERBOSE); \
> + done
> +
> + @# Generate the latex files.
> + $(Q)$(RTE_SPHINX_BUILD) -b latex $(RTE_SPHINX_VERBOSE) \
> + -c $(RTE_SDK)/doc/guides  $(RTE_SDK)/doc/guides/$* \
> + $(RTE_OUTPUT)/doc/latex/guides/$*
> +
> + @# Remove the generated png files.
> + $(Q)rm -f $(tmp_images:.svg=.png)
> +
> + @# Generate the pdf files.
> + $(Q)sed -i 's/LATEXOPTS =/LATEXOPTS = $(RTE_PDFLATEX_VERBOSE)/' \
> + $(RTE_OUTPUT)/doc/latex/guides/$*/Makefile
> + $(Q)make all-pdf -s -C $(RTE_OUTPUT)/doc/latex/guides/$*
> +
> + $(Q)mv $(RTE_OUTPUT)/doc/latex/guides/$*/dpdk_doc.pdf $@
> --
> 1.7.4.1

[dpdk-dev] [PATCH] mk: fix missing link of librte_vhost in shared, non-combined config

2015-02-16 Thread Thomas Monjalon

2015-02-16 12:01, Panu Matilainen:
> On 02/13/2015 03:18 PM, Thomas Monjalon wrote:
> > 2015-02-13 12:33, Panu Matilainen:
> >> On 02/13/2015 11:28 AM, Thomas Monjalon wrote:
> >>> 2015-02-13 09:27, Panu Matilainen:
> >>>> On 02/12/2015 05:44 PM, Thomas Monjalon wrote:
> >>>>> A library is considered as a plugin if there is no public API and it
> >>>>> registers itself. That's the case of normal PMD.
> >>>>> But bonding and Xen have some library parts with public API.
> >>>>> It has been discussed and agreed for bonding but I'm not aware of the
> >>>>> Xen case.
> >>>>
> >>>> Fair enough, thanks for the explanation.
> >>>>
> >>>> Just wondering about versioning of these things - currently all the PMDs
> >>>> are versioned as well, which is slightly at odds with their expected
> >>>> usage, dlopen()'ed items usually are not versioned because it makes the
> >>>> files moving targets. But if a plugin can be a library too then it
> >>>> clearly needs to be versioned as well.
> >>>
> >>> Not sure to understand your considerations.
> >>> Plugins must be versioned because there can be some incompatibilities
> >>> like mbuf rework.
> >>
> >> Plugins are version-dependent obviously, but the issue is somewhat
> >> different from library versioning. Plugins are generally consumers of
> >> the versioned ABIs, whereas libraries are the providers.
> >>
> >>>> I'm just thinking of typical packaging where the unversioned *.so
> >>>> symlinks are in a -devel subpackage and the versioned libraries are in
> >>>> the main runtime package. Plugins should be loadable by a stable
> >>>> unversioned name always, for libraries the linker handles it behind the
> >>>> scenes. So in packaging these things, plugin *.so links need to be
> >>>> handled differently (placed into the main package) from others. Not
> >>>> rocket science to filter by 'pmd' in the name, but a new twist anyway
> >>>> and easy to get wrong.
> >>>>
> >>>> One possibility to make it all more obvious might be having a separate
> >>>> directory for plugins, the mixed case could be handled by symlinks.
> >>>
> >>> I think I don't understand which use case you are trying to solve.
> >>
> >> Its a usability/documentation issue more than a technical one. If plugin
> >> DSO's are versioned (like they currently are), then loading them via eg
> >> -d becomes cumbersome since you need to hunt down and provide the
> >> versioned name, eg "testpmd -d librte_pmd_pcap.so.1 [...]"
> >>
> >> Like said above, it can be worked around by leaving the unversioned
> >> symlinks in place for plugins in runtime (library) packages, but that
> >> sort of voids the point of versioning. One possibility would be
> >> introducing a per-version plugin directory that would be used as the
> >> default path for dlopen() unless an absolute path is used.
> >
> > It makes me think that instead of using a -d option per plugin, why not
> > adding a -D option to load all plugins from a directory?
> 
> Are you thinking of "-D " or just -D (to use a build-time 
> hardwired directory)?

I'm thinking of "-D ".
I understand you would like a "hardwired" default directory which would be
properly packaged by a distribution. Maybe a build-time default could be
to load all the plugins of a directory (without any option). Then the
-d and -D options would override the build-time default behaviour.
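
A rough sketch of what such a directory option could do is shown below. It
is only an illustration of the idea: load_plugin_dir(), is_plugin_name() and
the callback type are hypothetical names, nothing here exists in the EAL,
and the actual loading is left to a callback so that the scan logic can be
read on its own.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Callback invoked for each candidate plugin; returns 0 on success. */
typedef int (*plugin_load_fn)(const char *path);

/* Return 1 if the file name ends in ".so", 0 otherwise. */
static int
is_plugin_name(const char *name)
{
	size_t len = strlen(name);

	return len > 3 && strcmp(name + len - 3, ".so") == 0;
}

/*
 * Call load() for every "*.so" entry found in dir.
 * Returns the number of plugins loaded, or -1 if dir cannot be opened.
 */
static int
load_plugin_dir(const char *dir, plugin_load_fn load)
{
	char path[4096];
	struct dirent *ent;
	int loaded = 0;
	DIR *d = opendir(dir);

	if (d == NULL)
		return -1;

	while ((ent = readdir(d)) != NULL) {
		if (!is_plugin_name(ent->d_name))
			continue;
		snprintf(path, sizeof(path), "%s/%s", dir, ent->d_name);
		/* A real loader would call dlopen(path, RTLD_NOW) here. */
		if (load(path) == 0)
			loaded++;
	}
	closedir(d);
	return loaded;
}
```

Note that this filter only matches unversioned names, so a file such as
librte_pmd_pcap.so.1 would be skipped. Whether the plugin directory should
hold versioned files, unversioned symlinks, or both is exactly the packaging
question raised earlier in the thread.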

[dpdk-dev] [PATCH v4 5/5] doc: Convert image extensions to wildcard

2015-02-16 Thread Iremonger, Bernard



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of John McNamara
> Sent: Tuesday, February 3, 2015 2:11 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v4 5/5] doc: Convert image extensions to wildcard
> 
> Changed all image.svg and image.png extensions to image.* This allows Sphinx 
> to decide the
> appropriate image type from the available image options.
> 
> Signed-off-by: John McNamara 
> ---
>  doc/guides/prog_guide/env_abstraction_layer.rst|2 +-
>  .../prog_guide/i40e_ixgbe_igb_virt_func_drv.rst|8 ++--
>  .../intel_dpdk_xen_based_packet_switch_sol.rst |6 ++--
>  doc/guides/prog_guide/ivshmem_lib.rst  |2 +-
>  doc/guides/prog_guide/kernel_nic_interface.rst |8 ++--
>  .../libpcap_ring_based_poll_mode_drv.rst   |2 +-
>  .../prog_guide/link_bonding_poll_mode_drv_lib.rst  |   14 
>  doc/guides/prog_guide/lpm6_lib.rst |2 +-
>  doc/guides/prog_guide/lpm_lib.rst  |2 +-
>  doc/guides/prog_guide/malloc_lib.rst   |2 +-
>  doc/guides/prog_guide/mbuf_lib.rst |4 +-
>  doc/guides/prog_guide/mempool_lib.rst  |6 ++--
>  doc/guides/prog_guide/multi_proc_support.rst   |2 +-
>  doc/guides/prog_guide/overview.rst |2 +-
>  doc/guides/prog_guide/packet_distrib_lib.rst   |4 +-
>  doc/guides/prog_guide/packet_framework.rst |   14 
>  .../poll_mode_drv_emulated_virtio_nic.rst  |6 ++--
>  .../poll_mode_drv_paravirtual_vmxnets_nic.rst  |6 ++--
>  doc/guides/prog_guide/qos_framework.rst|   36 
> ++--
>  doc/guides/prog_guide/ring_lib.rst |   28 
>  doc/guides/sample_app_ug/dist_app.rst  |4 +-
>  doc/guides/sample_app_ug/exception_path.rst|2 +-
>  doc/guides/sample_app_ug/intel_quickassist.rst |2 +-
>  doc/guides/sample_app_ug/kernel_nic_interface.rst  |4 +-
>  .../sample_app_ug/l2_forward_real_virtual.rst  |4 +-
>  .../sample_app_ug/l3_forward_access_ctrl.rst   |4 +-
>  doc/guides/sample_app_ug/load_balancer.rst |2 +-
>  doc/guides/sample_app_ug/multi_process.rst |8 ++--
>  doc/guides/sample_app_ug/qos_scheduler.rst |2 +-
>  doc/guides/sample_app_ug/quota_watermark.rst   |6 ++--
>  doc/guides/sample_app_ug/test_pipeline.rst |2 +-
>  doc/guides/sample_app_ug/vhost.rst |   10 +++---
>  doc/guides/sample_app_ug/vm_power_management.rst   |4 +-
>  doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst   |2 +-
>  34 files changed, 106 insertions(+), 106 deletions(-)
> 
> diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst
> b/doc/guides/prog_guide/env_abstraction_layer.rst
> index 231e266..45791b6 100644
> --- a/doc/guides/prog_guide/env_abstraction_layer.rst
> +++ b/doc/guides/prog_guide/env_abstraction_layer.rst
> @@ -212,4 +212,4 @@ Memory zones can be reserved with specific start address 
> alignment by
> supplying  The alignment value should be a power of two and not less than the 
> cache line size (64
> bytes).
>  Memory zones can also be reserved from either 2 MB or 1 GB hugepages, 
> provided that both are
> available on the system.
> 
> -.. |linuxapp_launch| image:: img/linuxapp_launch.svg
> +.. |linuxapp_launch| image:: img/linuxapp_launch.*
> diff --git a/doc/guides/prog_guide/i40e_ixgbe_igb_virt_func_drv.rst
> b/doc/guides/prog_guide/i40e_ixgbe_igb_virt_func_drv.rst
> index 41e316e..a984379 100755
> --- a/doc/guides/prog_guide/i40e_ixgbe_igb_virt_func_drv.rst
> +++ b/doc/guides/prog_guide/i40e_ixgbe_igb_virt_func_drv.rst
> @@ -542,10 +542,10 @@ which belongs to the destination VF on the VM.
> 
>  |inter_vm_comms|
> 
> -.. |perf_benchmark| image:: img/perf_benchmark.png
> +.. |perf_benchmark| image:: img/perf_benchmark.*
> 
> -.. |single_port_nic| image:: img/single_port_nic.png
> +.. |single_port_nic| image:: img/single_port_nic.*
> 
> -.. |inter_vm_comms| image:: img/inter_vm_comms.png
> +.. |inter_vm_comms| image:: img/inter_vm_comms.*
> 
> -.. |fast_pkt_proc| image:: img/fast_pkt_proc.png
> +.. |fast_pkt_proc| image:: img/fast_pkt_proc.*
> diff --git a/doc/guides/prog_guide/intel_dpdk_xen_based_packet_switch_sol.rst
> b/doc/guides/prog_guide/intel_dpdk_xen_based_packet_switch_sol.rst
> index 1f1e04f..47841cd 100644
> --- a/doc/guides/prog_guide/intel_dpdk_xen_based_packet_switch_sol.rst
> +++ b/doc/guides/prog_guide/intel_dpdk_xen_based_packet_switch_sol.rst
> @@ -457,8 +457,8 @@ The packet flow is:
> 
>  packet generator->Virtio in guest VM1->switching backend->Virtio in guest 
> VM2->switching backend-
> >wire
> 
> -.. |grant_table| image:: img/grant_table.png
> +.. |grant_table| image:: img/grant_table.*
> 
> -.. |grant_refs| image:: img/grant_refs.png
> +.. |grant_refs| image:: img/grant_refs.*
> 
> -.. |dpdk_xen_pkt_switch| image:: img/dpdk_xen_pk

[dpdk-dev] [PATCH] mk: fix missing link of librte_vhost in shared, non-combined config

2015-02-16 Thread Panu Matilainen

On 02/13/2015 03:18 PM, Thomas Monjalon wrote:
> 2015-02-13 12:33, Panu Matilainen:
>> On 02/13/2015 11:28 AM, Thomas Monjalon wrote:
>>> 2015-02-13 09:27, Panu Matilainen:
>>>> On 02/12/2015 05:44 PM, Thomas Monjalon wrote:
>>>>> A library is considered as a plugin if there is no public API and it
>>>>> registers itself. That's the case of normal PMD.
>>>>> But bonding and Xen have some library parts with public API.
>>>>> It has been discussed and agreed for bonding but I'm not aware of the Xen
>>>>> case.
>>>>
>>>> Fair enough, thanks for the explanation.
>>>>
>>>> Just wondering about versioning of these things - currently all the PMDs
>>>> are versioned as well, which is slightly at odds with their expected
>>>> usage, dlopen()'ed items usually are not versioned because it makes the
>>>> files moving targets. But if a plugin can be a library too then it
>>>> clearly needs to be versioned as well.
>>>
>>> Not sure to understand your considerations.
>>> Plugins must be versioned because there can be some incompatibilities
>>> like mbuf rework.
>>
>> Plugins are version-dependent obviously, but the issue is somewhat
>> different from library versioning. Plugins are generally consumers of
>> the versioned ABIs, whereas libraries are the providers.
>>
>>>> I'm just thinking of typical packaging where the unversioned *.so
>>>> symlinks are in a -devel subpackage and the versioned libraries are in
>>>> the main runtime package. Plugins should be loadable by a stable
>>>> unversioned name always, for libraries the linker handles it behind the
>>>> scenes. So in packaging these things, plugin *.so links need to be
>>>> handled differently (placed into the main package) from others. Not
>>>> rocket science to filter by 'pmd' in the name, but a new twist anyway
>>>> and easy to get wrong.
>>>>
>>>> One possibility to make it all more obvious might be having a separate
>>>> directory for plugins, the mixed case could be handled by symlinks.
>>>
>>> I think I don't understand which use case you are trying to solve.
>>
>> Its a usability/documentation issue more than a technical one. If plugin
>> DSO's are versioned (like they currently are), then loading them via eg
>> -d becomes cumbersome since you need to hunt down and provide the
>> versioned name, eg "testpmd -d librte_pmd_pcap.so.1 [...]"
>>
>> Like said above, it can be worked around by leaving the unversioned
>> symlinks in place for plugins in runtime (library) packages, but that
>> sort of voids the point of versioning. One possibility would be
>> introducing a per-version plugin directory that would be used as the
>> default path for dlopen() unless an absolute path is used.
>
> It makes me think that instead of using a -d option per plugin, why not
> adding a -D option to load all plugins from a directory?

Are you thinking of "-D " or just -D (to use a build-time 
hardwired directory)?

- Panu -

[dpdk-dev] Explanation of the QoS offset values used in the QoS scheduler example app.

2015-02-16 Thread Dumitrescu, Cristian

Hi,

These are offsets used for reading these packet fields, counted in 16-bit 
words because pdata is a uint16_t pointer. Packet bytes are stored in memory 
in network order, while the CPU is little endian, so byte swapping takes 
place on read.

This is probably not the best way to write this code, and I agree this portion 
of the app code is a bit more cryptic than it should be. Using data structures 
to describe the header format for the input packet (Ethernet/SVLAN/CVLAN/IPv4) 
and using portable byte swapping macros is probably a better alternative.

This being said, the code implementation, code comments and Sample App Guide 
description seem to be consistent and correct.

Regards,
Cristian


-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Aws Ismail
Sent: Saturday, February 14, 2015 7:35 PM
To: dev at dpdk.org
Subject: [dpdk-dev] Explanation of the QoS offset values used in the QoS 
scheduler example app.

Hi everyone,

I am looking at this portion of the code in the app_thread.c file of the
QoS scheduler example application:

/*
* QoS parameters are encoded as follows:
* Outer VLAN ID defines subport
* Inner VLAN ID defines pipe
* Destination IP 0.0.XXX.0 defines traffic class
* Destination IP host (0.0.0.XXX) defines queue
* Values below define offset to each field from start of frame
*/
#define SUBPORT_OFFSET 7
#define PIPE_OFFSET 9
#define TC_OFFSET 20
#define QUEUE_OFFSET 20
#define COLOR_OFFSET 19

static inline int get_pkt_sched(struct rte_mbuf *m, uint32_t *subport,
uint32_t *pipe, uint32_t *traffic_class, uint32_t *queue, uint32_t *color)
{
uint16_t *pdata = rte_pktmbuf_mtod(m, uint16_t *);

*subport = (rte_be_to_cpu_16(pdata[SUBPORT_OFFSET]) & 0x0FFF) &
(port_params.n_subports_per_port - 1); /* Outer VLAN ID*/

*pipe = (rte_be_to_cpu_16(pdata[PIPE_OFFSET]) & 0x0FFF) &
(port_params.n_pipes_per_subport - 1); /* Inner VLAN ID */

*traffic_class = (pdata[QUEUE_OFFSET] & 0x0F) &
(RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE - 1); /* Destination IP */

*queue = ((pdata[QUEUE_OFFSET] >> 8) & 0x0F) &
(RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS - 1) ; /* Destination IP */

*color = pdata[COLOR_OFFSET] & 0x03; /* Destination IP */

return 0;
}

The offset values do not make sense to me. According to the programmer
guide, the queue selection is SVID/CVID/TC/QID based, and those offsets seem
off in this case. Is this because the code assumes that the packet has been
altered before it gets to this stage?

Can anyone provide a better explanation, or at least the reason behind
choosing the offset values shown above?

Thanks.

[dpdk-dev] [PATCH 3/6] bsd: remove useless assignments

2015-02-16 Thread Bruce Richardson

On Sat, Feb 14, 2015 at 09:59:07AM -0500, Stephen Hemminger wrote:
> If variable is set in the next line, it doesn't need to be
> initialized.
> 
> Signed-off-by: Stephen Hemminger 
> ---
>  lib/librte_eal/bsdapp/eal/eal.c | 3 ++-
>  lib/librte_eal/bsdapp/eal/eal_pci.c | 2 +-
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
> index 69f3c03..71ae33c 100644
> --- a/lib/librte_eal/bsdapp/eal/eal.c
> +++ b/lib/librte_eal/bsdapp/eal/eal.c
> @@ -417,7 +417,8 @@ int rte_eal_has_hugepages(void)
>  int
>  rte_eal_iopl_init(void)
>  {
> - int fd = -1;
> + int fd;
> +
>   fd = open("/dev/io", O_RDWR);

Why not just merge the two lines and make it "int fd = open(...);"?

/Bruce

[dpdk-dev] ACL lookup doesn't work for some schemes

2015-02-16 Thread Stevan Markovic

Hi,

On Mon, Feb 16, 2015 at 4:56 AM, Ananyev, Konstantin <
konstantin.ananyev at intel.com> wrote:

> Yes, right now, for librte_acl to work correctly the first field has to be
> 1B long and all subsequent fields grouped into sets of 4 consecutive bytes.
> I thought we had it documented in our PG, ACL section:
> http://dpdk.org/doc/guides/prog_guide/packet_classif_access_ctrl.html
> Though re-reading it again:
> "For performance reasons, the inner loop of the search function is
> unrolled to process four input bytes at a time. This requires the input to
> be grouped into sets of 4 consecutive bytes. The loop processes the first
> input byte as part of the setup and then subsequent bytes must be in groups
> of 4 consecutive bytes."
> It is probably not very clear and needs to be explained in more detail.
> Will update the doc.
>
> Konstantin
>
>
Improving the API documentation would be great, but enforcing these
constraints on user-defined fields, perhaps in rte_acl_build(), and returning
an error when they are not met would be even better.
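
As a rough sketch of what such a check could look like: the struct and function below are hypothetical and deliberately simplified (this is not struct rte_acl_field_def, and not code from librte_acl). It assumes fields are listed in order and that fields sharing an input_index form one group:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified field description (not the librte_acl type). */
struct field_layout {
	uint32_t offset;      /* byte offset of the field in the input buffer */
	uint8_t  size;        /* field size in bytes */
	uint8_t  input_index; /* fields sharing an index form one 4-byte group */
};

/*
 * Return 0 if the layout follows the rule quoted above, -1 otherwise:
 * the first field is 1 byte long, and every following group of fields
 * (same input_index, listed consecutively) covers 4 consecutive bytes.
 */
static int
check_field_layout(const struct field_layout *f, size_t n)
{
	size_t i = 1;

	if (f == NULL || n == 0 || f[0].size != 1)
		return -1;

	while (i < n) {
		uint8_t group = f[i].input_index;
		uint32_t start = f[i].offset;
		uint32_t covered = 0;

		/* Walk one group, requiring its fields to be back to back. */
		while (i < n && f[i].input_index == group) {
			if (f[i].offset != start + covered)
				return -1;
			covered += f[i].size;
			i++;
		}
		if (covered != 4)
			return -1;
	}
	return 0;
}
```

A build function could run a check like this up front and return -EINVAL instead of silently producing a context whose lookups miss.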

Stevan

[dpdk-dev] kernel: BUG: soft lockup - CPU#1 stuck for 22s! [kni_single:1782]

2015-02-16 Thread Jay Rolette

On Tue, Feb 10, 2015 at 7:33 PM, Jay Rolette  wrote:

> Environment:
>   * DPDK 1.6.0r2
>   * Ubuntu 14.04 LTS
>   * kernel: 3.13.0-38-generic
>
> When we start exercising KNI a fair bit (transferring files across it,
> both sending and receiving), I'm starting to see a fair bit of these kernel
> lockups:
>
> kernel: BUG: soft lockup - CPU#1 stuck for 22s! [kni_single:1782]
>
> Frequently I can't do much other than get a screenshot of the error
> message coming across the console session once we get into this state, so
> debugging what is happening is "interesting"...
>
> I've seen this on multiple hardware platforms (so not box specific) as
> well as virtual machines.
>
> Are there any known issues with KNI that would cause kernel lockups in
> DPDK 1.6? Really hoping someone that knows KNI well can point me in the
> right direction.
>
> KNI in the 1.8 tree is significantly different, so it didn't look
> straight-forward to back-port it, although I do see a few changes that
> might be relevant.
>

Found the problem. No patch to submit since it's already fixed in later
versions of DPDK, but thought I'd follow up with the details since I'm sure
we aren't the only ones trying to use bleeding-edge versions of DPDK...

In kni_net_rx_normal(), it was calling netif_receive_skb() instead of
netif_rx(). The source for netif_receive_skb() points out that it should
only be called from soft-irq context, which isn't the case for KNI.

As is typical, it's a simple fix once you track it down.

Yao-Po Wang's fix:  commit 41a6ebded53982107c1adfc0652d6cc1375a7db9.

Cheers,
Jay

[dpdk-dev] Query on portmask config

2015-02-16 Thread De Lara Guarch, Pablo

Hi Shankari,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Shankari
> Vaidyalingam
> Sent: Sunday, February 15, 2015 2:46 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] Query on portmask config
> 
> Hi,
> 
> I'm new to DPDK.
> Would like to know how to determine the portmask for a given
> configuration.
> Does it depend on the number of cores configured

Usually the portmask is passed with the -p option (this depends on the app), 
followed by a hexadecimal value.
So, for instance, -p 0x5 will use ports 0 and 2 from the ports bound to one of 
the DPDK drivers.

Regarding your second question, the number of ports is independent of the 
number of cores.
A single core can handle several ports, and several cores can even handle a 
single port (using different queues).
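
To make the bit-to-port mapping concrete, here is a small sketch. The function names are illustrative and the 32-bit mask width is an assumption; individual example apps have their own parsers:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a hexadecimal portmask such as "0x5"; return 0 on any parse error. */
static uint32_t
parse_portmask(const char *arg)
{
	char *end = NULL;
	unsigned long mask;

	if (arg == NULL || *arg == '\0')
		return 0;

	errno = 0;
	mask = strtoul(arg, &end, 16);
	if (errno != 0 || *end != '\0' || mask > UINT32_MAX)
		return 0;

	return (uint32_t)mask;
}

/* A port is selected when its bit is set: bit 0 is port 0, bit 2 is port 2. */
static int
port_is_enabled(uint32_t portmask, unsigned int port_id)
{
	return port_id < 32 && ((portmask >> port_id) & 1u);
}
```

0x5 is binary 101, so port_is_enabled() is true for ports 0 and 2 and false for port 1. Nothing in the mask refers to cores; those are selected separately.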

Regards,
Pablo
> 
> Regards
> Shankari.V

[dpdk-dev] ACL lookup doesn't work for some schemes

2015-02-16 Thread Ananyev, Konstantin

Hi

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of ???
> Sent: Sunday, February 15, 2015 9:19 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] ACL lookup doesn't work for some schemes
> 
> Hi,
> I noticed that ACL lookup doesn't work for some schemes.
> 1. If the first field is not uint8_t, even all fields are wildcard, lookup
>    doesn't find the matching rule. See acl_8last.c.
> 2. I prepended a uint8_t field, keep other fields be wildcard, lookup returns
>    the correct result. See acl_8last2.c
> 3. Then I change last field from 8bitmask_WILDCARD to 8bitmask(1, 0x1)
>    (matches odd numbers) or 8bitmask(0, 0x1) (match even numbers), lookup
>    doesn't return the correct. See acl_8last3.c. And I noticed the similar
>    behavior for uint16_t ranges(date doesn't match 0-0x8000 nor 0x8001-0x).
> Above behaviors are tricky. Does ACL do some undocumented assumptions
> or the table schema?
> Regards,
> Zhichang Yu

Yes, right now, for librte_acl to work correctly, the first field has to be 1B 
long and all subsequent fields must be grouped into sets of 4 consecutive bytes.
I thought we had this documented in our PG, ACL section:
http://dpdk.org/doc/guides/prog_guide/packet_classif_access_ctrl.html
Though re-reading it again:
"For performance reasons, the inner loop of the search function is unrolled to 
process four input bytes at a time. This requires the input to be grouped into 
sets of 4 consecutive bytes. The loop processes the first input byte as part of 
the setup and then subsequent bytes must be in groups of 4 consecutive bytes."
It is probably not very clear and needs to be explained in more detail.
Will update the doc.
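
For illustration, a hypothetical packed lookup key that satisfies the rule: one leading byte, then groups of 4 consecutive bytes. The struct, its field names and the group_of() helper are made up for this sketch and are not part of librte_acl; the packed attribute assumes GCC or Clang:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup key; the field names are illustrative only. */
struct ipv4_key {
	uint8_t  proto;     /* byte 0: the single leading byte */
	uint32_t src_addr;  /* bytes 1-4: first group of 4 */
	uint32_t dst_addr;  /* bytes 5-8: second group of 4 */
	uint16_t src_port;  /* bytes 9-10: first half of the third group */
	uint16_t dst_port;  /* bytes 11-12: second half of the third group */
} __attribute__((packed));

/* Index of the 4-byte group a byte offset falls into; 0 is the leading byte. */
static unsigned int
group_of(size_t offset)
{
	return offset == 0 ? 0 : (unsigned int)((offset - 1) / 4 + 1);
}
```

Here the two 16-bit ports land in the same group, so together they fill 4 consecutive bytes. A key that started with a 16-bit or 32-bit field instead of the single byte would break the assumption the unrolled loop makes.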

Konstantin

[dpdk-dev] Explanation of the QoS offset values used in the QoS scheduler example app.

2015-02-16 Thread Aws Ismail

Thanks Cristian and Ariel for your reply and explanation.

It is clear to me now.

Cheers.

Aws
On Feb 16, 2015 6:34 AM, "Dumitrescu, Cristian" <
cristian.dumitrescu at intel.com> wrote:

> Hi,
>
> These are byte offsets used for reading these packet fields, considering
> that packet bytes are stored in memory in network order, while the CPU is
> little endian, so byte swapping takes place on read.
>
> This is probably not the best way to write this code, and I agree this
> portion of the app code is a bit more cryptic than it should be. Using data
> structures to describe the header format for the input packet
> (Ethernet/SVLAN/CVLAN/IPv4) and using portable byte swapping macros is
> probably a better alternative.
>
> This being said, the code implementation, code comments and Sample App
> Guide description seem to be consistent and correct.
>
> Regards,
> Cristian
>
>
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Aws Ismail
> Sent: Saturday, February 14, 2015 7:35 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] Explanation of the QoS offset values used in the QoS
> scheduler example app.
>
> Hi everyone,
>
> I am looking at this portion of the code in the app_thread.c file of the
> QoS scheduler example application:
>
> /*
> * QoS parameters are encoded as follows:
> * Outer VLAN ID defines subport
> * Inner VLAN ID defines pipe
> * Destination IP 0.0.XXX.0 defines traffic class
> * Destination IP host (0.0.0.XXX) defines queue
> * Values below define offset to each field from start of frame
> */
> #define SUBPORT_OFFSET 7
> #define PIPE_OFFSET 9
> #define TC_OFFSET 20
> #define QUEUE_OFFSET 20
> #define COLOR_OFFSET 19
>
> static inline int get_pkt_sched(struct rte_mbuf *m, uint32_t *subport,
> uint32_t *pipe, uint32_t *traffic_class, uint32_t *queue, uint32_t *color)
> {
> uint16_t *pdata = rte_pktmbuf_mtod(m, uint16_t *);
>
> *subport = (rte_be_to_cpu_16(pdata[SUBPORT_OFFSET]) & 0x0FFF) &
> (port_params.n_subports_per_port - 1); /* Outer VLAN ID*/
>
> *pipe = (rte_be_to_cpu_16(pdata[PIPE_OFFSET]) & 0x0FFF) &
> (port_params.n_pipes_per_subport - 1); /* Inner VLAN ID */
>
> *traffic_class = (pdata[QUEUE_OFFSET] & 0x0F) &
> (RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE - 1); /* Destination IP */
>
> *queue = ((pdata[QUEUE_OFFSET] >> 8) & 0x0F) &
> (RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS - 1) ; /* Destination IP */
>
> *color = pdata[COLOR_OFFSET] & 0x03; /* Destination IP */
>
> return 0;
> }
>
> The offset values do not make sense to me. According to the programmer
> guide, the queue selection is SVID/CVID/TC/QID based. And those offset seem
> off in this case. Is this because it is assuming that the packet is being
> altered before it gets to this stage ?
>
> Can anyone provide a better explanation or at least the reason behind
> choosing those offset values shown above.
>
> Thanks.
>
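
A sketch of the structure-based alternative suggested in the quoted reply, assuming an Ethernet/SVLAN/CVLAN/IPv4 frame with no IPv4 options. All struct and function names here are hypothetical and are not taken from the example app. Because pdata is a uint16_t pointer, the indices 7, 9 and 20 in the macros correspond to byte offsets 14, 18 and 40 in the frame:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-wire layout; byte arrays avoid padding and alignment issues. */
struct vlan_tag {
	uint8_t tpid[2];
	uint8_t tci[2];           /* priority (3 bits), DEI (1 bit), VLAN ID (12 bits) */
};

struct qinq_ipv4_hdr {
	uint8_t dst_mac[6];
	uint8_t src_mac[6];
	struct vlan_tag svlan;    /* outer tag, TCI at byte 14 */
	struct vlan_tag cvlan;    /* inner tag, TCI at byte 18 */
	uint8_t ether_type[2];
	uint8_t ip_hdr_start[16]; /* IPv4 header bytes before the destination address */
	uint8_t ip_dst[4];        /* bytes 38-41 of the frame */
};

struct sched_fields {
	uint16_t subport_vid;     /* outer VLAN ID */
	uint16_t pipe_vid;        /* inner VLAN ID */
	uint8_t  tc_octet;        /* third octet of the destination IP */
	uint8_t  queue_octet;     /* fourth octet of the destination IP */
};

/* Read a 16-bit network-order value; works on any host endianness. */
static uint16_t
read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

static void
parse_sched_fields(const uint8_t *frame, struct sched_fields *out)
{
	const struct qinq_ipv4_hdr *h = (const struct qinq_ipv4_hdr *)frame;

	out->subport_vid = read_be16(h->svlan.tci) & 0x0FFF;
	out->pipe_vid = read_be16(h->cvlan.tci) & 0x0FFF;
	out->tc_octet = h->ip_dst[2];
	out->queue_octet = h->ip_dst[3];
}
```

The caller would still mask these values down to the configured number of subports, pipes, traffic classes and queues, as the original function does.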

[dpdk-dev] [PATCH v8 08/14] eal/linux/pci: Add functions for unmapping igb_uio resources

2015-02-16 Thread Neil Horman

On Mon, Feb 16, 2015 at 01:14:27PM +0900, Tetsuya Mukawa wrote:
> The patch adds functions for unmapping igb_uio resources. The patch is only
> for Linux and igb_uio environment. VFIO and BSD are not supported.
> 
> v8:
> - Fix typo.
>   (Thanks to Iremonger, Bernard)
> v5:
> - Fix pci_unmap_device() to check pt_driver.
> v4:
> - Add parameter checking.
> - Add header file to determine if hotplug can be enabled.
> 
> Signed-off-by: Tetsuya Mukawa 
> ---
>  lib/librte_eal/common/Makefile  |  1 +
>  lib/librte_eal/common/include/rte_dev_hotplug.h | 44 +
>  lib/librte_eal/linuxapp/eal/eal_pci.c   | 44 +
>  lib/librte_eal/linuxapp/eal/eal_pci_init.h  |  8 +++
>  lib/librte_eal/linuxapp/eal/eal_pci_uio.c   | 65 
> +
>  5 files changed, 162 insertions(+)
>  create mode 100644 lib/librte_eal/common/include/rte_dev_hotplug.h
> 
> diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
> index 52c1a5f..db7cc93 100644
> --- a/lib/librte_eal/common/Makefile
> +++ b/lib/librte_eal/common/Makefile
> @@ -41,6 +41,7 @@ INC += rte_eal_memconfig.h rte_malloc_heap.h
>  INC += rte_hexdump.h rte_devargs.h rte_dev.h
>  INC += rte_common_vect.h
>  INC += rte_pci_dev_feature_defs.h rte_pci_dev_features.h
> +INC += rte_dev_hotplug.h
>  
>  ifeq ($(CONFIG_RTE_INSECURE_FUNCTION_WARNING),y)
>  INC += rte_warnings.h
> diff --git a/lib/librte_eal/common/include/rte_dev_hotplug.h 
> b/lib/librte_eal/common/include/rte_dev_hotplug.h
> new file mode 100644
> index 000..b333e0f
> --- /dev/null
> +++ b/lib/librte_eal/common/include/rte_dev_hotplug.h
> @@ -0,0 +1,44 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2015 IGEL Co.,LTd.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + * * Redistributions of source code must retain the above copyright
> + *   notice, this list of conditions and the following disclaimer.
> + * * Redistributions in binary form must reproduce the above copyright
> + *   notice, this list of conditions and the following disclaimer in
> + *   the documentation and/or other materials provided with the
> + *   distribution.
> + * * Neither the name of IGEL Co.,Ltd. nor the names of its
> + *   contributors may be used to endorse or promote products derived
> + *   from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _RTE_DEV_HOTPLUG_H_
> +#define _RTE_DEV_HOTPLUG_H_
> +
> +/*
> + * determine if hotplug can be enabled on the system
> + */
> +#if defined(RTE_LIBRTE_EAL_HOTPLUG) && defined(RTE_LIBRTE_EAL_LINUXAPP)
> +#define ENABLE_HOTPLUG
> +#endif /* RTE_LIBRTE_EAL_HOTPLUG & RTE_LIBRTE_EAL_LINUXAPP */
I'm not sure why you're doing this.  Why not just test RTE_LIBRTE_EAL_HOTPLUG in
the various locations where you're testing ENABLE_HOTPLUG?  This seems like
indirection for the sake of indirection.
Neil

[dpdk-dev] [PATCH v8 07/14] ethdev: Add functions that will be used by port hotplug functions

2015-02-16 Thread Neil Horman

On Mon, Feb 16, 2015 at 01:14:26PM +0900, Tetsuya Mukawa wrote:
> The patch adds following functions.
> 
> - rte_eth_dev_save()
>   The function is used for saving current rte_eth_dev structures.
> - rte_eth_dev_get_changed_port()
>   The function receives the rte_eth_dev structures, then compare
>   these with current values to know which port is actually
>   attached or detached.
> - rte_eth_dev_get_addr_by_port()
>   The function returns a pci address of an ethdev specified by port
>   identifier.
> - rte_eth_dev_get_port_by_addr()
>   The function returns a port identifier of an ethdev specified by
>   pci address.
> - rte_eth_dev_get_name_by_port()
>   The function returns a unique identifier name of an ethdev
>   specified by port identifier.
> - Add rte_eth_dev_check_detachable()
>   The function returns whether a PMD supports detach function.
> 
> Also, the patch changes scope of rte_eth_dev_allocated() to global.
> This function will be called by virtual PMDs to support port hotplug.
> So change scope of the function to global.
> 
> v8:
> - Add size parameter to rte_eth_dev_save().
> - Add missing symbol in version map.
>   (Thanks to Qiu, Michael and Iremonger, Bernard)
> v7:
> - Add pt_driver checking to rte_eth_dev_check_detachable().
>   (Thanks to Qiu, Michael)
> v5:
> - Fix return value of below functions.
>   rte_eth_dev_get_changed_port().
>   rte_eth_dev_get_port_by_addr().
> v4:
> - Add parameter checking.
> v3:
> - Fix if-condition bug while comparing pci addresses.
> - Add error checking codes.
> Reported-by: Mark Enright 
> 
> Signed-off-by: Tetsuya Mukawa 
> ---
>  lib/librte_ether/rte_ethdev.c  | 99 
> +-
>  lib/librte_ether/rte_ethdev.h  | 83 
>  lib/librte_ether/rte_ether_version.map |  6 +++
>  3 files changed, 187 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index 58d8072..3869a96 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -206,7 +206,7 @@ rte_eth_dev_data_alloc(void)
>   RTE_MAX_ETHPORTS * sizeof(*rte_eth_dev_data));
>  }
>  
> -static struct rte_eth_dev *
> +struct rte_eth_dev *
>  rte_eth_dev_allocated(const char *name)
>  {
>   unsigned i;
> @@ -426,6 +426,103 @@ rte_eth_dev_count(void)
>   return (nb_ports);
>  }
>  
> +int
> +rte_eth_dev_save(struct rte_eth_dev *devs, size_t size)
> +{
> + if ((devs == NULL) ||
> + (size != sizeof(struct rte_eth_dev) * RTE_MAX_ETHPORTS))
> + return -EINVAL;
> +
> + /* save current rte_eth_devices */
> + memcpy(devs, rte_eth_devices, size);
> + return 0;
> +}
> +
> +int
> +rte_eth_dev_get_changed_port(struct rte_eth_dev *devs, uint8_t *port_id)
> +{
> + if ((devs == NULL) || (port_id == NULL))
> + return -EINVAL;
> +
> + /* check which port was attached or detached */
> + for (*port_id = 0; *port_id < RTE_MAX_ETHPORTS; (*port_id)++, devs++) {
> + if (rte_eth_devices[*port_id].attached ^ devs->attached)
> + return 0;
> + }
> + return -ENODEV;
> +}
> +
> +int
> +rte_eth_dev_get_addr_by_port(uint8_t port_id, struct rte_pci_addr *addr)
> +{
> + if (rte_eth_dev_validate_port(port_id, TRACE) == DEV_INVALID)
> + return -EINVAL;
> +
> + if (addr == NULL) {
> + PMD_DEBUG_TRACE("Null pointer is specified\n");
> + return -EINVAL;
> + }
> +
> + *addr = rte_eth_devices[port_id].pci_dev->addr;
> + return 0;
> +}
> +
> +int
> +rte_eth_dev_get_port_by_addr(struct rte_pci_addr *addr, uint8_t *port_id)
> +{
> + struct rte_pci_addr *tmp;
> +
> + if ((addr == NULL) || (port_id == NULL)) {
> + PMD_DEBUG_TRACE("Null pointer is specified\n");
> + return -EINVAL;
> + }
> +
> + for (*port_id = 0; *port_id < RTE_MAX_ETHPORTS; (*port_id)++) {
> + if (!rte_eth_devices[*port_id].attached)
> + continue;
> + if (!rte_eth_devices[*port_id].pci_dev)
> + continue;
> + tmp = &rte_eth_devices[*port_id].pci_dev->addr;
> + if (eal_compare_pci_addr(tmp, addr) == 0)
> + return 0;
> + }
> + return -ENODEV;
> +}
> +
> +int
> +rte_eth_dev_get_name_by_port(uint8_t port_id, char *name)
> +{
> + char *tmp;
> +
> + if (rte_eth_dev_validate_port(port_id, TRACE) == DEV_INVALID)
> + return -EINVAL;
> +
> + if (name == NULL) {
> + PMD_DEBUG_TRACE("Null pointer is specified\n");
> + return -EINVAL;
> + }
> +
> + /* shouldn't check 'rte_eth_devices[i].data',
> +  * because it might be overwritten by VDEV PMD */
> + tmp = rte_eth_dev_data[port_id].name;
> + strncpy(name, tmp, strlen(tmp) + 1);
> + return 0;
> +}
> +
> +int
> +rte_eth_dev_check_detachable(uint8_t port_id)
> +{
> + uint32_t drv_f
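
The snapshot-and-compare idea behind rte_eth_dev_save() and rte_eth_dev_get_changed_port() in the quoted patch can be sketched in isolation. Everything below is a reduced, hypothetical stand-in (MAX_PORTS, struct port_state and the ports_* functions are not DPDK names), kept only to show the technique:

```c
#include <stdint.h>
#include <string.h>

#define MAX_PORTS 32 /* hypothetical stand-in for RTE_MAX_ETHPORTS */

/* Hypothetical, reduced device record: only the flag the comparison needs. */
struct port_state {
	uint8_t attached;
};

static struct port_state current_ports[MAX_PORTS];

/* Copy the current table into a caller-provided snapshot. */
static int
ports_save(struct port_state *snap, size_t size)
{
	if (snap == NULL || size != sizeof(current_ports))
		return -1;
	memcpy(snap, current_ports, size);
	return 0;
}

/* Find the first port whose attached flag differs from the snapshot. */
static int
ports_get_changed(const struct port_state *snap, uint8_t *port_id)
{
	unsigned int i;

	if (snap == NULL || port_id == NULL)
		return -1;

	for (i = 0; i < MAX_PORTS; i++) {
		if ((current_ports[i].attached != 0) != (snap[i].attached != 0)) {
			*port_id = (uint8_t)i;
			return 0;
		}
	}
	return -1;
}
```

Unlike the quoted patch, this sketch writes *port_id only on success, so a caller that ignores the return value is not left with an out-of-range index.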

[dpdk-dev] kernel: BUG: soft lockup - CPU#1 stuck for 22s! [kni_single:1782]

2015-02-16 Thread Jay Rolette

Thanks Alejandro.

I'll look into the kernel dump if there is one. The system is extremely
brittle once this happens. Usually I can't do much other than power-cycle
the box. Anything requiring sudo just locks the terminal up, so little to
look at besides the messages on the console.

Matthew Hall also suggested a few things for me to look into, so I'm
following up on that as well.

Jay

On Wed, Feb 11, 2015 at 10:25 AM, Alejandro Lucero <
alejandro.lucero at netronome.com> wrote:

> Hi Jay,
>
> I saw these errors when I worked in the HPC sector. They come usually with
> a kernel dump for each core in the machine so you can know, after some
> peering at the kernel code, how the soft lockup triggers. When I did that
> it was always an issue with the memory.
>
> So those times that you can still work on the machine after the problem,
> look at the kernel messages. I will be glad to look at it.
>
>
>
> On Wed, Feb 11, 2015 at 1:33 AM, Jay Rolette 
> wrote:
>
> > Environment:
> >   * DPDK 1.6.0r2
> >   * Ubuntu 14.04 LTS
> >   * kernel: 3.13.0-38-generic
> >
> > When we start exercising KNI a fair bit (transferring files across it,
> both
> > sending and receiving), I'm starting to see a fair bit of these kernel
> > lockups:
> >
> > kernel: BUG: soft lockup - CPU#1 stuck for 22s! [kni_single:1782]
> >
> > Frequently I can't do much other than get a screenshot of the error
> message
> > coming across the console session once we get into this state, so
> debugging
> > what is happening is "interesting"...
> >
> > I've seen this on multiple hardware platforms (so not box specific) as
> well
> > as virtual machines.
> >
> > Are there any known issues with KNI that would cause kernel lockups in
> DPDK
> > 1.6? Really hoping someone that knows KNI well can point me in the right
> > direction.
> >
> > KNI in the 1.8 tree is significantly different, so it didn't look
> > straight-forward to back-port it, although I do see a few changes that
> > might be relevant.
> >
> > Any suggestions, pointers or other general help for tracking this down?
> >
> > Thanks!
> > Jay
> >
>

[dpdk-dev] [PATCH v8 00/14] Port Hotplug Framework

2015-02-16 Thread Qiu, Michael

On 2/16/2015 12:15 PM, Tetsuya Mukawa wrote:
> This patch series adds a dynamic port hotplug framework to DPDK.
> With the patches, DPDK apps can attach or detach ports at runtime.
>
> The basic concept of the port hotplug is like followings.
> - DPDK apps must have responsibility to manage ports.
>   DPDK apps only know which ports are attached or detached at the moment.
>   The port hotplug framework is implemented to allow DPDK apps to manage 
> ports.
>   For example, when DPDK apps call port attach function, attached port number
>   will be returned. Also, DPDK apps can detach port by port number.
> - Kernel support is needed for attaching or detaching physical device ports.
>   To attach a new physical device port, the device will be recognized by
>   userspace directly I/O framework in kernel at first. Then DPDK apps can
>   call the port hotplug functions to attach ports.
>   For detaching, steps are vice versa.
> - Before detach ports, ports must be stopped and closed.
>   DPDK application must call rte_eth_dev_stop() and rte_eth_dev_close() before
>   detaching ports. These function will call finalization codes of PMDs.
>   But so far, no PMD frees all resources allocated by initialization.
>   It means PMDs are needed to be fixed to support the port hotplug.
>   'RTE_PCI_DRV_DETACHABLE' is a new flag indicating a PMD supports detaching.
>   Without this flag, detaching will be failed.
> - Mustn't affect legacy DPDK apps.
>   No DPDK EAL behavior is changed, if the port hotplug functions are't called.
>   So all legacy DPDK apps can still work without modifications.
>
> And a few limitations.
> - The port hotplug functions are not thread safe.
>   DPDK apps should handle it.
> - Only support Linux and igb_uio so far.
>   BSD and VFIO is not supported. I will send VFIO patches at least, but I 
> don't
>   have a plan to submit BSD patch so far.
>
>
> Here is port hotplug APIs.
> ---
> /**
>  * Attach a new device.
>  *
>  * @param devargs
>  *   A pointer to a strings array describing the new device
>  *   to be attached. The strings should be a pci address like
>  *   ':01:00.0' or virtual device name like 'eth_pcap0'.
>  * @param port_id
>  *  A pointer to a port identifier actually attached.
>  * @return
>  *  0 on success and port_id is filled, negative on error
>  */
> int rte_eal_dev_attach(const char *devargs, uint8_t *port_id);
>
> /**
>  * Detach a device.
>  *
>  * @param port_id
>  *   The port identifier of the device to detach.
>  * @param addr
>  *  A pointer to a device name actually detached.
>  * @return
>  *  0 on success and devname is filled, negative on error
>  */
> int rte_eal_dev_detach(uint8_t port_id, char *devname);
> ---
>
> This patch series are for DPDK EAL. To use port hotplug function by DPDK apps,
> each PMD should be fixed to support 'RTE_PCI_DRV_DETACHABLE' flag. Please 
> check
> a patch for pcap PMD.
>
> Also, please check testpmd patch. It will show you how to fix your legacy
> applications to support port hotplug feature.
>
> PATCH v8 changes
>  - Fix Makefile and add version map file.
>  - Add missing symbol in version map.
>  - Fix pci_scan_one() to update sysfs values.
>(Thanks to Qiu, Michael and Iremonger, Bernard)
>  - NONE_TRACE is replaced by NO_TRACE.
>  - Fix typo.
>  - Add size parameter to rte_eth_dev_save().
>(Thanks to Iremonger, Bernard)
>
> PATCH v7 changes
>  - Add a new section to programmer's guide.
>(Thanks to Iremonger, Bernard)
>  - Fix port checking implementation of star_port().
>  - Fix typo of warning messages.
>  - Add pt_driver checking to rte_eth_dev_check_detachable().
>(Thanks to Qiu, Michael)
>
> PATCH v6 changes
>  - Fix rte_eth_dev_uninit() to handle a return value of uninit
>function of PMD. To do this, below changes also be applied.
>- Fix a parameter of rte_eth_dev_free().
>- Use rte_eth_dev structure as the paramter of rte_eth_dev_free().
>
> PATCH v5 changes
>  - Add runtime check passthrough driver type, like vfio-pci, igb_uio
>and uio_pci_generic.
>This was done by Qiu, Michael. Thanks a lot.
>  - Change function names like below.
>- rte_eal_dev_find_and_invoke() to rte_eal_vdev_find_and_invoke().
>- rte_eal_dev_invoke() to rte_eal_vdev_invoke().
>  - Add code to handle a return value of rte_eal_devargs_remove().
>  - Fix pci address format in rte_eal_dev_detach().
>  - Remove RTE_EAL_INVOKE_TYPE_UNKNOWN, because it's unused.
>  - Change function definition of rte_eal_devargs_remove().
>  - Fix pci_unmap_device() to check pt_driver.
>  - Fix return value of below functions.
>- rte_eth_dev_get_changed_port().
>- rte_eth_dev_get_port_by_addr().
>  - Change paramters of rte_eth_dev_validate_port() to cleanup code.
>  - Fix pci_scan_one to handle pt_driver correctly.
>(Thanks to Qiu, Michael for above suggestions)

[dpdk-dev] Intel DPDK support for ntop DPI

2015-02-16 Thread harshavardhan Reddy

Hi All,

Is ntop DPI integration available for Intel DPDK?

So far I could only find the proprietary Qosmos ixEngine integrated with DPDK,
and Wind River with its own DPI.

I have not found any info about nDPI integration with DPDK.


Thank You,

Regards,
HVR

[dpdk-dev] [PATCH v6 12/19] malloc: fix the issue of SOCKET_ID_ANY

2015-02-16 Thread Liang, Cunming

Hi,

> -Original Message-
> From: Neil Horman [mailto:nhorman at tuxdriver.com]
> Sent: Sunday, February 15, 2015 10:09 PM
> To: Liang, Cunming
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v6 12/19] malloc: fix the issue of 
> SOCKET_ID_ANY
> 
> On Sun, Feb 15, 2015 at 12:43:03AM +, Liang, Cunming wrote:
> > Hi,
> >
> > > -Original Message-
> > > From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > > Sent: Saturday, February 14, 2015 1:57 AM
> > > To: Liang, Cunming
> > > Cc: dev at dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH v6 12/19] malloc: fix the issue of
> SOCKET_ID_ANY
> > >
> > > On Fri, Feb 13, 2015 at 09:38:14AM +0800, Cunming Liang wrote:
> > > > Add check for rte_socket_id(), avoid get unexpected return like (-1).
> > > >
> > > > Signed-off-by: Cunming Liang 
> > > > ---
> > > >  lib/librte_malloc/malloc_heap.h | 7 ++-
> > > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/lib/librte_malloc/malloc_heap.h
> b/lib/librte_malloc/malloc_heap.h
> > > > index b4aec45..a47136d 100644
> > > > --- a/lib/librte_malloc/malloc_heap.h
> > > > +++ b/lib/librte_malloc/malloc_heap.h
> > > > @@ -44,7 +44,12 @@ extern "C" {
> > > >  static inline unsigned
> > > >  malloc_get_numa_socket(void)
> > > >  {
> > > > -   return rte_socket_id();
> > > > +   unsigned socket_id = rte_socket_id();
> > > > +
> > > > +   if (socket_id == (unsigned)SOCKET_ID_ANY)
> > > > +   return 0;
> > > > +
> > > > +   return socket_id;
> > > Why is -1 unexpected?  Isn't it reasonable to assume that some memory is
> > > equidistant from all cpu numa nodes?
> > [LCM] One piece of memory will be whole allocated from one specific NUMA
> node. But won't be like some part from one and the other part from another.
> > If no specific NUMA node assigned(SOCKET_ID_ANY/-1), it firstly asks for the
> current NUMA node where current core belongs to.
> > 'malloc_get_numa_socket()' is called on that time. When the time 1:1
> thread/core mapping is assumed and the default value is 0, it always will 
> return a
> none (-1) value.
> > Now rte_socket_id() may return -1 in the case the pthread runs on 
> > multi-cores
> which are not belongs to one NUMA node, or in the case _socket_id is not yet
> assigned and the default value is (-1). So if current _socket_id is -1, then 
> just pick
> up the first node as the candidate. Probably I shall add more comments for 
> this.
> > >
> Ok, but doesn't that provide an abnormal bias for node 0?  I was thinking it
> might be better to be honest with the application so that it can choose a node
> according to its own policy.
[LCM] Personally I like the idea of letting the application make the decision.
We could either add a simple default configuration or define a more flexible 
policy for SOCKET_ID_ANY, such as:
1) use the assigned default socket_id; 2) use the current socket_id, and if 
that fails go to 1);
3) (weighted) round robin across the malloc_heaps; 4) use the current 
socket_id, and if that fails go to 3); and so on.
On the other hand, well-tuned applications are usually NUMA friendly. 
Instead of using SOCKET_ID_ANY, they most often assign the expected socket_id 
explicitly.
Apart from getting the real current valid socket_id, the policy won't help 
with affinity; it mainly helps with memory utilization.
I guess the worry comes from the case where, after lots of memory allocation 
has happened on socket 0, a new memzone_reserve fails because it definitely 
has to be done on socket 0 as well.
In this case, neither changing the default NUMA node nor balancing the 
allocation will solve the problem; it will only delay it.
That is because explicitly assigned allocations (memzone_reserve, malloc with 
a specified socket_id) may not be evenly balanced.
Conversely, if all necessary memzones are reserved first, then even if malloc 
fails on the default socket, it will try to allocate from another NUMA node.
I think this is out of the scope of this patch series. For the moment, the 
simplest approach of taking node 0 as the default socket_id is not bad.
For anything more, we can post a separate patch and involve more people in 
the discussion.
Thanks.
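
To make the two options concrete, here is a small sketch. The function names and NUM_SOCKETS are hypothetical; SOCKET_ID_ANY is redefined locally with the value -1 mentioned in this thread. The first function mirrors the fallback in the patch, the second is one possible shape for the round-robin policy 3):

```c
#define SOCKET_ID_ANY (-1) /* "no specific socket", value as used in this thread */
#define NUM_SOCKETS 2      /* hypothetical socket count for the sketch */

/* Policy in the patch: fall back to node 0 when no socket is known. */
static unsigned int
pick_socket_default0(int socket_id)
{
	if (socket_id == SOCKET_ID_ANY)
		return 0;
	return (unsigned int)socket_id;
}

/*
 * Alternative policy: rotate across sockets instead of favouring node 0.
 * The static counter makes this sketch unsuitable for concurrent callers.
 */
static unsigned int
pick_socket_round_robin(int socket_id)
{
	static unsigned int next;
	unsigned int chosen;

	if (socket_id != SOCKET_ID_ANY)
		return (unsigned int)socket_id;

	chosen = next;
	next = (next + 1) % NUM_SOCKETS;
	return chosen;
}
```

Either way, an explicit socket_id is passed through unchanged; the policies differ only in what happens for SOCKET_ID_ANY.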

> 
> Neil
> 
> > > Neil
> > >
> > > >  }
> > > >
> > > >  void *
> > > > --
> > > > 1.8.1.4
> > > >
> > > >
> >
