[dpdk-dev] [PATCH v2 1/4] bus/vmbus: add hyper-v virtual bus support

2018-04-16 Thread Stephen Hemminger
From: Stephen Hemminger 

This patch adds support for an additional bus type Virtual Machine BUS
(VMBUS) on Microsoft Hyper-V in Windows 10, Windows Server 2016
and Azure. Most of this code was extracted from FreeBSD and some of
this is from earlier code donated by Brocade.

Only Linux is supported at present, but the code is split
to allow future FreeBSD and Windows support.

The bus support relies on the uio_hv_generic driver from Linux
kernel 4.16. Multiple queue support requires additional sysfs
interfaces, which are in kernel 5.0 (a.k.a 4.17).

Signed-off-by: Stephen Hemminger 
---
 MAINTAINERS |   3 +
 config/common_base  |   5 +
 config/common_linuxapp  |   4 +
 drivers/bus/Makefile|   1 +
 drivers/bus/vmbus/Makefile  |  36 ++
 drivers/bus/vmbus/linux/Makefile|   3 +
 drivers/bus/vmbus/linux/vmbus_bus.c | 354 +
 drivers/bus/vmbus/linux/vmbus_uio.c | 379 ++
 drivers/bus/vmbus/private.h | 132 +++
 drivers/bus/vmbus/rte_bus_vmbus.h   | 400 +++
 drivers/bus/vmbus/rte_bus_vmbus_version.map |  28 ++
 drivers/bus/vmbus/rte_vmbus_reg.h   | 344 +
 drivers/bus/vmbus/vmbus_bufring.c   | 241 
 drivers/bus/vmbus/vmbus_channel.c   | 405 
 drivers/bus/vmbus/vmbus_common.c| 286 ++
 drivers/bus/vmbus/vmbus_common_uio.c| 232 +++
 mk/rte.app.mk   |   1 +
 17 files changed, 2854 insertions(+)
 create mode 100644 drivers/bus/vmbus/Makefile
 create mode 100644 drivers/bus/vmbus/linux/Makefile
 create mode 100644 drivers/bus/vmbus/linux/vmbus_bus.c
 create mode 100644 drivers/bus/vmbus/linux/vmbus_uio.c
 create mode 100644 drivers/bus/vmbus/private.h
 create mode 100644 drivers/bus/vmbus/rte_bus_vmbus.h
 create mode 100644 drivers/bus/vmbus/rte_bus_vmbus_version.map
 create mode 100644 drivers/bus/vmbus/rte_vmbus_reg.h
 create mode 100644 drivers/bus/vmbus/vmbus_bufring.c
 create mode 100644 drivers/bus/vmbus/vmbus_channel.c
 create mode 100644 drivers/bus/vmbus/vmbus_common.c
 create mode 100644 drivers/bus/vmbus/vmbus_common_uio.c

diff --git a/MAINTAINERS b/MAINTAINERS
index f43e3fec4221..09d7f0e04618 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -381,6 +381,9 @@ VDEV bus driver
 M: Jianfeng Tan 
 F: drivers/bus/vdev/
 
+VMBUS bus driver
+M: Stephen Hemminger 
+F: drivers/bus/vmbus/
 
 Networking Drivers
 --
diff --git a/config/common_base b/config/common_base
index c2b0d91e0a4c..695034db661b 100644
--- a/config/common_base
+++ b/config/common_base
@@ -397,6 +397,11 @@ CONFIG_RTE_LIBRTE_PMD_FAILSAFE=y
 CONFIG_RTE_LIBRTE_MVPP2_PMD=n
 
 #
+# Compile support for VMBus library
+#
+CONFIG_RTE_LIBRTE_VMBUS=n
+
+
 # Compile virtual device driver for NetVSC on Hyper-V/Azure
 #
 CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD=n
diff --git a/config/common_linuxapp b/config/common_linuxapp
index d0437e5d6aeb..30f24d0362c5 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -37,3 +37,7 @@ CONFIG_RTE_LIBRTE_DPAA2_MEMPOOL=y
 CONFIG_RTE_LIBRTE_DPAA2_PMD=y
 CONFIG_RTE_LIBRTE_PMD_DPAA2_EVENTDEV=y
 CONFIG_RTE_LIBRTE_PMD_DPAA2_SEC=y
+
+# Hyper-V Virtual Machine bus and drivers
+CONFIG_RTE_LIBRTE_VMBUS=y
+
diff --git a/drivers/bus/Makefile b/drivers/bus/Makefile
index c251b65ad368..6fe35139fa0b 100644
--- a/drivers/bus/Makefile
+++ b/drivers/bus/Makefile
@@ -9,5 +9,6 @@ DIRS-$(CONFIG_RTE_LIBRTE_FSLMC_BUS) += fslmc
 endif
 DIRS-$(CONFIG_RTE_LIBRTE_PCI_BUS) += pci
 DIRS-$(CONFIG_RTE_LIBRTE_VDEV_BUS) += vdev
+DIRS-$(CONFIG_RTE_LIBRTE_VMBUS) += vmbus
 
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/bus/vmbus/Makefile b/drivers/bus/vmbus/Makefile
new file mode 100644
index ..c4ca1129c7ea
--- /dev/null
+++ b/drivers/bus/vmbus/Makefile
@@ -0,0 +1,36 @@
+# SPDX-License-Identifier: BSD-3-Clause
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+LIB = librte_bus_vmbus.a
+LIBABIVER := 1
+EXPORT_MAP := rte_bus_vmbus_version.map
+
+CFLAGS += -I$(SRCDIR)
+CFLAGS += -O3 $(WERROR_FLAGS)
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+
+ifneq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),)
+SYSTEM := linux
+endif
+ifneq ($(CONFIG_RTE_EXEC_ENV_BSDAPP),)
+$(error "VMBUS not implemented for BSD yet")
+endif
+
+CFLAGS += -I$(RTE_SDK)/drivers/bus/vmbus/$(SYSTEM)
+CFLAGS += -I$(RTE_SDK)/lib/librte_eal/common
+CFLAGS += -I$(RTE_SDK)/lib/librte_eal/$(SYSTEM)app/eal
+
+LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
+LDLIBS += -lrte_ethdev -lrte_vmbus -luuid
+
+include $(RTE_SDK)/drivers/bus/vmbus/$(SYSTEM)/Makefile
+SRCS-$(CONFIG_RTE_LIBRTE_VMBUS) := $(addprefix $(SYSTEM)/,$(SRCS))
+SRCS-$(CONFIG_RTE_LIBRTE_VMBUS) += vmbus_common.c
+SRCS-$(CONFIG_RTE_LIBRTE_VMBUS) += vmbus_channel.c vmbus_bufring.c
+SRCS-$(CONFIG_RTE_LIBRTE_VMBUS) += vmbus_common_uio.c
+
+SYMLINK-$(CONFIG_RTE_LIBRTE_VMBUS)-include 

[dpdk-dev] [PATCH v2 2/4] net/netvsc: add hyper-v netvsc network device

2018-04-16 Thread Stephen Hemminger
From: Stephen Hemminger 

The driver supports Hyper-V networking directly like
virtio for KVM or vmxnet3 for VMware.

This code is based on the FreeBSD driver. The file and variable
names are kept the same to help with understanding (with most of the
BSD style warts removed).

Signed-off-by: Stephen Hemminger 
---
 MAINTAINERS   |7 +
 config/common_base|8 +
 config/common_linuxapp|2 +-
 drivers/net/Makefile  |1 +
 drivers/net/netvsc/Makefile   |   23 +
 drivers/net/netvsc/hn_ethdev.c|  760 ++
 drivers/net/netvsc/hn_logs.h  |   35 +
 drivers/net/netvsc/hn_nvs.c   |  533 +++
 drivers/net/netvsc/hn_nvs.h   |  243 
 drivers/net/netvsc/hn_rndis.c | 1101 +++
 drivers/net/netvsc/hn_rndis.h |   26 +
 drivers/net/netvsc/hn_rxtx.c  | 1221 +
 drivers/net/netvsc/hn_var.h   |  140 ++
 drivers/net/netvsc/ndis.h |  378 +
 drivers/net/netvsc/rndis.h|  414 ++
 drivers/net/netvsc/rte_pmd_netvsc_version.map |5 +
 mk/rte.app.mk |1 +
 17 files changed, 4897 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/netvsc/Makefile
 create mode 100644 drivers/net/netvsc/hn_ethdev.c
 create mode 100644 drivers/net/netvsc/hn_logs.h
 create mode 100644 drivers/net/netvsc/hn_nvs.c
 create mode 100644 drivers/net/netvsc/hn_nvs.h
 create mode 100644 drivers/net/netvsc/hn_rndis.c
 create mode 100644 drivers/net/netvsc/hn_rndis.h
 create mode 100644 drivers/net/netvsc/hn_rxtx.c
 create mode 100644 drivers/net/netvsc/hn_var.h
 create mode 100644 drivers/net/netvsc/ndis.h
 create mode 100644 drivers/net/netvsc/rndis.h
 create mode 100644 drivers/net/netvsc/rte_pmd_netvsc_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 09d7f0e04618..086f38d73f07 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -479,6 +479,13 @@ F: drivers/net/enic/
 F: doc/guides/nics/enic.rst
 F: doc/guides/nics/features/enic.ini
 
+Hyper-V netvsc
+M: Stephen Hemminger 
+M: K. Y. Srinivasan
+M: Haiyang Zhang 
+F: drivers/net/hyperv/
+F: doc/guides/nics/hyperv.rst
+
 Intel e1000
 M: Wenzhuo Lu 
 T: git://dpdk.org/next/dpdk-next-net-intel
diff --git a/config/common_base b/config/common_base
index 695034db661b..4a6a3cf61a12 100644
--- a/config/common_base
+++ b/config/common_base
@@ -401,7 +401,15 @@ CONFIG_RTE_LIBRTE_MVPP2_PMD=n
 #
 CONFIG_RTE_LIBRTE_VMBUS=n
 
+#
+# Compile native PMD for Hyper-V/Azure
+#
+CONFIG_RTE_LIBRTE_NETVSC_PMD=n
+CONFIG_RTE_LIBRTE_NETVSC_DEBUG_RX=n
+CONFIG_RTE_LIBRTE_NETVSC_DEBUG_TX=n
+CONFIG_RTE_LIBRTE_NETVSC_DEBUG_DUMP=n
 
+#
 # Compile virtual device driver for NetVSC on Hyper-V/Azure
 #
 CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD=n
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 30f24d0362c5..83577c75a161 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -40,4 +40,4 @@ CONFIG_RTE_LIBRTE_PMD_DPAA2_SEC=y
 
 # Hyper-V Virtual Machine bus and drivers
 CONFIG_RTE_LIBRTE_VMBUS=y
-
+CONFIG_RTE_LIBRTE_NETVSC_PMD=y
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index dc5047e0491d..2cc58b6044b2 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -33,6 +33,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_LIO_PMD) += liquidio
 DIRS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4
 DIRS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5
 DIRS-$(CONFIG_RTE_LIBRTE_MVPP2_PMD) += mvpp2
+DIRS-$(CONFIG_RTE_LIBRTE_NETVSC_PMD) += netvsc
 DIRS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp
 DIRS-$(CONFIG_RTE_LIBRTE_BNXT_PMD) += bnxt
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_NULL) += null
diff --git a/drivers/net/netvsc/Makefile b/drivers/net/netvsc/Makefile
new file mode 100644
index ..3c713af3c8fc
--- /dev/null
+++ b/drivers/net/netvsc/Makefile
@@ -0,0 +1,23 @@
+# SPDX-License-Identifier: BSD-3-Clause
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+LIB = librte_pmd_netvsc.a
+
+CFLAGS += -O3 $(WERROR_FLAGS)
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+
+EXPORT_MAP := rte_pmd_netvsc_version.map
+
+LIBABIVER := 1
+
+SRCS-$(CONFIG_RTE_LIBRTE_NETVSC_PMD) += hn_ethdev.c
+SRCS-$(CONFIG_RTE_LIBRTE_NETVSC_PMD) += hn_rxtx.c
+SRCS-$(CONFIG_RTE_LIBRTE_NETVSC_PMD) += hn_rndis.c
+SRCS-$(CONFIG_RTE_LIBRTE_NETVSC_PMD) += hn_nvs.c
+
+LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
+LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
+LDLIBS += -lrte_bus_vmbus
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
new file mode 100644
index ..3ff69de392ba
--- /dev/null
+++ b/drivers/net/netvsc/hn_ethdev.c
@@ -0,0 +1,760 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2016-2018 Microsoft Corporation
+ * Copyright(c) 2013-2016 Brocade Communications Systems, Inc.
+ * All rights reserved.
+ */
+
+#include 
+#inclu

[dpdk-dev] [PATCH v2 4/4] bus/vmbus and net/netvsc: add meson build support

2018-04-16 Thread Stephen Hemminger
Update meson build files for new netvsc and vmbus drivers.

Signed-off-by: Stephen Hemminger 
---
 config/meson.build |  7 +++
 drivers/bus/meson.build|  2 +-
 drivers/bus/vmbus/meson.build  | 17 +
 drivers/net/meson.build|  2 +-
 drivers/net/netvsc/meson.build | 13 +
 5 files changed, 39 insertions(+), 2 deletions(-)
 create mode 100644 drivers/bus/vmbus/meson.build
 create mode 100644 drivers/net/netvsc/meson.build

diff --git a/config/meson.build b/config/meson.build
index 77af5d897da2..a14213b86818 100644
--- a/config/meson.build
+++ b/config/meson.build
@@ -45,6 +45,13 @@ if host_machine.system() == 'linux' and cc.find_library('bsd', required: false).
dpdk_extra_ldflags += '-lbsd'
 endif
 
+# vmbus depends on lib uuid
+uuid_dep = cc.find_library('uuid', required: false)
+if uuid_dep.found() and cc.has_header('uuid/uuid.h')
+   add_project_link_arguments('-luuid', language: 'c')
+   dpdk_extra_ldflags += '-luuid'
+endif
+
 # add -include rte_config to cflags
 add_project_arguments('-include', 'rte_config.h', language: 'c')
 
diff --git a/drivers/bus/meson.build b/drivers/bus/meson.build
index 58dfbe2b24dd..720eecef0577 100644
--- a/drivers/bus/meson.build
+++ b/drivers/bus/meson.build
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2017 Intel Corporation
 
-drivers = ['dpaa', 'fslmc', 'pci', 'vdev']
+drivers = ['dpaa', 'fslmc', 'pci', 'vdev', 'vmbus']
 std_deps = ['eal']
 config_flag_fmt = 'RTE_LIBRTE_@0@_BUS'
 driver_name_fmt = 'rte_bus_@0@'
diff --git a/drivers/bus/vmbus/meson.build b/drivers/bus/vmbus/meson.build
new file mode 100644
index ..05700254fc34
--- /dev/null
+++ b/drivers/bus/vmbus/meson.build
@@ -0,0 +1,17 @@
+# SPDX-License-Identifier: BSD-3-Clause
+
+install_headers('rte_bus_vmbus.h','rte_vmbus_reg.h')
+
+allow_experimental_apis = true
+sources = files('vmbus_common.c',
+   'vmbus_channel.c',
+   'vmbus_bufring.c',
+   'vmbus_common_uio.c')
+
+if host_machine.system() == 'linux'
+   sources += files('linux/vmbus_bus.c',
+   'linux/vmbus_uio.c')
+   includes += include_directories('linux')
+else
+   build = false
+endif
diff --git a/drivers/net/meson.build b/drivers/net/meson.build
index b7cac4a4a1f5..513c9d9e55c6 100644
--- a/drivers/net/meson.build
+++ b/drivers/net/meson.build
@@ -3,7 +3,7 @@
 
 drivers = ['af_packet', 'axgbe', 'bonding', 'dpaa', 'dpaa2',
'e1000', 'enic', 'fm10k', 'i40e', 'ixgbe',
-   'null', 'octeontx', 'pcap', 'ring',
+   'netvsc', 'null', 'octeontx', 'pcap', 'ring',
'sfc', 'thunderx', 'virtio']
 std_deps = ['ethdev', 'kvargs'] # 'ethdev' also pulls in mbuf, net, eal etc
 std_deps += ['bus_pci'] # very many PMDs depend on PCI, so make std
diff --git a/drivers/net/netvsc/meson.build b/drivers/net/netvsc/meson.build
new file mode 100644
index ..44926b20e973
--- /dev/null
+++ b/drivers/net/netvsc/meson.build
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Microsoft Corporation
+
+version = 2
+
+allow_experimental_apis = true
+
+sources = files('hn_ethdev.c',
+   'hn_rxtx.c',
+   'hn_rndis.c',
+   'hn_nvs.c')
+
+deps += ['bus_vmbus' ]
-- 
2.17.0



[dpdk-dev] [PATCH v2 3/4] net/netvsc: add documentation

2018-04-16 Thread Stephen Hemminger
Matching documentation for new netvsc device.

Signed-off-by: Stephen Hemminger 
---
 doc/guides/nics/index.rst  |  1 +
 doc/guides/nics/netvsc.rst | 73 ++
 doc/guides/rel_notes/release_18_05.rst |  6 +++
 3 files changed, 80 insertions(+)
 create mode 100644 doc/guides/nics/netvsc.rst

diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index ea9110c81159..97727731375a 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -23,6 +23,7 @@ Network Interface Controller Drivers
 ena
 enic
 fm10k
+hyperv
 i40e
 igb
 ixgbe
diff --git a/doc/guides/nics/netvsc.rst b/doc/guides/nics/netvsc.rst
new file mode 100644
index ..db33772fb25b
--- /dev/null
+++ b/doc/guides/nics/netvsc.rst
@@ -0,0 +1,73 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+Copyright(c) Microsoft Corporation.
+
+Poll Mode Driver for Hyper-V Network Virtual NIC
+================================================
+
+Hyper-V is a hypervisor integrated into Windows Server 2008, Windows 10
+and later versions.  It supports a para-virtualized network interface
+called netvsc that is visible on the virtual machine bus (VMBUS).  In
+the Data Plane Development Kit (DPDK), we provide a Network Virtual
+Service Client (NetVSC) Poll Mode Driver (PMD). The NetVSC PMD
+supports Windows Server 2016 and Microsoft Azure cloud.
+
+NetVSC Implementation in DPDK
+-----------------------------
+
+The Netvsc PMD is a standalone driver. VMBus network devices that are
+being used by DPDK must be unbound from the Linux kernel driver
+(hv_netvsc) and bound to the Userspace IO driver for Hyper-V
+(uio_hv_generic).
+
+This is most conveniently done with the
+`driverctl ` script.
+The kernel must be version 5.0 or later to allow driver_override
+to work.
+
+To list all vmbus network devices
+  # driverctl -b vmbus -v list-devices | grep netvsc
+
+  # driverctl -b vmbus set-override  uio_hv_generic
+
+To determine the guid associated with a particular existing Ethernet
+device use:
+  $ basename $(readlink /sys/class/net/ethN/device)
+
+
+Features and Limitations of Hyper-V PMD
+---------------------------------------
+
+In this release, the Hyper-V PMD provides the basic functionality of packet reception and transmission.
+
+*   It supports mergeable buffers per packet when receiving packets and scattered buffers per packet
+when transmitting packets. The packet size supported is from 64 to 65536.
+
+*   It supports multicast packets and promiscuous mode. For this to work, the guest network
+configuration on Hyper-V must be configured to allow this as well.
+
+*   The Hyper-V driver does not support MAC or VLAN filtering because the host does not support it.
+The device has only a single MAC address.
+
+*   VLAN tags are always stripped and presented in mbuf tci field.
+
+*   The Hyper-V driver does not use or support Link State or Rx interrupt.
+
+*   The number of queues is limited by the host (currently 64).
+When used with 4.16 kernel only a single queue is available.
+
+*   This driver is intended for use with synthetic path only.
+Accelerated Networking (SR-IOV) acceleration is not supported yet.
+Use the VDEV_NETVSC device for accelerated networking instead.
+
+
+Prerequisites
+-------------
+
+The following prerequisites apply:
+
+*   Linux kernel support for UIO on vmbus is done with the uio_hv_generic driver.
+This driver was originally added in 4.14 kernel, but that version lacks necessary
+features for networking. The 4.16 kernel will work but is limited to a single queue.
+Supporting multiple queues (subchannels) required additional changes
+which were added in 5.0.
+
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index bc9cdda6af7a..91e7e8c9551b 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -115,6 +115,12 @@ New Features
 
   Linux uevent is supported as backend of this device event notification 
framework.
 
+* **Added experimental support for Hyper-V netvsc PMD.**
+
+  The new experimental ``netvsc`` poll mode driver provides native support for
+  networking on Hyper-V. See the :doc:`../nics/netvsc` nic driver guide
+  for more details on this new driver.
+
 
 API Changes
 ---
-- 
2.17.0



Re: [dpdk-dev] [PATCH] eal/ipc: fix missing ignore message name

2018-04-16 Thread Thomas Monjalon
13/04/2018 18:16, Tan, Jianfeng:
> 
> On 4/13/2018 11:55 PM, Anatoly Burakov wrote:
> > We are trying to notify sender that response from current process
> > should be ignored, but we didn't specify which request this response
> > was for. Fix by copying request name from the original message.
> >
> > Fixes: 579a4ccc345c ("eal: ignore IPC messages until init is complete")
> > Cc: anatoly.bura...@intel.com
> >
> > Signed-off-by: Anatoly Burakov 
> 
> Acked-by: Jianfeng Tan 

Applied, thanks





[dpdk-dev] [PATCH 00/14] bnxt patchset

2018-04-16 Thread Ajit Khaparde
patchset against dpdk-next-net.
Please apply.

Thanks

Ajit Khaparde (13):
  net/bnxt: set default log level to informational
  net/bnxt: set padding flags in Rx descriptor
  net/bnxt: fix bnxt_hwrm_vnic_alloc
  net/bnxt: fix incorrect ntuple flag setting
  net/bnxt: fix Rx checksum flags for tunnel frames
  net/bnxt: fix L2 filter cleanup
  net/bnxt: fix bnxt_flow_destroy
  net/bnxt: add code to determine the Tx COS queue
  net/bnxt: maintain rx_mbuf_alloc_fail per RxQ
  net/bnxt: reset l2_filter_id once filter is freed
  net/bnxt: free memory allocated for VF filters
  net/bnxt: use UINT64_MAX to initialize filter ids
  net/bnxt: avoid freeing mem_zone multiple times

Somnath Kotur (1):
  bnxt: add device ID for Stratus VF

 drivers/net/bnxt/bnxt.h|   3 +-
 drivers/net/bnxt/bnxt_ethdev.c |  13 +-
 drivers/net/bnxt/bnxt_filter.c |  28 +-
 drivers/net/bnxt/bnxt_hwrm.c   |  36 ++-
 drivers/net/bnxt/bnxt_hwrm.h   |   3 +
 drivers/net/bnxt/bnxt_ring.c   |  10 +-
 drivers/net/bnxt/bnxt_ring.h   |   4 +-
 drivers/net/bnxt/bnxt_rxq.c|   5 +-
 drivers/net/bnxt/bnxt_rxq.h|   2 +
 drivers/net/bnxt/bnxt_rxr.c|  10 +-
 drivers/net/bnxt/bnxt_rxr.h|  16 +-
 drivers/net/bnxt/bnxt_stats.c  |  10 +-
 drivers/net/bnxt/bnxt_txq.c|   4 +-
 drivers/net/bnxt/bnxt_txq.h|   1 +
 drivers/net/bnxt/hsi_struct_def_dpdk.h | 552 -
 15 files changed, 436 insertions(+), 261 deletions(-)

-- 
2.15.1 (Apple Git-101)



[dpdk-dev] [PATCH 03/14] net/bnxt: fix bnxt_hwrm_vnic_alloc

2018-04-16 Thread Ajit Khaparde
In bnxt_hwrm_vnic_alloc, use rte_cpu_to_le_32 while setting the flags.
Fixes: 2691827e82c0 ("net/bnxt: add HWRM VNIC alloc")
Cc: sta...@dpdk.org

Signed-off-by: Ajit Khaparde 
---
 drivers/net/bnxt/bnxt_hwrm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bnxt/bnxt_hwrm.c b/drivers/net/bnxt/bnxt_hwrm.c
index 6e6daf4f8..0100f7473 100644
--- a/drivers/net/bnxt/bnxt_hwrm.c
+++ b/drivers/net/bnxt/bnxt_hwrm.c
@@ -1165,7 +1165,8 @@ int bnxt_hwrm_vnic_alloc(struct bnxt *bp, struct bnxt_vnic_info *vnic)
HWRM_PREP(req, VNIC_ALLOC);
 
if (vnic->func_default)
-   req.flags = HWRM_VNIC_ALLOC_INPUT_FLAGS_DEFAULT;
+   req.flags =
+   rte_cpu_to_le_32(HWRM_VNIC_ALLOC_INPUT_FLAGS_DEFAULT);
rc = bnxt_hwrm_send_message(bp, &req, sizeof(req));
 
HWRM_CHECK_RESULT();
-- 
2.15.1 (Apple Git-101)
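
A minimal sketch of why the conversion matters, assuming the firmware
reads the flags field as little-endian bytes. cpu_to_le32() below is a
hypothetical portable stand-in for rte_cpu_to_le_32(), and FLAG_DEFAULT
is an illustrative value, not the real HWRM constant.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative flag value only; not the real HWRM constant. */
#define FLAG_DEFAULT UINT32_C(0x1)

/*
 * Hypothetical stand-in for rte_cpu_to_le_32(): returns a value whose
 * in-memory byte order is little-endian regardless of host endianness.
 */
static uint32_t cpu_to_le32(uint32_t v)
{
	uint8_t b[4] = {
		(uint8_t)(v & 0xff),
		(uint8_t)((v >> 8) & 0xff),
		(uint8_t)((v >> 16) & 0xff),
		(uint8_t)((v >> 24) & 0xff),
	};
	uint32_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}

/* Returns the first byte of the field as a device would see it in memory. */
static uint8_t first_wire_byte(uint32_t field)
{
	uint8_t b[4];

	memcpy(b, &field, sizeof(b));
	return b[0];
}
```

On a little-endian host the conversion is a no-op, which is why a missing
conversion tends to go unnoticed until the code runs on a big-endian host.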



[dpdk-dev] [PATCH 04/14] net/bnxt: fix incorrect ntuple flag setting

2018-04-16 Thread Ajit Khaparde
We are wrongly setting the Rx path flag while creating the ntuple filter.
It needs to be set for L2 or Exact Match filters only.
Fixes: 5ef3b79fdfe6 ("net/bnxt: support flow filter ops")
Cc: sta...@dpdk.org

Signed-off-by: Ajit Khaparde 
---
 drivers/net/bnxt/bnxt_filter.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bnxt/bnxt_filter.c b/drivers/net/bnxt/bnxt_filter.c
index 96b382ba8..5f3154060 100644
--- a/drivers/net/bnxt/bnxt_filter.c
+++ b/drivers/net/bnxt/bnxt_filter.c
@@ -806,7 +806,8 @@ bnxt_validate_and_parse_flow(struct rte_eth_dev *dev,
if (rc != 0)
goto ret;
//Since we support ingress attribute only - right now.
-   filter->flags = HWRM_CFA_EM_FLOW_ALLOC_INPUT_FLAGS_PATH_RX;
+   if (filter->filter_type == HWRM_CFA_EM_FILTER)
+   filter->flags = HWRM_CFA_EM_FLOW_ALLOC_INPUT_FLAGS_PATH_RX;
 
switch (act->type) {
case RTE_FLOW_ACTION_TYPE_QUEUE:
-- 
2.15.1 (Apple Git-101)



[dpdk-dev] [PATCH 01/14] net/bnxt: set default log level to informational

2018-04-16 Thread Ajit Khaparde
Set the default log level to RTE_LOG_INFO from RTE_LOG_NOTICE.

Signed-off-by: Ajit Khaparde 
---
 drivers/net/bnxt/bnxt_ethdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index 1d4ff54b7..b7aab65ab 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -3472,7 +3472,7 @@ bnxt_init_log(void)
 {
bnxt_logtype_driver = rte_log_register("pmd.bnxt.driver");
if (bnxt_logtype_driver >= 0)
-   rte_log_set_level(bnxt_logtype_driver, RTE_LOG_NOTICE);
+   rte_log_set_level(bnxt_logtype_driver, RTE_LOG_INFO);
 }
 
 RTE_PMD_REGISTER_PCI(net_bnxt, bnxt_rte_pmd);
-- 
2.15.1 (Apple Git-101)



[dpdk-dev] [PATCH 05/14] net/bnxt: fix Rx checksum flags for tunnel frames

2018-04-16 Thread Ajit Khaparde
Fix Rx checksum status for tunnel frames as seen by hardware.
Current code does not handle cases for tunnel frames correctly.

Fixes: 7ec39d8c524b ("net/bnxt: update status of Rx IP/L4 CKSUM")
Cc: sta...@dpdk.org

Signed-off-by: Ajit Khaparde 
---
 drivers/net/bnxt/bnxt_rxr.h | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/net/bnxt/bnxt_rxr.h b/drivers/net/bnxt/bnxt_rxr.h
index dd4ea5d1d..e8c47ca56 100644
--- a/drivers/net/bnxt/bnxt_rxr.h
+++ b/drivers/net/bnxt/bnxt_rxr.h
@@ -24,17 +24,25 @@
 #define BNXT_TPA_OUTER_L3_OFF(hdr_info)\
((hdr_info) & 0x1ff)
 
-#define RX_CMP_L4_CS_BITS  rte_cpu_to_le_32(RX_PKT_CMPL_FLAGS2_L4_CS_CALC)
+#define RX_CMP_L4_CS_BITS  \
+   rte_cpu_to_le_32(RX_PKT_CMPL_FLAGS2_L4_CS_CALC | \
+RX_PKT_CMPL_FLAGS2_T_L4_CS_CALC)
 
-#define RX_CMP_L4_CS_ERR_BITS  rte_cpu_to_le_32(RX_PKT_CMPL_ERRORS_L4_CS_ERROR)
+#define RX_CMP_L4_CS_ERR_BITS  \
+   rte_cpu_to_le_32(RX_PKT_CMPL_ERRORS_L4_CS_ERROR | \
+RX_PKT_CMPL_ERRORS_T_L4_CS_ERROR)
 
 #define RX_CMP_L4_CS_OK(rxcmp1) \
(((rxcmp1)->flags2 & RX_CMP_L4_CS_BITS) &&  \
 !((rxcmp1)->errors_v2 & RX_CMP_L4_CS_ERR_BITS))
 
-#define RX_CMP_IP_CS_ERR_BITS  rte_cpu_to_le_32(RX_PKT_CMPL_ERRORS_IP_CS_ERROR)
+#define RX_CMP_IP_CS_ERR_BITS  \
+   rte_cpu_to_le_32(RX_PKT_CMPL_ERRORS_IP_CS_ERROR | \
+RX_PKT_CMPL_ERRORS_T_IP_CS_ERROR)
 
-#define RX_CMP_IP_CS_BITS  rte_cpu_to_le_32(RX_PKT_CMPL_FLAGS2_IP_CS_CALC)
+#define RX_CMP_IP_CS_BITS  \
+   rte_cpu_to_le_32(RX_PKT_CMPL_FLAGS2_IP_CS_CALC | \
+RX_PKT_CMPL_FLAGS2_T_IP_CS_CALC)
 
 #define RX_CMP_IP_CS_OK(rxcmp1) \
(((rxcmp1)->flags2 & RX_CMP_IP_CS_BITS) &&  \
-- 
2.15.1 (Apple Git-101)
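
A small sketch of the mask logic in this patch, assuming the completion
record carries separate "checksum calculated" and "checksum error" bits
for the inner and tunnel headers. The bit values and the l4_cs_ok() helper
are hypothetical, and values are kept in host byte order for simplicity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen only for illustration. */
#define L4_CS_CALC     UINT32_C(0x02) /* inner L4 checksum was computed */
#define T_L4_CS_CALC   UINT32_C(0x08) /* tunnel L4 checksum was computed */
#define L4_CS_ERROR    UINT32_C(0x20) /* inner L4 checksum is bad */
#define T_L4_CS_ERROR  UINT32_C(0x80) /* tunnel L4 checksum is bad */

/*
 * The checksum is reported good only if hardware computed at least one of
 * the inner/tunnel checksums and flagged an error on neither.
 */
static bool l4_cs_ok(uint32_t flags2, uint32_t errors)
{
	const uint32_t calc_bits = L4_CS_CALC | T_L4_CS_CALC;
	const uint32_t err_bits = L4_CS_ERROR | T_L4_CS_ERROR;

	return (flags2 & calc_bits) != 0 && (errors & err_bits) == 0;
}
```

With only the inner bits in the masks, a tunnel frame whose outer checksum
failed would still pass the check, which is the case the wider masks cover.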



[dpdk-dev] [PATCH 02/14] net/bnxt: set padding flags in Rx descriptor

2018-04-16 Thread Ajit Khaparde
Set the RX_PROD_PKT_BD_FLAGS_EOP_PAD in Rx buffer descriptors.
Fixes: 2eb53b134aae ("net/bnxt: add initial Rx code")
Cc: sta...@dpdk.org

Signed-off-by: Ajit Khaparde 
---
 drivers/net/bnxt/bnxt_rxr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/bnxt/bnxt_rxr.c b/drivers/net/bnxt/bnxt_rxr.c
index ebdac1ca2..d9b4d768d 100644
--- a/drivers/net/bnxt/bnxt_rxr.c
+++ b/drivers/net/bnxt/bnxt_rxr.c
@@ -727,7 +727,7 @@ int bnxt_init_one_rx_ring(struct bnxt_rx_queue *rxq)
if (rxq->rx_buf_use_size <= size)
size = rxq->rx_buf_use_size;
 
-   type = RX_PROD_PKT_BD_TYPE_RX_PROD_PKT;
+   type = RX_PROD_PKT_BD_TYPE_RX_PROD_PKT | RX_PROD_PKT_BD_FLAGS_EOP_PAD;
 
rxr = rxq->rx_ring;
ring = rxr->rx_ring_struct;
-- 
2.15.1 (Apple Git-101)



[dpdk-dev] [PATCH 10/14] net/bnxt: reset l2_filter_id once filter is freed

2018-04-16 Thread Ajit Khaparde
The fw_l2_filter_id for a ntuple filter is needed only for the lifetime
of the ntuple filter. Once the filter is freed, reset the field.
The associated l2_filter will be freed as a part of its own cleanup.

Fixes: 5ef3b79fdfe6 ("net/bnxt: support flow filter ops")
Cc: sta...@dpdk.org

Signed-off-by: Ajit Khaparde 
---
 drivers/net/bnxt/bnxt_hwrm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/bnxt/bnxt_hwrm.c b/drivers/net/bnxt/bnxt_hwrm.c
index 3a326d4f5..c7a6157d9 100644
--- a/drivers/net/bnxt/bnxt_hwrm.c
+++ b/drivers/net/bnxt/bnxt_hwrm.c
@@ -3696,6 +3696,7 @@ int bnxt_hwrm_clear_ntuple_filter(struct bnxt *bp,
HWRM_UNLOCK();
 
filter->fw_ntuple_filter_id = -1;
+   filter->fw_l2_filter_id = UINT64_MAX;
 
return 0;
 }
-- 
2.15.1 (Apple Git-101)



[dpdk-dev] [PATCH 06/14] net/bnxt: fix L2 filter cleanup

2018-04-16 Thread Ajit Khaparde
We are wrongly freeing up a filter in the driver while it is still
configured in the HW. This can cause an incorrect L2 filter id to be
used for filters created subsequently.

This filter will be cleared on cleanup anyway.

Fixes: 5ef3b79fdfe6 ("net/bnxt: support flow filter ops")
Cc: sta...@dpdk.org

Signed-off-by: Ajit Khaparde 
---
 drivers/net/bnxt/bnxt_filter.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/drivers/net/bnxt/bnxt_filter.c b/drivers/net/bnxt/bnxt_filter.c
index 5f3154060..d28c04038 100644
--- a/drivers/net/bnxt/bnxt_filter.c
+++ b/drivers/net/bnxt/bnxt_filter.c
@@ -919,11 +919,6 @@ bnxt_validate_and_parse_flow(struct rte_eth_dev *dev,
goto ret;
}
 
-   if (filter1) {
-   bnxt_free_filter(bp, filter1);
-   filter1->fw_l2_filter_id = -1;
-   }
-
act = nxt_non_void_action(++act);
if (act->type != RTE_FLOW_ACTION_TYPE_END) {
rte_flow_error_set(error, EINVAL,
-- 
2.15.1 (Apple Git-101)



[dpdk-dev] [PATCH 11/14] net/bnxt: free memory allocated for VF filters

2018-04-16 Thread Ajit Khaparde
Memory allocated to hold VF filter info is not being freed currently.
This can cause a memory leak.
Fixes: 7a5b0874440e ("net/bnxt: support to add a VF MAC address")
Cc: sta...@dpdk.org

Signed-off-by: Ajit Khaparde 
---
 drivers/net/bnxt/bnxt_filter.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/bnxt/bnxt_filter.c b/drivers/net/bnxt/bnxt_filter.c
index 179349539..c92806b4f 100644
--- a/drivers/net/bnxt/bnxt_filter.c
+++ b/drivers/net/bnxt/bnxt_filter.c
@@ -131,6 +131,14 @@ void bnxt_free_filter_mem(struct bnxt *bp)
 
rte_free(bp->filter_info);
bp->filter_info = NULL;
+
+   for (i = 0; i < bp->pf.max_vfs; i++) {
+   STAILQ_FOREACH(filter, &bp->pf.vf_info[i].filter, next) {
+   rte_free(filter);
+   STAILQ_REMOVE(&bp->pf.vf_info[i].filter, filter,
+ bnxt_filter_info, next);
+   }
+   }
 }
 
 int bnxt_alloc_filter_mem(struct bnxt *bp)
-- 
2.15.1 (Apple Git-101)
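
The loop in the hunk above frees each element before unlinking it and
keeps iterating with STAILQ_FOREACH, which reads the freed element's link
to advance. A drain-from-head loop avoids that. A minimal sketch, using
plain malloc()/free() and a hypothetical struct filter in place of
rte_free() and struct bnxt_filter_info:

```c
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical element type standing in for struct bnxt_filter_info. */
struct filter {
	int id;
	STAILQ_ENTRY(filter) next;
};

STAILQ_HEAD(filter_list, filter);

/* Unlink each element before freeing it; returns the number freed. */
static unsigned int drain_filters(struct filter_list *list)
{
	struct filter *f;
	unsigned int n = 0;

	while ((f = STAILQ_FIRST(list)) != NULL) {
		STAILQ_REMOVE_HEAD(list, next);
		free(f);
		n++;
	}
	return n;
}

/* Helper for the example: append `count` heap-allocated elements. */
static int fill_filters(struct filter_list *list, int count)
{
	for (int i = 0; i < count; i++) {
		struct filter *f = malloc(sizeof(*f));

		if (f == NULL)
			return -1;
		f->id = i;
		STAILQ_INSERT_TAIL(list, f, next);
	}
	return 0;
}
```

Removing from the head is O(1) per element, whereas STAILQ_REMOVE() walks
the list to find the predecessor of the element being removed.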



[dpdk-dev] [PATCH 12/14] net/bnxt: use UINT64_MAX to initialize filter ids

2018-04-16 Thread Ajit Khaparde
Use UINT64_MAX to initialize l2, ntuple, em filter_id fields
instead of a hardcoded -1.

Signed-off-by: Ajit Khaparde 
---
 drivers/net/bnxt/bnxt_filter.c | 8 
 drivers/net/bnxt/bnxt_hwrm.c   | 8 
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/bnxt/bnxt_filter.c b/drivers/net/bnxt/bnxt_filter.c
index c92806b4f..9351460c2 100644
--- a/drivers/net/bnxt/bnxt_filter.c
+++ b/drivers/net/bnxt/bnxt_filter.c
@@ -68,9 +68,9 @@ void bnxt_init_filters(struct bnxt *bp)
STAILQ_INIT(&bp->free_filter_list);
for (i = 0; i < max_filters; i++) {
filter = &bp->filter_info[i];
-   filter->fw_l2_filter_id = -1;
-   filter->fw_em_filter_id = -1;
-   filter->fw_ntuple_filter_id = -1;
+   filter->fw_l2_filter_id = UINT64_MAX;
+   filter->fw_em_filter_id = UINT64_MAX;
+   filter->fw_ntuple_filter_id = UINT64_MAX;
STAILQ_INSERT_TAIL(&bp->free_filter_list, filter, next);
}
 }
@@ -963,7 +963,7 @@ bnxt_flow_validate(struct rte_eth_dev *dev,
ret = bnxt_validate_and_parse_flow(dev, pattern, actions, attr,
   error, filter);
/* No need to hold on to this filter if we are just validating flow */
-   filter->fw_l2_filter_id = -1;
+   filter->fw_l2_filter_id = UINT64_MAX;
bnxt_free_filter(bp, filter);
 
return ret;
diff --git a/drivers/net/bnxt/bnxt_hwrm.c b/drivers/net/bnxt/bnxt_hwrm.c
index c7a6157d9..11204bf42 100644
--- a/drivers/net/bnxt/bnxt_hwrm.c
+++ b/drivers/net/bnxt/bnxt_hwrm.c
@@ -317,7 +317,7 @@ int bnxt_hwrm_clear_l2_filter(struct bnxt *bp,
HWRM_CHECK_RESULT();
HWRM_UNLOCK();
 
-   filter->fw_l2_filter_id = -1;
+   filter->fw_l2_filter_id = UINT64_MAX;
 
return 0;
 }
@@ -3583,8 +3583,8 @@ int bnxt_hwrm_clear_em_filter(struct bnxt *bp, struct bnxt_filter_info *filter)
HWRM_CHECK_RESULT();
HWRM_UNLOCK();
 
-   filter->fw_em_filter_id = -1;
-   filter->fw_l2_filter_id = -1;
+   filter->fw_em_filter_id = UINT64_MAX;
+   filter->fw_l2_filter_id = UINT64_MAX;
 
return 0;
 }
@@ -3695,7 +3695,7 @@ int bnxt_hwrm_clear_ntuple_filter(struct bnxt *bp,
HWRM_CHECK_RESULT();
HWRM_UNLOCK();
 
-   filter->fw_ntuple_filter_id = -1;
+   filter->fw_ntuple_filter_id = UINT64_MAX;
filter->fw_l2_filter_id = UINT64_MAX;
 
return 0;
-- 
2.15.1 (Apple Git-101)
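
A short sketch of why this change is about readability rather than
behaviour, assuming the filter id fields are uint64_t (the struct and
helper names below are hypothetical). Converting -1 to an unsigned 64-bit
type yields the same bit pattern as UINT64_MAX, so both spellings mark
the id as unset.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reduced version of the filter bookkeeping. */
struct filter_ids {
	uint64_t fw_l2_filter_id;
};

/* Old spelling: relies on implicit conversion of -1 to uint64_t. */
static void reset_with_minus_one(struct filter_ids *f)
{
	f->fw_l2_filter_id = -1;
}

/* New spelling: states the intended sentinel explicitly. */
static void reset_with_max(struct filter_ids *f)
{
	f->fw_l2_filter_id = UINT64_MAX;
}

static bool id_is_unset(const struct filter_ids *f)
{
	return f->fw_l2_filter_id == UINT64_MAX;
}
```

The explicit constant also keeps comparisons against the sentinel free of
signed/unsigned mismatches.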



[dpdk-dev] [PATCH 09/14] net/bnxt: maintain rx_mbuf_alloc_fail per RxQ

2018-04-16 Thread Ajit Khaparde
Currently we have a single counter for mbuf alloc failure.
Make it per RxQ instead.

Signed-off-by: Ajit Khaparde 
---
 drivers/net/bnxt/bnxt.h|  1 -
 drivers/net/bnxt/bnxt_ethdev.c |  1 -
 drivers/net/bnxt/bnxt_rxq.c|  1 +
 drivers/net/bnxt/bnxt_rxq.h|  1 +
 drivers/net/bnxt/bnxt_rxr.c|  8 
 drivers/net/bnxt/bnxt_stats.c  | 10 --
 6 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/net/bnxt/bnxt.h b/drivers/net/bnxt/bnxt.h
index d3eab8d36..bdca2622f 100644
--- a/drivers/net/bnxt/bnxt.h
+++ b/drivers/net/bnxt/bnxt.h
@@ -295,7 +295,6 @@ struct bnxt {
uint16_tgeneve_fw_dst_port_id;
uint32_tfw_ver;
uint32_thwrm_spec_code;
-   rte_atomic64_t  rx_mbuf_alloc_fail;
 
struct bnxt_led_infoleds[BNXT_MAX_LED];
uint8_t num_leds;
diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index b7aab65ab..3cf845089 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -3129,7 +3129,6 @@ bnxt_dev_init(struct rte_eth_dev *eth_dev)
 
bp = eth_dev->data->dev_private;
 
-   rte_atomic64_init(&bp->rx_mbuf_alloc_fail);
bp->dev_stopped = 1;
 
if (rte_eal_process_type() != RTE_PROC_PRIMARY)
diff --git a/drivers/net/bnxt/bnxt_rxq.c b/drivers/net/bnxt/bnxt_rxq.c
index ce3f0a1d9..d797a47e9 100644
--- a/drivers/net/bnxt/bnxt_rxq.c
+++ b/drivers/net/bnxt/bnxt_rxq.c
@@ -336,6 +336,7 @@ int bnxt_rx_queue_setup_op(struct rte_eth_dev *eth_dev,
rc = -ENOMEM;
goto out;
}
+   rte_atomic64_init(&rxq->rx_mbuf_alloc_fail);
 
 out:
return rc;
diff --git a/drivers/net/bnxt/bnxt_rxq.h b/drivers/net/bnxt/bnxt_rxq.h
index 616163e63..3350d7719 100644
--- a/drivers/net/bnxt/bnxt_rxq.h
+++ b/drivers/net/bnxt/bnxt_rxq.h
@@ -32,6 +32,7 @@ struct bnxt_rx_queue {
uint32_trx_buf_use_size;  /* useable size */
struct bnxt_rx_ring_info*rx_ring;
struct bnxt_cp_ring_info*cp_ring;
+   rte_atomic64_t  rx_mbuf_alloc_fail;
 };
 
 void bnxt_free_rxq_stats(struct bnxt_rx_queue *rxq);
diff --git a/drivers/net/bnxt/bnxt_rxr.c b/drivers/net/bnxt/bnxt_rxr.c
index d9b4d768d..4bc320430 100644
--- a/drivers/net/bnxt/bnxt_rxr.c
+++ b/drivers/net/bnxt/bnxt_rxr.c
@@ -41,7 +41,7 @@ static inline int bnxt_alloc_rx_data(struct bnxt_rx_queue *rxq,
 
mbuf = __bnxt_alloc_rx_data(rxq->mb_pool);
if (!mbuf) {
-   rte_atomic64_inc(&rxq->bp->rx_mbuf_alloc_fail);
+   rte_atomic64_inc(&rxq->rx_mbuf_alloc_fail);
return -ENOMEM;
}
 
@@ -62,7 +62,7 @@ static inline int bnxt_alloc_ag_data(struct bnxt_rx_queue *rxq,
 
mbuf = __bnxt_alloc_rx_data(rxq->mb_pool);
if (!mbuf) {
-   rte_atomic64_inc(&rxq->bp->rx_mbuf_alloc_fail);
+   rte_atomic64_inc(&rxq->rx_mbuf_alloc_fail);
return -ENOMEM;
}
 
@@ -299,7 +299,7 @@ static inline struct rte_mbuf *bnxt_tpa_end(
struct rte_mbuf *new_data = __bnxt_alloc_rx_data(rxq->mb_pool);
RTE_ASSERT(new_data != NULL);
if (!new_data) {
-   rte_atomic64_inc(&rxq->bp->rx_mbuf_alloc_fail);
+   rte_atomic64_inc(&rxq->rx_mbuf_alloc_fail);
return NULL;
}
tpa_info->mbuf = new_data;
@@ -767,7 +767,7 @@ int bnxt_init_one_rx_ring(struct bnxt_rx_queue *rxq)
rxr->tpa_info[i].mbuf =
__bnxt_alloc_rx_data(rxq->mb_pool);
if (!rxr->tpa_info[i].mbuf) {
-   rte_atomic64_inc(&rxq->bp->rx_mbuf_alloc_fail);
+   rte_atomic64_inc(&rxq->rx_mbuf_alloc_fail);
return -ENOMEM;
}
}
diff --git a/drivers/net/bnxt/bnxt_stats.c b/drivers/net/bnxt/bnxt_stats.c
index 5a1c07388..1b586f333 100644
--- a/drivers/net/bnxt/bnxt_stats.c
+++ b/drivers/net/bnxt/bnxt_stats.c
@@ -221,6 +221,8 @@ int bnxt_stats_get_op(struct rte_eth_dev *eth_dev,
 bnxt_stats, 1);
if (unlikely(rc))
return rc;
+   bnxt_stats->rx_nombuf +=
+   rte_atomic64_read(&rxq->rx_mbuf_alloc_fail);
}
 
for (i = 0; i < bp->tx_cp_nr_rings; i++) {
@@ -235,13 +237,13 @@ int bnxt_stats_get_op(struct rte_eth_dev *eth_dev,
rc = bnxt_hwrm_func_qstats(bp, 0x, bnxt_stats);
if (unlikely(rc))
return rc;
-   bnxt_stats->rx_nombuf = rte_atomic64_read(&bp->rx_mbuf_alloc_fail);
return rc;
 }
 
 void bnxt_stats_reset_op(struct rte_eth_dev *eth_dev)
 {
struct bnxt *bp = (struct bnxt *)eth_dev->data->dev_private;
+   unsigned int i;
 
if (!(bp->flags & BNXT_FLAG_INIT_DONE)) {
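
The hunks above move the allocation-failure counter from the device structure into each Rx queue and add the per-queue values into rx_nombuf when stats are read. A minimal sketch of that pattern follows. It uses C11 atomics in place of rte_atomic64_t, and every demo_* name is hypothetical, not part of the driver.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct bnxt_rx_queue: one counter per queue. */
struct demo_rxq {
	atomic_uint_fast64_t rx_mbuf_alloc_fail;
};

/* Queue setup: start the counter at zero. */
static void demo_rxq_init(struct demo_rxq *rxq)
{
	assert(rxq != NULL);
	atomic_init(&rxq->rx_mbuf_alloc_fail, 0);
}

/* Rx path: record one failed mbuf allocation on this queue. */
static void demo_note_alloc_fail(struct demo_rxq *rxq)
{
	atomic_fetch_add_explicit(&rxq->rx_mbuf_alloc_fail, 1,
				  memory_order_relaxed);
}

/* Stats path: the device-level value is the sum over all queues. */
static uint64_t demo_rx_nombuf(struct demo_rxq *rxqs, unsigned int nb_rxq)
{
	uint64_t total = 0;
	unsigned int i;

	for (i = 0; i < nb_rxq; i++)
		total += atomic_load_explicit(&rxqs[i].rx_mbuf_alloc_fail,
					      memory_order_relaxed);
	return total;
}
```

Keeping the counter in the queue means the Rx path touches only queue-local state, and the summation cost is paid only when stats are requested.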

[dpdk-dev] [PATCH 07/14] net/bnxt: fix bnxt_flow_destroy

2018-04-16 Thread Ajit Khaparde
bnxt_hwrm_clear_l2_filter needs to be called only if the filter type
is L2 and not otherwise.
Also check for the return value of bnxt_hwrm_clear_l2_filter().

Fixes: 5ef3b79fdfe6 ("net/bnxt: support flow filter ops")
Cc: sta...@dpdk.org

Signed-off-by: Ajit Khaparde 
---
 drivers/net/bnxt/bnxt_filter.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bnxt/bnxt_filter.c b/drivers/net/bnxt/bnxt_filter.c
index d28c04038..179349539 100644
--- a/drivers/net/bnxt/bnxt_filter.c
+++ b/drivers/net/bnxt/bnxt_filter.c
@@ -1144,8 +1144,8 @@ bnxt_flow_destroy(struct rte_eth_dev *dev,
ret = bnxt_hwrm_clear_em_filter(bp, filter);
if (filter->filter_type == HWRM_CFA_NTUPLE_FILTER)
ret = bnxt_hwrm_clear_ntuple_filter(bp, filter);
-
-   bnxt_hwrm_clear_l2_filter(bp, filter);
+   else
+   ret = bnxt_hwrm_clear_l2_filter(bp, filter);
if (!ret) {
STAILQ_REMOVE(&vnic->flow_list, flow, rte_flow, next);
rte_free(flow);
-- 
2.15.1 (Apple Git-101)
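
The intent stated in the commit message (clear the L2 filter only for L2 filters, and act on the return value) can be sketched as below. This is an illustration under assumptions, not the driver code: the demo_* types and the fake clear routines are hypothetical, and a switch is used here so that exactly one clear routine runs per filter type.

```c
#include <assert.h>
#include <stddef.h>

enum demo_filter_type { DEMO_FILTER_L2, DEMO_FILTER_EM, DEMO_FILTER_NTUPLE };

/* Hypothetical filter object; hw_rc fakes the firmware result. */
struct demo_filter {
	enum demo_filter_type type;
	int hw_rc;	/* value the fake firmware call returns */
	int cleared;	/* which clear routine ran, -1 if none */
};

static int demo_clear_em(struct demo_filter *f)
{
	f->cleared = DEMO_FILTER_EM;
	return f->hw_rc;
}

static int demo_clear_ntuple(struct demo_filter *f)
{
	f->cleared = DEMO_FILTER_NTUPLE;
	return f->hw_rc;
}

static int demo_clear_l2(struct demo_filter *f)
{
	f->cleared = DEMO_FILTER_L2;
	return f->hw_rc;
}

/* Clear the filter by type; release the flow only if the clear succeeded. */
static int demo_flow_destroy(struct demo_filter *f, int *flow_freed)
{
	int ret;

	assert(f != NULL && flow_freed != NULL);
	switch (f->type) {
	case DEMO_FILTER_EM:
		ret = demo_clear_em(f);
		break;
	case DEMO_FILTER_NTUPLE:
		ret = demo_clear_ntuple(f);
		break;
	default:
		ret = demo_clear_l2(f);
		break;
	}
	*flow_freed = (ret == 0);
	return ret;
}
```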



[dpdk-dev] [PATCH 08/14] net/bnxt: add code to determine the Tx COS queue

2018-04-16 Thread Ajit Khaparde
The hwrm_queue_qportcfg command has been extended to determine
the COS queue that a Tx ring needs to use. This patch adds code
to determine the information from the FW and use it while
creating the Tx rings.

Signed-off-by: Ajit Khaparde 
---
 drivers/net/bnxt/bnxt.h|   2 +
 drivers/net/bnxt/bnxt_hwrm.c   |  24 +-
 drivers/net/bnxt/bnxt_hwrm.h   |   3 +
 drivers/net/bnxt/hsi_struct_def_dpdk.h | 552 -
 4 files changed, 361 insertions(+), 220 deletions(-)

diff --git a/drivers/net/bnxt/bnxt.h b/drivers/net/bnxt/bnxt.h
index 2e99878ef..d3eab8d36 100644
--- a/drivers/net/bnxt/bnxt.h
+++ b/drivers/net/bnxt/bnxt.h
@@ -272,6 +272,7 @@ struct bnxt {
 
struct bnxt_link_info   link_info;
struct bnxt_cos_queue_info  cos_queue[BNXT_COS_QUEUE_COUNT];
+   uint8_t tx_cosq_id;
 
uint16_t fw_fid;
uint8_t dflt_mac_addr[ETHER_ADDR_LEN];
@@ -293,6 +294,7 @@ struct bnxt {
uint16_t vxlan_fw_dst_port_id;
uint16_t geneve_fw_dst_port_id;
uint32_t fw_ver;
+   uint32_thwrm_spec_code;
rte_atomic64_t  rx_mbuf_alloc_fail;
 
struct bnxt_led_info leds[BNXT_MAX_LED];
diff --git a/drivers/net/bnxt/bnxt_hwrm.c b/drivers/net/bnxt/bnxt_hwrm.c
index 0100f7473..3a326d4f5 100644
--- a/drivers/net/bnxt/bnxt_hwrm.c
+++ b/drivers/net/bnxt/bnxt_hwrm.c
@@ -27,6 +27,7 @@
 #include 
 
 #define HWRM_CMD_TIMEOUT   1
+#define HWRM_VERSION_1_9_1 0x10901
 
 struct bnxt_plcmodes_cfg {
uint32_t flags;
@@ -665,6 +666,7 @@ int bnxt_hwrm_ver_get(struct bnxt *bp)
fw_version = resp->hwrm_intf_maj << 16;
fw_version |= resp->hwrm_intf_min << 8;
fw_version |= resp->hwrm_intf_upd;
+   bp->hwrm_spec_code = fw_version;
 
if (resp->hwrm_intf_maj != HWRM_VERSION_MAJOR) {
PMD_DRV_LOG(ERR, "Unsupported firmware API version\n");
@@ -891,9 +893,15 @@ int bnxt_hwrm_queue_qportcfg(struct bnxt *bp)
int rc = 0;
struct hwrm_queue_qportcfg_input req = {.req_type = 0 };
struct hwrm_queue_qportcfg_output *resp = bp->hwrm_cmd_resp_addr;
+   int i;
 
HWRM_PREP(req, QUEUE_QPORTCFG);
 
+   req.flags = HWRM_QUEUE_QPORTCFG_INPUT_FLAGS_PATH_TX;
+   /* HWRM Version >= 1.9.1 */
+   if (bp->hwrm_spec_code >= HWRM_VERSION_1_9_1)
+   req.drv_qmap_cap =
+   HWRM_QUEUE_QPORTCFG_INPUT_DRV_QMAP_CAP_ENABLED;
rc = bnxt_hwrm_send_message(bp, &req, sizeof(req));
 
HWRM_CHECK_RESULT();
@@ -913,6 +921,20 @@ int bnxt_hwrm_queue_qportcfg(struct bnxt *bp)
 
HWRM_UNLOCK();
 
+   if (bp->hwrm_spec_code < HWRM_VERSION_1_9_1) {
+   bp->tx_cosq_id = bp->cos_queue[0].id;
+   } else {
+   /* iterate and find the COSq profile to use for Tx */
+   for (i = 0; i < BNXT_COS_QUEUE_COUNT; i++) {
+   if (bp->cos_queue[i].profile ==
+   HWRM_QUEUE_SERVICE_PROFILE_LOSSY) {
+   bp->tx_cosq_id = bp->cos_queue[i].id;
+   break;
+   }
+   }
+   }
+   PMD_DRV_LOG(DEBUG, "Tx Cos Queue to use: %d\n", bp->tx_cosq_id);
+
return rc;
 }
 
@@ -936,7 +958,7 @@ int bnxt_hwrm_ring_alloc(struct bnxt *bp,
 
switch (ring_type) {
case HWRM_RING_ALLOC_INPUT_RING_TYPE_TX:
-   req.queue_id = bp->cos_queue[0].id;
+   req.queue_id = rte_cpu_to_le_16(bp->tx_cosq_id);
/* FALLTHROUGH */
case HWRM_RING_ALLOC_INPUT_RING_TYPE_RX:
req.ring_type = ring_type;
diff --git a/drivers/net/bnxt/bnxt_hwrm.h b/drivers/net/bnxt/bnxt_hwrm.h
index 629243477..7c161eea0 100644
--- a/drivers/net/bnxt/bnxt_hwrm.h
+++ b/drivers/net/bnxt/bnxt_hwrm.h
@@ -26,6 +26,9 @@ struct bnxt_cp_ring_info;
 #define ASYNC_CMPL_EVENT_ID_VF_CFG_CHANGE  \
(1 << (HWRM_ASYNC_EVENT_CMPL_EVENT_ID_VF_CFG_CHANGE - 32))
 
+#define HWRM_QUEUE_SERVICE_PROFILE_LOSSY \
+   HWRM_QUEUE_QPORTCFG_OUTPUT_QUEUE_ID0_SERVICE_PROFILE_LOSSY
+
 int bnxt_hwrm_cfa_l2_clear_rx_mask(struct bnxt *bp,
   struct bnxt_vnic_info *vnic);
 int bnxt_hwrm_cfa_l2_set_rx_mask(struct bnxt *bp, struct bnxt_vnic_info *vnic,
diff --git a/drivers/net/bnxt/hsi_struct_def_dpdk.h 
b/drivers/net/bnxt/hsi_struct_def_dpdk.h
index bcdacae81..79705a7da 100644
--- a/drivers/net/bnxt/hsi_struct_def_dpdk.h
+++ b/drivers/net/bnxt/hsi_struct_def_dpdk.h
@@ -6759,339 +6759,453 @@ struct hwrm_port_led_qcaps_output {
  * configured.
  */
 /* Input   (24 bytes) */
+/* hwrm_queue_qportcfg_input (size:192b/24B) */
 struct hwrm_queue_qportcfg_input {
-   uint16_t req_type;
+   /* The HWRM command request type. */
+   uint16_t req_type;
/*
-   
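
The version check and queue selection in the bnxt_hwrm.c hunks can be sketched on their own as follows. The packing (major << 16 | minor << 8 | update) and the 0x10901 threshold come from the patch. The demo_* names, the queue count and the profile values are assumptions made for illustration, and the fallback to queue 0 when no lossy queue is reported is a choice of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_HWRM_VERSION_1_9_1	0x10901u
#define DEMO_PROFILE_LOSSLESS	0	/* hypothetical value */
#define DEMO_PROFILE_LOSSY	1	/* hypothetical value */

/* Hypothetical stand-in for struct bnxt_cos_queue_info. */
struct demo_cos_queue {
	uint8_t id;
	uint8_t profile;
};

/* Pack major/minor/update into one comparable code, as the patch does. */
static uint32_t demo_hwrm_spec_code(uint8_t maj, uint8_t min, uint8_t upd)
{
	return ((uint32_t)maj << 16) | ((uint32_t)min << 8) | upd;
}

/*
 * Pick the COS queue for Tx rings: the first lossy queue on firmware
 * that reports profiles (>= 1.9.1), otherwise queue 0.
 */
static uint8_t demo_pick_tx_cosq(uint32_t spec_code,
				 const struct demo_cos_queue *q,
				 unsigned int nb_q)
{
	unsigned int i;

	assert(q != NULL && nb_q > 0);
	if (spec_code >= DEMO_HWRM_VERSION_1_9_1) {
		for (i = 0; i < nb_q; i++)
			if (q[i].profile == DEMO_PROFILE_LOSSY)
				return q[i].id;
	}
	return q[0].id;
}
```

Packing the three version fields into one integer lets a single numeric comparison stand in for a field-by-field version check.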

[dpdk-dev] [PATCH 14/14] bnxt: add device ID for Stratus VF

2018-04-16 Thread Ajit Khaparde
From: Somnath Kotur 

Fixes: 1cd45aeb3270 ("net/bnxt: support Stratus VF device")
Cc: ajit.khapa...@broadcom.com
Signed-off-by: Somnath Kotur 
---
 drivers/net/bnxt/bnxt_ethdev.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index 3cf845089..7632c326b 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -34,7 +34,8 @@ int bnxt_logtype_driver;
 
 #define PCI_VENDOR_ID_BROADCOM 0x14E4
 
-#define BROADCOM_DEV_ID_STRATUS_NIC_VF 0x1609
+#define BROADCOM_DEV_ID_STRATUS_NIC_VF1 0x1606
+#define BROADCOM_DEV_ID_STRATUS_NIC_VF2 0x1609
 #define BROADCOM_DEV_ID_STRATUS_NIC 0x1614
 #define BROADCOM_DEV_ID_57414_VF 0x16c1
 #define BROADCOM_DEV_ID_57301 0x16c8
@@ -75,7 +76,9 @@ int bnxt_logtype_driver;
 
 static const struct rte_pci_id bnxt_pci_id_map[] = {
{ RTE_PCI_DEVICE(PCI_VENDOR_ID_BROADCOM,
-BROADCOM_DEV_ID_STRATUS_NIC_VF) },
+BROADCOM_DEV_ID_STRATUS_NIC_VF1) },
+   { RTE_PCI_DEVICE(PCI_VENDOR_ID_BROADCOM,
+BROADCOM_DEV_ID_STRATUS_NIC_VF2) },
{ RTE_PCI_DEVICE(PCI_VENDOR_ID_BROADCOM, BROADCOM_DEV_ID_STRATUS_NIC) },
{ RTE_PCI_DEVICE(PCI_VENDOR_ID_BROADCOM, BROADCOM_DEV_ID_57414_VF) },
{ RTE_PCI_DEVICE(PCI_VENDOR_ID_BROADCOM, BROADCOM_DEV_ID_57301) },
@@ -3063,7 +3066,8 @@ static bool bnxt_vf_pciid(uint16_t id)
id == BROADCOM_DEV_ID_5731X_VF ||
id == BROADCOM_DEV_ID_5741X_VF ||
id == BROADCOM_DEV_ID_57414_VF ||
-   id == BROADCOM_DEV_ID_STRATUS_NIC_VF)
+   id == BROADCOM_DEV_ID_STRATUS_NIC_VF1 ||
+   id == BROADCOM_DEV_ID_STRATUS_NIC_VF2)
return true;
return false;
 }
-- 
2.15.1 (Apple Git-101)



[dpdk-dev] [PATCH 13/14] net/bnxt: avoid freeing mem_zone multiple times

2018-04-16 Thread Ajit Khaparde
Since we are storing the mem_zone address for each ring created,
we are freeing the same address multiple times.
For example the memory zone created for Rx is being freed during
Rx ring cleanup, AGG ring cleanup and CQ cleanup.
Avoid this by storing the memory zone address in the RXQ instead and
freeing it as part of the queue_release dev_op.
Do the same for TX queues.

Fixes: 51c87ebafc7d ("net/bnxt: add Tx queue create/destroy")
Cc: sta...@dpdk.org

Signed-off-by: Ajit Khaparde 
---
 drivers/net/bnxt/bnxt_ring.c | 10 +++---
 drivers/net/bnxt/bnxt_ring.h |  4 ++--
 drivers/net/bnxt/bnxt_rxq.c  |  4 +++-
 drivers/net/bnxt/bnxt_rxq.h  |  1 +
 drivers/net/bnxt/bnxt_txq.c  |  4 +++-
 drivers/net/bnxt/bnxt_txq.h  |  1 +
 6 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/drivers/net/bnxt/bnxt_ring.c b/drivers/net/bnxt/bnxt_ring.c
index 4998c610a..478eab4f3 100644
--- a/drivers/net/bnxt/bnxt_ring.c
+++ b/drivers/net/bnxt/bnxt_ring.c
@@ -28,7 +28,7 @@ void bnxt_free_ring(struct bnxt_ring *ring)
memset((char *)*ring->vmem, 0, ring->vmem_size);
*ring->vmem = NULL;
}
-   rte_memzone_free((const struct rte_memzone *)ring->mem_zone);
+   ring->mem_zone = NULL;
 }
 
 /*
@@ -61,12 +61,14 @@ int bnxt_init_ring_grps(struct bnxt *bp)
  * rx bd ring - Only non-zero length if rx_ring_info is not NULL
  */
 int bnxt_alloc_rings(struct bnxt *bp, uint16_t qidx,
-   struct bnxt_tx_ring_info *tx_ring_info,
-   struct bnxt_rx_ring_info *rx_ring_info,
+   struct bnxt_tx_queue *txq,
+   struct bnxt_rx_queue *rxq,
struct bnxt_cp_ring_info *cp_ring_info,
const char *suffix)
 {
struct bnxt_ring *cp_ring = cp_ring_info->cp_ring_struct;
+   struct bnxt_rx_ring_info *rx_ring_info = rxq ? rxq->rx_ring : NULL;
+   struct bnxt_tx_ring_info *tx_ring_info = txq ? txq->tx_ring : NULL;
struct bnxt_ring *tx_ring;
struct bnxt_ring *rx_ring;
struct rte_pci_device *pdev = bp->pdev;
@@ -165,6 +167,7 @@ int bnxt_alloc_rings(struct bnxt *bp, uint16_t qidx,
}
 
if (tx_ring_info) {
+   txq->mz = mz;
tx_ring = tx_ring_info->tx_ring_struct;
 
tx_ring->bd = ((char *)mz->addr + tx_ring_start);
@@ -184,6 +187,7 @@ int bnxt_alloc_rings(struct bnxt *bp, uint16_t qidx,
}
 
if (rx_ring_info) {
+   rxq->mz = mz;
rx_ring = rx_ring_info->rx_ring_struct;
 
rx_ring->bd = ((char *)mz->addr + rx_ring_start);
diff --git a/drivers/net/bnxt/bnxt_ring.h b/drivers/net/bnxt/bnxt_ring.h
index d70eb64de..6c86259e8 100644
--- a/drivers/net/bnxt/bnxt_ring.h
+++ b/drivers/net/bnxt/bnxt_ring.h
@@ -65,8 +65,8 @@ struct bnxt_cp_ring_info;
 void bnxt_free_ring(struct bnxt_ring *ring);
 int bnxt_init_ring_grps(struct bnxt *bp);
 int bnxt_alloc_rings(struct bnxt *bp, uint16_t qidx,
-   struct bnxt_tx_ring_info *tx_ring_info,
-   struct bnxt_rx_ring_info *rx_ring_info,
+   struct bnxt_tx_queue *txq,
+   struct bnxt_rx_queue *rxq,
struct bnxt_cp_ring_info *cp_ring_info,
const char *suffix);
 int bnxt_alloc_hwrm_rings(struct bnxt *bp);
diff --git a/drivers/net/bnxt/bnxt_rxq.c b/drivers/net/bnxt/bnxt_rxq.c
index d797a47e9..e939c9ac0 100644
--- a/drivers/net/bnxt/bnxt_rxq.c
+++ b/drivers/net/bnxt/bnxt_rxq.c
@@ -267,6 +267,8 @@ void bnxt_rx_queue_release_op(void *rx_queue)
bnxt_free_ring(rxq->cp_ring->cp_ring_struct);
 
bnxt_free_rxq_stats(rxq);
+   rte_memzone_free(rxq->mz);
+   rxq->mz = NULL;
 
rte_free(rxq);
}
@@ -328,7 +330,7 @@ int bnxt_rx_queue_setup_op(struct rte_eth_dev *eth_dev,
 
eth_dev->data->rx_queues[queue_idx] = rxq;
/* Allocate RX ring hardware descriptors */
-   if (bnxt_alloc_rings(bp, queue_idx, NULL, rxq->rx_ring, rxq->cp_ring,
+   if (bnxt_alloc_rings(bp, queue_idx, NULL, rxq, rxq->cp_ring,
"rxr")) {
PMD_DRV_LOG(ERR,
"ring_dma_zone_reserve for rx_ring failed!\n");
diff --git a/drivers/net/bnxt/bnxt_rxq.h b/drivers/net/bnxt/bnxt_rxq.h
index 3350d7719..8307f603c 100644
--- a/drivers/net/bnxt/bnxt_rxq.h
+++ b/drivers/net/bnxt/bnxt_rxq.h
@@ -33,6 +33,7 @@ struct bnxt_rx_queue {
struct bnxt_rx_ring_info *rx_ring;
struct bnxt_cp_ring_info *cp_ring;
rte_atomic64_t  rx_mbuf_alloc_fail;
+   const struct rte_memzone *mz;
 };
 
 void bnxt_free_rxq_stats(struct bnxt_rx_queue *rxq);
diff --git a/drivers/net/bnxt/bnxt_txq.c b/drivers/net/bnxt/bnxt_txq.c
index 37531ea49..07e25d77b 100644
--- a/drivers/net/bnxt/bnxt_txq.c
+++ b/drivers/ne
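
The ownership change described in the commit message (one allocation per queue, freed once at queue release, with the rings only borrowing pointers into it) can be sketched as below. This is a simplified model under assumptions: calloc() and free() stand in for the memzone reserve and free calls, and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A ring only borrows a pointer into the queue's memory. */
struct demo_ring {
	void *mem;
};

/* The queue owns the single allocation backing all of its rings. */
struct demo_rxq {
	void *mz;
	struct demo_ring rx_ring;
	struct demo_ring ag_ring;
	struct demo_ring cp_ring;
};

/* One allocation, carved into three ring areas. */
static int demo_rxq_setup(struct demo_rxq *rxq, size_t ring_sz)
{
	char *base;

	assert(rxq != NULL && ring_sz > 0);
	base = calloc(3, ring_sz);
	if (base == NULL)
		return -1;
	rxq->mz = base;
	rxq->rx_ring.mem = base;
	rxq->ag_ring.mem = base + ring_sz;
	rxq->cp_ring.mem = base + 2 * ring_sz;
	return 0;
}

/* Ring cleanup drops the borrowed pointer and never frees memory. */
static void demo_free_ring(struct demo_ring *ring)
{
	ring->mem = NULL;
}

/* Queue release is the only place the allocation is freed. */
static void demo_rxq_release(struct demo_rxq *rxq)
{
	assert(rxq != NULL);
	demo_free_ring(&rxq->rx_ring);
	demo_free_ring(&rxq->ag_ring);
	demo_free_ring(&rxq->cp_ring);
	free(rxq->mz);
	rxq->mz = NULL;
}
```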

Re: [dpdk-dev] [PATCH v6 51/70] mem: add support for mapping hugepages at runtime

2018-04-16 Thread Yongseok Koh

> On Apr 11, 2018, at 5:30 AM, Anatoly Burakov  
> wrote:
> 
> Nothing uses this code yet. The bulk of it is copied from old
> memory allocation code (linuxapp eal_memory.c). We provide an
> EAL-internal API to allocate either one page or multiple pages,
> guaranteeing that we'll get contiguous VA for all of the pages
> that we requested.
> 
> Not supported on FreeBSD.
> 
> Locking is done via fcntl() because that way, when it comes to
> taking out write locks or unlocking on deallocation, we don't
> have to keep original fd's around. Plus, using fcntl() gives us
> ability to lock parts of a file, which is useful for single-file
> segments, which are coming down the line.
> 
> Signed-off-by: Anatoly Burakov 
> Tested-by: Santosh Shukla 
> Tested-by: Hemant Agrawal 
> Tested-by: Gowrishankar Muthukrishnan 
> ---
[...]
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c 
> b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
> new file mode 100644
> index 000..45ea0ad
> --- /dev/null
> +++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
> @@ -0,0 +1,429 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2017-2018 Intel Corporation
> + */
> +
> +#define _FILE_OFFSET_BITS 64
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 

There's a bug in older Red Hat releases:
Bug 1476120 - glibc headers don't include linux/falloc.h, and therefore doesn't 
include fallocate() flags [1]

How about adding "#include <linux/falloc.h>" ahead of fcntl.h? I haven't
updated my host, which still runs CentOS 7.2.1511, and the build fails there
because of this bug.


[1] https://bugzilla.redhat.com/show_bug.cgi?id=1476120

Thanks,
Yongseok
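
The quoted commit message notes that fcntl() can lock part of a file. A minimal sketch of such a byte-range lock using the POSIX struct flock interface is shown below. The helper name demo_lock_range is hypothetical and this is not the code from eal_memalloc.c.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Apply an advisory lock of the given type (F_RDLCK, F_WRLCK or
 * F_UNLCK) to bytes [start, start + len) of the file, without blocking.
 * Returns 0 on success and -1 on failure, as fcntl() does.
 */
static int demo_lock_range(int fd, short type, off_t start, off_t len)
{
	struct flock fl = {0};

	assert(fd >= 0);
	fl.l_type = type;
	fl.l_whence = SEEK_SET;
	fl.l_start = start;
	fl.l_len = len;
	return fcntl(fd, F_SETLK, &fl);
}
```

POSIX record locks of this kind belong to the process and the file rather than to one descriptor, which fits the remark about not having to keep the original fd's around.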

Re: [dpdk-dev] [PATCH v2 0/5] allow procinfo and pdump on eth vdev

2018-04-16 Thread Zhang, Qi Z


> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Jianfeng Tan
> Sent: Friday, April 6, 2018 1:45 AM
> To: dev@dpdk.org
> Cc: tho...@monjalon.net; Tan, Jianfeng 
> Subject: [dpdk-dev] [PATCH v2 0/5] allow procinfo and pdump on eth vdev
> 
> v2:
>   - Add spinlock for vdev device list as suggested by Anatoly.
>   - Add ring, cxgbe and remove the free in each PMDs as suggested by
> Matan.
>   - Rebase on master.
> 
> As we know, we have below limitations in vdev:
>   - dpdk-procinfo cannot get the stats of (most) vdev in primary process;
>   - dpdk-pdump cannot dump the packets for (most) vdev in primary process;
>   - secondary process cannot use (most) vdev in primary process.
> 
> The very first reason is that the secondary process actually does not know the
> existence of those vdevs as vdevs are chained on a linked list, and not
> shareable to secondary.
> 
> In this patch series, we would like to propose a vdev sharing model like this:
>   - As a secondary process boots, all devices (including vdev) in primary
> will be automatically shared. After both primary and secondary process
> booted,
>   - Device add/remove in primary will be translated to device hot plug/unplug
> event in secondary processes. (TODO)
>   - Device add in secondary
> * If that kind of device support multi-process, the secondary will
>   request the primary to probe the device and the primary to share
>   it to the secondary. It's not necessary to have secondary-private
>   device in this case. (TODO)
> * If that kind of device does not support multi-process, the secondary
>   will probe the device by itself, and the port id is shared among
>   all primary/secondary processes.
> 
> This patch series don't:
>   - provide secondary data path (Rx/Tx) support for each specific vdev.
> 
> How to test:
> 
> Step 0: start testpmd with a vhost port; and a VM connected to the vhost
> port.
> 
> Step 1: try using dpdk-procinfo to get the stats.
>  $(dpdk-procinfo) --log-level=8 --no-pci -- --stats
> 
> Step 2: try using dpdk-pdump to dump the packets.
>  $(dpdk-pdump) -- --pdump 'port=0,queue=*,rx-dev=/tmp/rx.pcap'
> 
> Jianfeng Tan (5):
>   eal: bring forward multi-process channel init
>   bus/vdev: add lock on vdev device list
>   bus/vdev: bus scan by multi-process channel
>   drivers/net: not use private eth dev data
>   drivers/net: share vdev data to secondary process
> 
>  drivers/bus/vdev/Makefile |   1 +
>  drivers/bus/vdev/vdev.c   | 187
> ++
>  drivers/net/af_packet/rte_eth_af_packet.c |  43 +++
>  drivers/net/bonding/rte_eth_bond_pmd.c|  13 +++
>  drivers/net/cxgbe/cxgbe_main.c|   1 -
>  drivers/net/failsafe/failsafe.c   |  14 +++
>  drivers/net/kni/rte_eth_kni.c |  26 +++--
>  drivers/net/null/rte_eth_null.c   |  32 ++---
>  drivers/net/octeontx/octeontx_ethdev.c|  29 ++---
>  drivers/net/pcap/rte_eth_pcap.c   |  32 ++---
>  drivers/net/ring/rte_eth_ring.c   |  17 +--
>  drivers/net/softnic/rte_eth_softnic.c |  19 ++-
>  drivers/net/tap/rte_eth_tap.c |  24 ++--
>  drivers/net/vhost/rte_eth_vhost.c |  36 +++---
>  lib/librte_eal/bsdapp/eal/eal.c   |  23 ++--
>  lib/librte_eal/linuxapp/eal/eal.c |  23 ++--
>  16 files changed, 354 insertions(+), 166 deletions(-)
> 
> --
> 2.7.4

Reviewed-by: Qi Zhang 

Also tested with af_packet and dpdk-procinfo / dpdk-pdump; the patches work as 
expected.

Regards
Qi




Re: [dpdk-dev] [PATCH v6 44/70] net/mlx5: use virt2memseg instead of iteration

2018-04-16 Thread Yongseok Koh

> On Apr 11, 2018, at 5:30 AM, Anatoly Burakov  
> wrote:
> 
> Reduce dependency on internal details of EAL memory subsystem, and
> simplify code.
> 
> Signed-off-by: Anatoly Burakov 
> Tested-by: Santosh Shukla 
> Tested-by: Hemant Agrawal 
> Tested-by: Gowrishankar Muthukrishnan 
> ---
> drivers/net/mlx5/mlx5_mr.c | 19 ---
> 1 file changed, 8 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
> index 58afeb7..c96e134 100644
> --- a/drivers/net/mlx5/mlx5_mr.c
> +++ b/drivers/net/mlx5/mlx5_mr.c
> @@ -234,10 +234,9 @@ struct mlx5_mr *
> mlx5_mr_new(struct rte_eth_dev *dev, struct rte_mempool *mp)
> {
>   struct priv *priv = dev->data->dev_private;
> - const struct rte_memseg *ms = rte_eal_get_physmem_layout();
> + const struct rte_memseg *ms;
>   uintptr_t start;
>   uintptr_t end;
> - unsigned int i;
>   struct mlx5_mr *mr;
> 
>   mr = rte_zmalloc_socket(__func__, sizeof(*mr), 0, mp->socket_id);
> @@ -261,17 +260,15 @@ mlx5_mr_new(struct rte_eth_dev *dev, struct rte_mempool 
> *mp)
>   /* Save original addresses for exact MR lookup. */
>   mr->start = start;
>   mr->end = end;
> +
>   /* Round start and end to page boundary if found in memory segments. */
> - for (i = 0; (i < RTE_MAX_MEMSEG) && (ms[i].addr != NULL); ++i) {
> - uintptr_t addr = (uintptr_t)ms[i].addr;
> - size_t len = ms[i].len;
> - unsigned int align = ms[i].hugepage_sz;
> + ms = rte_mem_virt2memseg((void *)start);
> + if (ms != NULL)
> + start = RTE_ALIGN_FLOOR(start, ms->hugepage_sz);
> + ms = rte_mem_virt2memseg((void *)end);
> + if (ms != NULL)
> + end = RTE_ALIGN_CEIL(end, ms->hugepage_sz);

This is buggy. The memory region is [start, end), so 'end' itself lies outside
the region. If the memseg containing 'end' isn't allocated yet, the returned ms
will have zero entries and this will make 'end' zero. Instead, the following
will be fine.

diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index fdf7b3e88..39bbe2481 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -265,9 +265,7 @@ mlx5_mr_new(struct rte_eth_dev *dev, struct rte_mempool *mp)
ms = rte_mem_virt2memseg((void *)start, NULL);
if (ms != NULL)
start = RTE_ALIGN_FLOOR(start, ms->hugepage_sz);
-   ms = rte_mem_virt2memseg((void *)end, NULL);
-   if (ms != NULL)
-   end = RTE_ALIGN_CEIL(end, ms->hugepage_sz);
+   end = RTE_ALIGN_CEIL(end, ms->hugepage_sz);

DRV_LOG(DEBUG,
"port %u mempool %p using start=%p end=%p size=%zu for memory"

Same for mlx4. Please fix both mlx5 and mlx4 so that we can verify the new 
design.

However, this code block will be removed eventually. I've done a patchset to
accommodate your memory hotplug design and I'll send it out soon.


Thanks in advance.
Yongseok

> - if ((start > addr) && (start < addr + len))
> - start = RTE_ALIGN_FLOOR(start, align);
> - if ((end > addr) && (end < addr + len))
> - end = RTE_ALIGN_CEIL(end, align);
> - }
>   DRV_LOG(DEBUG,
>   "port %u mempool %p using start=%p end=%p size=%zu for memory"
>   " region",
> -- 
> 2.7.4
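
The rounding discussed in this review can be sketched without any EAL lookup. The region is half-open, [start, end), so this illustration rounds both bounds with a single page size passed in by the caller (for example the one found for 'start'). The helpers are local stand-ins written for this sketch, not the DPDK macros, and assume a power-of-two page size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round v down to a multiple of align (align must be a power of two). */
static uintptr_t demo_align_floor(uintptr_t v, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return v & ~(align - 1);
}

/* Round v up to a multiple of align (align must be a power of two). */
static uintptr_t demo_align_ceil(uintptr_t v, uintptr_t align)
{
	return demo_align_floor(v + align - 1, align);
}

/*
 * Expand [*start, *end) to page boundaries. An 'end' that is already
 * page aligned is left alone, since it is exclusive and refers to no
 * byte of the region.
 */
static void demo_round_region(uintptr_t *start, uintptr_t *end,
			      uintptr_t page_sz)
{
	assert(start != NULL && end != NULL && *start <= *end);
	*start = demo_align_floor(*start, page_sz);
	*end = demo_align_ceil(*end, page_sz);
}
```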



[dpdk-dev] [PATCH v3] drivers/net/i40e: fix missing promiscuous disable at device disable

2018-04-16 Thread Rosen Xu
v3 updates:
===
 - Move modification from device close to device disable
 - i40evf_reset_vf() causes the kernel driver to enable promiscuous mode for
   all VLANs, so unicast/multicast promiscuous mode must be disabled before
   the reset.

v2 updates:
===
 - Add more comments

When the kernel driver runs on the PF and the PMD runs on a VF, the PMD does
not disable promiscuous mode on exit, so the VLAN filter set by the kernel
driver does not take effect.

Fix this by disabling promiscuous mode at device disable.

Signed-off-by: Rosen Xu 
---
 drivers/net/i40e/i40e_ethdev_vf.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/i40e/i40e_ethdev_vf.c 
b/drivers/net/i40e/i40e_ethdev_vf.c
index 031c706..40012b0 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -2288,6 +2288,8 @@ static int eth_i40evf_pci_remove(struct rte_pci_device 
*pci_dev)
 
i40evf_dev_stop(dev);
i40e_dev_free_queues(dev);
+   i40evf_dev_promiscuous_disable(dev);
+   i40evf_dev_allmulticast_disable(dev);
i40evf_reset_vf(hw);
i40e_shutdown_adminq(hw);
/* disable uio intr before callback unregister */
-- 
1.8.3.1



Re: [dpdk-dev] [PATCH v2] drivers/net/i40e/i40e_ethdev_vf.c: fix missing promiscuous disable at device stop

2018-04-16 Thread Xu, Rosen
Hi Helin,

> -Original Message-
> From: Zhang, Helin
> Sent: Thursday, March 29, 2018 13:11
> To: Zhang, Qi Z ; Xu, Rosen ;
> Xing, Beilei 
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v2] drivers/net/i40e/i40e_ethdev_vf.c: fix
> missing promiscuous disable at device stop
> 
> 
> 
> > -Original Message-
> > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Zhang, Qi Z
> > Sent: Friday, March 16, 2018 2:25 PM
> > To: Xu, Rosen; Xing, Beilei
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v2] drivers/net/i40e/i40e_ethdev_vf.c:
> > fix missing promiscuous disable at device stop
> >
> > Hi Rosen:
> >
> > > -Original Message-
> > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Rosen Xu
> > > Sent: Thursday, March 15, 2018 5:46 PM
> > > To: Xing, Beilei 
> > > Cc: dev@dpdk.org
> > > Subject: [dpdk-dev] [PATCH v2] drivers/net/i40e/i40e_ethdev_vf.c:
> > > fix missing promiscuous disable at device stop
> > >
> > > In scenario of Kernel Driver runs on PF and PMD runs on VF, PMD exit
> > > doesn't disable promiscuous mode, this will cause vlan filter set by
> > > Kernel Driver will not take effect.
> > >
> > > This patch will fix it, add promiscuous disable at device stop.
> > >
> > > Signed-off-by: Rosen Xu 
> > > ---
> > >  drivers/net/i40e/i40e_ethdev_vf.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/drivers/net/i40e/i40e_ethdev_vf.c
> > > b/drivers/net/i40e/i40e_ethdev_vf.c
> > > index fd003fe..f395b02 100644
> > > --- a/drivers/net/i40e/i40e_ethdev_vf.c
> > > +++ b/drivers/net/i40e/i40e_ethdev_vf.c
> > > @@ -2051,6 +2051,8 @@ static int eth_i40evf_pci_remove(struct
> > > rte_pci_device *pci_dev)
> > >
> > >   if (hw->adapter_stopped == 1)
> > >   return;
> > > + i40evf_dev_promiscuous_disable(dev);
> > > + i40evf_dev_allmulticast_disable(dev);
> >
> > A device's promiscuous mode is not expected to change in a
> > dev_start/dev_stop cycle. Applications need to call
> > rte_eth_promiscuous_disable and i40evf_dev_allmulticast_disable to
> > change it explicitly.
> >
> > Regards
> > Qi

This change was requested by customers who don't want to do this work in 
their applications.

> Basically I'd reject your patch, based on the comments Qi made above.
> 
> /Helin

I have aligned with Jingjing and Qi. Their proposal is to make this change in
VF init. However, after checking the kernel driver, I found that if we don't
disable promiscuous mode before calling i40evf_reset_vf(), the reset turns
promiscuous mode on. So we make this change only in dev_disable.

> >
> > >   i40evf_stop_queues(dev);
> > >   i40evf_disable_queues_intr(dev);
> > >   i40e_dev_clear_queues(dev);
> > > --
> > > 1.8.3.1



[dpdk-dev] [PATCH v3] kni: fix possible rx_q mbuf leaks and speed up alloc_q release

2018-04-16 Thread Yangchao Zhou
The rx_q fifo can only be drained by the kernel thread. Because kernel
threads can be stopped at any time, mbufs may be left in rx_q and leak.

When the kni is released and the netdev is unregistered, convert the
physical addresses of the mbufs in rx_q to virtual addresses and put them
in free_q. alloc_q is processed the same way to speed up the release in
userspace.

Signed-off-by: Yangchao Zhou 
Suggested-by: Ferruh Yigit 
---
 kernel/linux/kni/kni_dev.h  |1 +
 kernel/linux/kni/kni_misc.c |2 ++
 kernel/linux/kni/kni_net.c  |   40 
 3 files changed, 43 insertions(+), 0 deletions(-)

diff --git a/kernel/linux/kni/kni_dev.h b/kernel/linux/kni/kni_dev.h
index c9393d8..6275ef2 100644
--- a/kernel/linux/kni/kni_dev.h
+++ b/kernel/linux/kni/kni_dev.h
@@ -92,6 +92,7 @@ struct kni_dev {
void *alloc_va[MBUF_BURST_SZ];
 };
 
+void kni_net_release_fifo_phy(struct kni_dev *kni);
 void kni_net_rx(struct kni_dev *kni);
 void kni_net_init(struct net_device *dev);
 void kni_net_config_lo_mode(char *lo_str);
diff --git a/kernel/linux/kni/kni_misc.c b/kernel/linux/kni/kni_misc.c
index 01574ec..fa69f8e 100644
--- a/kernel/linux/kni/kni_misc.c
+++ b/kernel/linux/kni/kni_misc.c
@@ -192,6 +192,8 @@ struct kni_net {
free_netdev(dev->net_dev);
}
 
+   kni_net_release_fifo_phy(dev);
+
return 0;
 }
 
diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c
index 9f9b798..1d64d78 100644
--- a/kernel/linux/kni/kni_net.c
+++ b/kernel/linux/kni/kni_net.c
@@ -163,6 +163,46 @@
return (ret == 0) ? req.result : ret;
 }
 
+static void
+kni_fifo_trans_pa2va(struct kni_dev *kni,
+   struct rte_kni_fifo *src_pa, struct rte_kni_fifo *dst_va)
+{
+   uint32_t ret, i, num_fq, num_rx;
+   void *kva;
+   do {
+   num_fq = kni_fifo_free_count(kni->free_q);
+   if (num_fq == 0)
+   return;
+
+   num_rx = min_t(uint32_t, num_fq, MBUF_BURST_SZ);
+
+   num_rx = kni_fifo_get(src_pa, kni->pa, num_rx);
+   if (num_rx == 0)
+   return;
+
+   for (i = 0; i < num_rx; i++) {
+   kva = pa2kva(kni->pa[i]);
+   kni->va[i] = pa2va(kni->pa[i], kva);
+   }
+
+   ret = kni_fifo_put(dst_va, kni->va, num_rx);
+   if (ret != num_rx) {
+   /* Failing should not happen */
+   pr_err("Fail to enqueue entries into dst_va\n");
+   return;
+   }
+   } while (1);
+}
+
+/* Try to release mbufs when kni release */
+void kni_net_release_fifo_phy(struct kni_dev *kni)
+{
+   /* release rx_q first, because it can't release in userspace */
+   kni_fifo_trans_pa2va(kni, kni->rx_q, kni->free_q);
+   /* release alloc_q for speeding up kni release in userspace */
+   kni_fifo_trans_pa2va(kni, kni->alloc_q, kni->free_q);
+}
+
 /*
  * Configuration changes (passed on by ifconfig)
  */
-- 
1.7.1
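
The drain loop in kni_fifo_trans_pa2va() can be modelled in plain C as below: move entries from a source fifo to a destination fifo in bursts, bounded by the free room in the destination, translating each entry on the way. This is a sketch under assumptions. The ring layout, sizes and demo_* names are hypothetical, and adding a fixed offset stands in for the physical-to-virtual address conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_FIFO_SZ	8u	/* power of two, one slot kept empty */
#define DEMO_BURST_SZ	4u

struct demo_fifo {
	uint32_t write;
	uint32_t read;
	uintptr_t buf[DEMO_FIFO_SZ];
};

static uint32_t demo_fifo_count(const struct demo_fifo *f)
{
	return (f->write - f->read) & (DEMO_FIFO_SZ - 1);
}

static uint32_t demo_fifo_free_count(const struct demo_fifo *f)
{
	return (f->read - f->write - 1) & (DEMO_FIFO_SZ - 1);
}

/* Enqueue up to n entries; returns how many were stored. */
static uint32_t demo_fifo_put(struct demo_fifo *f, const uintptr_t *v,
			      uint32_t n)
{
	uint32_t i, room = demo_fifo_free_count(f);

	if (n > room)
		n = room;
	for (i = 0; i < n; i++) {
		f->buf[f->write] = v[i];
		f->write = (f->write + 1) & (DEMO_FIFO_SZ - 1);
	}
	return n;
}

/* Dequeue up to n entries; returns how many were read. */
static uint32_t demo_fifo_get(struct demo_fifo *f, uintptr_t *v, uint32_t n)
{
	uint32_t i, avail = demo_fifo_count(f);

	if (n > avail)
		n = avail;
	for (i = 0; i < n; i++) {
		v[i] = f->buf[f->read];
		f->read = (f->read + 1) & (DEMO_FIFO_SZ - 1);
	}
	return n;
}

/* Move everything that fits from src to dst; returns entries moved. */
static uint32_t demo_fifo_drain(struct demo_fifo *src, struct demo_fifo *dst,
				uintptr_t offset)
{
	uintptr_t burst[DEMO_BURST_SZ];
	uint32_t moved = 0;

	assert(src != NULL && dst != NULL && src != dst);
	for (;;) {
		uint32_t i, n = demo_fifo_free_count(dst);

		if (n == 0)
			break;			/* destination is full */
		if (n > DEMO_BURST_SZ)
			n = DEMO_BURST_SZ;
		n = demo_fifo_get(src, burst, n);
		if (n == 0)
			break;			/* source is empty */
		for (i = 0; i < n; i++)
			burst[i] += offset;	/* stand-in for PA to VA */
		if (demo_fifo_put(dst, burst, n) != n)
			break;	/* not expected: room was checked above */
		moved += n;
	}
	return moved;
}
```

Checking the free room in the destination before dequeuing means no entry is ever taken from the source without a slot to put it in.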



Re: [dpdk-dev] [PATCH v3 04/14] net/mlx5: support Rx tunnel type identification

2018-04-16 Thread Xueming(Steven) Li


> -Original Message-
> From: Adrien Mazarguil 
> Sent: Tuesday, April 17, 2018 12:03 AM
> To: Xueming(Steven) Li 
> Cc: Nélio Laranjeiro ; Shahaf Shuler 
> ; dev@dpdk.org;
> Olivier Matz 
> Subject: Re: [PATCH v3 04/14] net/mlx5: support Rx tunnel type identification
> 
> On Mon, Apr 16, 2018 at 03:27:37PM +, Xueming(Steven) Li wrote:
> >
> >
> > > -Original Message-
> > > From: Adrien Mazarguil 
> > > Sent: Monday, April 16, 2018 9:48 PM
> > > To: Xueming(Steven) Li 
> > > Cc: Nélio Laranjeiro ; Shahaf Shuler
> > > ; dev@dpdk.org; Olivier Matz
> > > 
> > > Subject: Re: [PATCH v3 04/14] net/mlx5: support Rx tunnel type
> > > identification
> > >
> > > On Mon, Apr 16, 2018 at 01:32:49PM +, Xueming(Steven) Li wrote:
> > > >
> > > > > -Original Message-
> > > > > From: Adrien Mazarguil 
> > > > > Sent: Monday, April 16, 2018 5:28 PM
> > > > > To: Xueming(Steven) Li 
> > > > > Cc: Nélio Laranjeiro ; Shahaf Shuler
> > > > > ; dev@dpdk.org; Olivier Matz
> > > > > 
> > > > > Subject: Re: [PATCH v3 04/14] net/mlx5: support Rx tunnel type
> > > > > identification
> > > > >
> > > > > On Mon, Apr 16, 2018 at 08:05:13AM +, Xueming(Steven) Li wrote:
> > > > > >
> > > > > >
> > > > > > > -Original Message-
> > > > > > > From: Nélio Laranjeiro 
> > > > > > > Sent: Monday, April 16, 2018 3:29 PM
> > > > > > > To: Xueming(Steven) Li 
> > > > > > > Cc: Shahaf Shuler ; dev@dpdk.org;
> > > > > > > Olivier Matz ; Adrien Mazarguil
> > > > > > > 
> > > > > > > Subject: Re: [PATCH v3 04/14] net/mlx5: support Rx tunnel
> > > > > > > type identification
> > > > > > >
> > > > > > > On Sat, Apr 14, 2018 at 12:57:58PM +, Xueming(Steven) Li 
> > > > > > > wrote:
> > > > > > > > +Adrien
> > > > > > > >
> > > > > > > > > -Original Message-
> > > > > > > > > From: Nélio Laranjeiro 
> > > > > > > > > Sent: Friday, April 13, 2018 9:03 PM
> > > > > > > > > To: Xueming(Steven) Li 
> > > > > > > > > Cc: Shahaf Shuler ; dev@dpdk.org;
> > > > > > > > > Olivier Matz 
> > > > > > > > > Subject: Re: [PATCH v3 04/14] net/mlx5: support Rx
> > > > > > > > > tunnel type identification
> > > > > > > > >
> > > > > > > > > +Olivier,
> > > > > > > > >
> > > > > > > > > On Fri, Apr 13, 2018 at 07:20:13PM +0800, Xueming Li wrote:
> > > > > > > > > > This patch introduced tunnel type identification based on 
> > > > > > > > > > flow rules.
> > > > > > > > > > If flows of multiple tunnel types built on same queue,
> > > > > > > > > > RTE_PTYPE_TUNNEL_MASK will be returned, user
> > > > > > > > > > application could use bits in flow mark as tunnel type 
> > > > > > > > > > identifier.
> > > > > > > > >
> > > > > > > > > For an application it will mean the packet embed all
> > > > > > > > > tunnel types defined in DPDK, to make such thing you
> > > > > > > > > need a RTE_PTYPE_TUNNEL_UNKNOWN which does not exists 
> > > > > > > > > currently.
> > > > > > > >
> > > > > > > > There was a RTE_PTYPE_TUNNEL_UNKNOWN definition, but it was
> > > > > > > > removed after discussion.
> > > > > > > > So I think it is good to add it in the patchset reviewed by 
> > > > > > > > Adrien.
> > > > > > >
> > > > > > > Agreed,
> > > > > > >
> > > > > > > >
> > > > > > > > > Even with it, the application still needs to parse the
> > > > > > > > > packet to discover which tunnel the packet embed, is
> > > > > > > > > there any benefit having such bit?  Not so sure.
> > > > > > > >
> > > > > > > > With a tunnel flag, checksum status represent inner checksum.
> > > > > > >
> > > > > > > Not sure this is generic enough, MLX5 behaves as this, but
> > > > > > > how behaves other NICs?  It should have specific bits for
> > > > > > > inner checksum if all NIC don't have the same behavior.
> > > > > >
> > > > > > From my understanding, if outer checksum invalid, the packet
> > > > > > can't be received as a tunneled packet, but a normal packet,
> > > > > > thus checksum flags always result of inner for a valid tunneled 
> > > > > > packet.
> > > > >
> > > > > Yes, since checksum validation information covers all layers at
> > > > > once (outermost to the innermost recognized), the presence of an 
> > > > > "unknown tunnel"
> > > > > bit implicitly means outer headers are OK.
> > > > >
> > > > > Now regarding the addition of RTE_PTYPE_TUNNEL_UNKNOWN, the main
> > > > > issue I see is that it's implicit, as in getting 0 after and'ing
> > > > > packet types with RTE_PTYPE_TUNNEL_MASK means either not present or 
> > > > > unknown type.
> > > >
> > > > How about defining RTE_PTYPE_TUNNEL_UNKNOWN the same as
> > > > RTE_PTYPE_TUNNEL_MASK? And'ing packet types would then always return a non-zero 
> > > > value.
> > >
> > > I mean the value already exists, it's implicitly 0. Adding one with
> > > the same value as RTE_PTYPE_TUNNEL_MASK could be seen as a waste of
> > > a value otherwise usable for an actual tunnel type (there are only 4 
> > > bits).
> > >
> > > > > How about not setting any tunnel bit and let applications rely
> > > > > on the pr

Re: [dpdk-dev] [PATCH] drivers/net: update link status

2018-04-16 Thread Tiwei Bie
On Mon, Apr 16, 2018 at 05:10:24PM +0100, Ferruh Yigit wrote:
> On 4/14/2018 11:55 AM, Tiwei Bie wrote:
> > On Fri, Apr 13, 2018 at 10:53:55PM +0100, Ferruh Yigit wrote:
> >> On 4/10/2018 4:41 PM, Tiwei Bie wrote:
> >>> On Tue, Mar 13, 2018 at 06:05:34PM +, Ferruh Yigit wrote:
>  Update link status related feature document items and minor updates in
>  some link status related functions.
> 
>  Signed-off-by: Ferruh Yigit 
>  ---
>   doc/guides/nics/features/fm10k.ini  | 2 ++
>   doc/guides/nics/features/fm10k_vf.ini   | 2 ++
>   doc/guides/nics/features/i40e_vf.ini| 1 +
>   doc/guides/nics/features/igb_vf.ini | 1 +
>   doc/guides/nics/features/qede.ini   | 1 -
>   doc/guides/nics/features/qede_vf.ini| 1 -
>   doc/guides/nics/features/vhost.ini  | 2 --
>   doc/guides/nics/features/virtio_vec.ini | 1 +
>   drivers/net/e1000/em_ethdev.c   | 2 +-
>   drivers/net/ena/ena_ethdev.c| 2 +-
>   drivers/net/fm10k/fm10k_ethdev.c| 6 ++
>   drivers/net/i40e/i40e_ethdev_vf.c   | 2 +-
>   drivers/net/ixgbe/ixgbe_ethdev.c| 2 +-
>   drivers/net/mlx4/mlx4_ethdev.c  | 2 +-
>   drivers/net/mlx5/mlx5_ethdev.c  | 2 +-
>   15 files changed, 15 insertions(+), 14 deletions(-)
> >>> [...]
>  diff --git a/doc/guides/nics/features/vhost.ini 
>  b/doc/guides/nics/features/vhost.ini
>  index dffd1f493..31302745a 100644
>  --- a/doc/guides/nics/features/vhost.ini
>  +++ b/doc/guides/nics/features/vhost.ini
>  @@ -4,8 +4,6 @@
>   ; Refer to default.ini for the full list of available PMD features.
>   ;
>   [Features]
>  -Link status  = Y
>  -Link status event= Y
> >>>
> >>> I think vhost PMD supports above features.
> >>
> >> I am not able to find where it is supported.
> >>
> >> Some virtual PMDs report fixed link, with empty link_update() dev_ops, and 
> >> they
> >> are not reported as supporting Link status, as far as I can see vhost also 
> >> one
> >> of them.
> >>
> >> And for Link status event, PMD needs to support LSC interrupts and should
> >> register interrupt handler for it, which I can't find for vhost.
> >>
> >> I will send next version without updating above one, please point me where 
> >> these
> >> support added if I missed them.
> > 
> > In drivers/net/vhost/rte_eth_vhost.c you could find below functions:
> > 
> > static int
> > new_device(int vid)
> > {
> > ..
> > 
> > eth_dev->data->dev_link.link_status = ETH_LINK_UP;
> > 
> > ..
> > 
> > _rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_INTR_LSC, NULL);
> > 
> > ..
> > }
> > 
> > static void
> > destroy_device(int vid)
> > {
> > ..
> > 
> > eth_dev->data->dev_link.link_status = ETH_LINK_DOWN;
> > 
> > ..
> > 
> > _rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_INTR_LSC, NULL);
> > 
> > ..
> > }
> > 
> > They are the callbacks for vhost library.
> > 
> > When a frontend (e.g. QEMU) is connected to this vhost backend
> > and the frontend virtio device becomes ready, new_device() will
> > be called by the vhost library, and the link status will be
> > updated to UP.
> > 
> > And when e.g. the connection is closed, destroy_device() will be
> > called by the vhost library, and the link status will be updated
> > to DOWN.
> 
> 
> Got it. This behavior is similar for virtual PMDs. Provide static link
> information and update link as UP during start and update it as DOWN during 
> stop.

No, the link status isn't updated during vhost PMD start
and stop. After the vhost PMD has been started, the link
status may still be DOWN. The link status becomes UP only
when QEMU (a separate virtual machine process which
has a virtio device) connects to this vhost PMD via a UNIX
socket and the virtio driver in the virtual machine has
set up the virtio device of the virtual machine.

So if the vhost PMD reports the link status as DOWN, it means
that no QEMU (virtual machine) is connected to it or that the
virtio device in the virtual machine hasn't been set up.
(PS. The frontend can also be the virtio-user PMD instead of QEMU.)

Thanks
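
To make the sequence above concrete, here is a minimal toy model in plain C.
It is only a sketch: none of the toy_* names are DPDK or vhost library APIs,
and the event counter merely stands in for the LSC callback processing. It
shows that start/stop leave the link untouched, while the frontend attach
and detach callbacks change it and raise an event.

```c
#include <stdbool.h>

/* Toy model only: none of these names are DPDK APIs. */
enum toy_link { TOY_LINK_DOWN = 0, TOY_LINK_UP = 1 };

struct toy_vhost_port {
	bool started;             /* set by start/stop, does not touch the link */
	enum toy_link link;       /* follows the frontend, not start/stop */
	unsigned int lsc_events;  /* counts link status change notifications */
};

/* Stand-in for notifying the application about a link status change. */
static void toy_notify_lsc(struct toy_vhost_port *p)
{
	p->lsc_events++;
}

void toy_port_start(struct toy_vhost_port *p)
{
	p->started = true;
}

void toy_port_stop(struct toy_vhost_port *p)
{
	p->started = false;
}

/* Frontend (e.g. QEMU or virtio-user) finished setting up its virtio device. */
void toy_new_device(struct toy_vhost_port *p)
{
	p->link = TOY_LINK_UP;
	toy_notify_lsc(p);
}

/* Frontend went away, e.g. the connection was closed. */
void toy_destroy_device(struct toy_vhost_port *p)
{
	p->link = TOY_LINK_DOWN;
	toy_notify_lsc(p);
}
```

In this model a started port with no frontend still reports DOWN, which is
the case described above.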

> 
> Other virtual PMDs doesn't report this feature, so removed from vhost as well
> for consistency.
> 
> > 
> > So vhost PMD reports meaningful link status and also generates
> > link status events.
> 
> Yes PMD process user callbacks on link change [1], but I am not sure that is
> what meant from "link status event", what I understand is link interrupts
> supported in PMD level which seems not the case for vhost.
> 
> [1]
> This is something else but why calling user callback in link update is in PMD
> discretion, shouldn't it be something done automatically in ethdev layer, 
> somehow.
> 
> > 
> > Thanks
> > 
> 


[dpdk-dev] whether DPDK support FreeBSD Guest OS VM on Xen hypervisor.

2018-04-16 Thread Zhongliang Shu
Hi, guys:

I have found the DPDK Xen guideline, which has examples and steps
showing how DPDK supports a Linux guest OS VM on the Xen hypervisor.

But I cannot find a similar document about a FreeBSD guest OS VM on the Xen hypervisor.

For example:

1) There is a xen_dom0.ko under librte_eal/linuxapp, but none under
   librte_eal/bsdapp.
2) examples/vhost_xen uses Linux header files.
3) The drivers/net/xenvirt source uses Linux header files.

Does DPDK 17.05.2 support a FreeBSD guest OS VM on the
Xen hypervisor? If yes, could you forward me links to such documentation?
Thanks.


Re: [dpdk-dev] [PATCH v2] net/enic: add primary mac address handler

2018-04-16 Thread Hyong Youb Kim
On Mon, Apr 16, 2018 at 11:40:17AM +0200, David Marchand wrote:
> Modified enic_del_mac_address() to get a return value from the vnic layer.
> Reused the .mac_addr_add and .mac_addr_del callbacks code to implement
> primary mac address handler.
> 
> Signed-off-by: David Marchand 
> ---

Thanks.

The patch looks good to me. I've tested it on top of dpdk-next-net. It
works as expected.

Acked-by: Hyong Youb Kim 

> 
> Changes since v1:
> - rebased on dpdk-next-net following mac_addr_set rework,
> - since enicpmd_remove_mac_addr() does not return an error code, I chose to
>   expose the return value from enic_del_mac_address() so that an error
>   can be detected in the mac_addr_set callback. The log message in
>   enicpmd_remove_mac_addr() has been preserved.
> 
> ---
>  drivers/net/enic/enic.h|  2 +-
>  drivers/net/enic/enic_ethdev.c | 20 +++-
>  drivers/net/enic/enic_main.c   |  5 ++---
>  3 files changed, 22 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/enic/enic.h b/drivers/net/enic/enic.h
> index 751ddc7..5f15e44 100644
> --- a/drivers/net/enic/enic.h
> +++ b/drivers/net/enic/enic.h
> @@ -284,7 +284,7 @@ int enic_dev_stats_get(struct enic *enic,
>  void enic_dev_stats_clear(struct enic *enic);
>  void enic_add_packet_filter(struct enic *enic);
>  int enic_set_mac_address(struct enic *enic, uint8_t *mac_addr);
> -void enic_del_mac_address(struct enic *enic, int mac_index);
> +int enic_del_mac_address(struct enic *enic, int mac_index);
>  unsigned int enic_cleanup_wq(struct enic *enic, struct vnic_wq *wq);
>  void enic_send_pkt(struct enic *enic, struct vnic_wq *wq,
>  struct rte_mbuf *tx_pkt, unsigned short len,
> diff --git a/drivers/net/enic/enic_ethdev.c b/drivers/net/enic/enic_ethdev.c
> index 801f470..f503398 100644
> --- a/drivers/net/enic/enic_ethdev.c
> +++ b/drivers/net/enic/enic_ethdev.c
> @@ -583,7 +583,24 @@ static void enicpmd_remove_mac_addr(struct rte_eth_dev 
> *eth_dev, uint32_t index)
>   return;
>  
>   ENICPMD_FUNC_TRACE();
> - enic_del_mac_address(enic, index);
> + if (enic_del_mac_address(enic, index))
> + dev_err(enic, "del mac addr failed\n");
> +}
> +
> +static int enicpmd_set_mac_addr(struct rte_eth_dev *eth_dev,
> + struct ether_addr *addr)
> +{
> + struct enic *enic = pmd_priv(eth_dev);
> + int ret;
> +
> + if (rte_eal_process_type() != RTE_PROC_PRIMARY)
> + return -E_RTE_SECONDARY;
> +
> + ENICPMD_FUNC_TRACE();
> + ret = enic_del_mac_address(enic, 0);
> + if (ret)
> + return ret;
> + return enic_set_mac_address(enic, addr->addr_bytes);
>  }
>  
>  static int enicpmd_mtu_set(struct rte_eth_dev *eth_dev, uint16_t mtu)
> @@ -799,6 +816,7 @@ static const struct eth_dev_ops enicpmd_eth_dev_ops = {
>   .priority_flow_ctrl_set = NULL,
>   .mac_addr_add = enicpmd_add_mac_addr,
>   .mac_addr_remove  = enicpmd_remove_mac_addr,
> + .mac_addr_set = enicpmd_set_mac_addr,
>   .filter_ctrl  = enicpmd_dev_filter_ctrl,
>   .reta_query   = enicpmd_dev_rss_reta_query,
>   .reta_update  = enicpmd_dev_rss_reta_update,
> diff --git a/drivers/net/enic/enic_main.c b/drivers/net/enic/enic_main.c
> index 98d4775..d9bc7fd 100644
> --- a/drivers/net/enic/enic_main.c
> +++ b/drivers/net/enic/enic_main.c
> @@ -162,13 +162,12 @@ int enic_dev_stats_get(struct enic *enic, struct 
> rte_eth_stats *r_stats)
>   return 0;
>  }
>  
> -void enic_del_mac_address(struct enic *enic, int mac_index)
> +int enic_del_mac_address(struct enic *enic, int mac_index)
>  {
>   struct rte_eth_dev *eth_dev = enic->rte_dev;
>   uint8_t *mac_addr = eth_dev->data->mac_addrs[mac_index].addr_bytes;
>  
> - if (vnic_dev_del_addr(enic->vdev, mac_addr))
> - dev_err(enic, "del mac addr failed\n");
> + return vnic_dev_del_addr(enic->vdev, mac_addr);
>  }
>  
>  int enic_set_mac_address(struct enic *enic, uint8_t *mac_addr)
> -- 
> 2.7.4
> 
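
For readers skimming the diff, the control flow of the new mac_addr_set
handler can be sketched in isolation. This is a toy model, not the enic
code: struct toy_nic, the toy_* functions and the fail_delete hook are all
hypothetical. It only illustrates the "delete the old primary address, stop
on error, then set the new one" order that the patch relies on.

```c
#include <stdint.h>
#include <string.h>

#define TOY_MAC_LEN 6

/* Hypothetical device model; none of these names come from the enic PMD. */
struct toy_nic {
	uint8_t primary[TOY_MAC_LEN];
	int primary_valid;
	int fail_delete;  /* hook to simulate a failing delete step */
};

/* Remove the current primary address; reports failure instead of logging. */
static int toy_del_primary(struct toy_nic *nic)
{
	if (nic->fail_delete)
		return -1;
	nic->primary_valid = 0;
	return 0;
}

/* Program a new primary address. */
static int toy_add_primary(struct toy_nic *nic, const uint8_t *mac)
{
	memcpy(nic->primary, mac, TOY_MAC_LEN);
	nic->primary_valid = 1;
	return 0;
}

/* Replace the primary address: delete first and stop on the first error. */
int toy_set_primary(struct toy_nic *nic, const uint8_t *mac)
{
	int ret = toy_del_primary(nic);

	if (ret)
		return ret;
	return toy_add_primary(nic, mac);
}
```

Returning the delete result is what lets the caller see the failure; with a
void delete function the set path could not tell the two cases apart.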


Re: [dpdk-dev] [PATCH v4 3/4] app/testpmd: add more GRE extension to csum engine

2018-04-16 Thread Xueming(Steven) Li


> -Original Message-
> From: Thomas Monjalon 
> Sent: Tuesday, April 17, 2018 6:45 AM
> To: Xueming(Steven) Li 
> Cc: dev@dpdk.org; Wenzhuo Lu ; Jingjing Wu 
> ; Yongseok Koh
> ; Olivier MATZ ; Shahaf Shuler 
> ;
> Ferruh Yigit 
> Subject: Re: [dpdk-dev] [PATCH v4 3/4] app/testpmd: add more GRE extension to 
> csum engine
> 
> 08/04/2018 14:32, Xueming Li:
> > This patch adds GRE checksum and sequence extension support in
> > addition to the key extension to the csum forwarding engine.
> >
> > Signed-off-by: Xueming Li 
> 
> This patch is also part of another series, isn't it?
> ("introduce new tunnel types")
> 

Good catch, it was there for test purpose, I'll remove this one and the next.


Re: [dpdk-dev] [PATCH v8 0/5] add ifcvf vdpa driver

2018-04-16 Thread Wang, Xiao W
Thanks for the reminder. Will fix it.

BRs,
Xiao

> -Original Message-
> From: Thomas Monjalon [mailto:tho...@monjalon.net]
> Sent: Tuesday, April 17, 2018 2:07 AM
> To: Wang, Xiao W 
> Cc: Yigit, Ferruh ; Burakov, Anatoly
> ; dev@dpdk.org; maxime.coque...@redhat.com;
> Wang, Zhihong ; Bie, Tiwei ;
> Tan, Jianfeng ; Liang, Cunming
> ; Daly, Dan 
> Subject: Re: [PATCH v8 0/5] add ifcvf vdpa driver
> 
> 16/04/2018 18:36, Ferruh Yigit:
> > Hi Xiao,
> >
> > Getting following build error for 32bit [1], can you please check them?
> >
> > [1]
> > .../dpdk/drivers/net/ifc/ifcvf_vdpa.c: In function ‘ifcvf_dma_map’:
> > .../dpdk/drivers/net/ifc/ifcvf_vdpa.c:24:3: error: format ‘%lx’ expects
> argument
> > of type ‘long unsigned int’, but argument 6 has type ‘uint64_t {aka long 
> > long
> > unsigned int}’ [-Werror=format=]
> 
> Reminder from this recent post:
>   http://dpdk.org/ml/archives/dev/2018-February/090882.html
> "
> Most of the times, using %l is wrong (except when printing a long).
> So next time you write %l, please think "I am probably wrong".
> "
> 
> 
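
For this class of warning, the usual portable approach is the PRIx64 macro
from <inttypes.h>. The snippet below is only a sketch: format_addr is a
hypothetical helper and is not taken from ifcvf_vdpa.c.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not taken from ifcvf_vdpa.c. */
int format_addr(char *buf, size_t len, uint64_t addr)
{
	/*
	 * PRIx64 expands to the correct length modifier for uint64_t on
	 * both 32-bit and 64-bit builds, unlike a hard-coded "%lx".
	 */
	return snprintf(buf, len, "0x%" PRIx64, addr);
}
```

The same pattern applies to log macros that take a printf-style format
string.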



[dpdk-dev] [PATCH] net/nfp: fix possible resource leak

2018-04-16 Thread Yangchao Zhou
Signed-off-by: Yangchao Zhou 
---
 drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c 
b/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c
index ad6ce72..f2fcc4a 100644
--- a/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c
+++ b/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c
@@ -816,6 +816,7 @@ struct nfp6000_area_priv {
 
if (fscanf(fp, "0x%lx 0x%lx 0x%lx", &start, &end, &flags) == 0) {
printf("error reading resource file for bar size\n");
+   fclose(fp);
return -1;
}
 
-- 
1.7.1
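
As an illustration of the pattern behind this fix, the sketch below closes
the stream on a single exit path so that no error branch can leak it.
read_bar_range is a hypothetical helper, not the nfp_cpp_pcie_ops.c code,
and checking for exactly three converted fields is an assumption that is
stricter than the "== 0" test in the patch context.

```c
#include <stdio.h>

/* Hypothetical helper; not the nfp_cpp_pcie_ops.c code. */
int read_bar_range(const char *path, unsigned long *start,
		   unsigned long *end, unsigned long *flags)
{
	FILE *fp = fopen(path, "r");
	int ret = 0;

	if (fp == NULL)
		return -1;
	/* Assumption: require all three fields, not just a non-zero count. */
	if (fscanf(fp, "0x%lx 0x%lx 0x%lx", start, end, flags) != 3)
		ret = -1;
	fclose(fp);  /* single exit path, so the stream is never leaked */
	return ret;
}
```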



Re: [dpdk-dev] [PATCH v5 01/19] crypto/ccp: add AMD ccp skeleton PMD

2018-04-16 Thread Kumar, Ravi1
>Hi Ravi,
>
>> -Original Message-
>> From: Kumar, Ravi1 [mailto:ravi1.ku...@amd.com]
>> Sent: Monday, April 2, 2018 6:50 AM
>> To: De Lara Guarch, Pablo ; 
>> dev@dpdk.org
>> Cc: hemant.agra...@nxp.com
>> Subject: RE: [PATCH v5 01/19] crypto/ccp: add AMD ccp skeleton PMD
>> 
>> >
>> >
>> >> -Original Message-
>> >> From: Ravi Kumar [mailto:ravi1.ku...@amd.com]
>> >> Sent: Monday, March 19, 2018 12:24 PM
>> >> To: dev@dpdk.org
>> >> Cc: De Lara Guarch, Pablo ; 
>> >> hemant.agra...@nxp.com
>> >> Subject: [PATCH v5 01/19] crypto/ccp: add AMD ccp skeleton PMD
>> >>
>> >> Signed-off-by: Ravi Kumar 
>> >
>> >Patchset applied to dpdk-next-crypto, with same minor changes (title 
>> >changes
>> and driver registering modification due to an earlier patch).
>> >
>> >Thanks for the work!
>> >Pablo
>> >
>
>There's been a memory rework applied in DPDK at the same time I applied your 
>PMD in next-crypto, which means that it is broken now. Could you submit a fix 
>for it?
>At least, compilation is broken now, but it may require more changes:
>
>drivers/crypto/ccp/ccp_dev.c:98:7: error: implicit declaration of function 
>'rte_eal_get_physmem_layout' is invalid in C99
>  [-Werror,-Wimplicit-function-declaration]
>ms = rte_eal_get_physmem_layout();
>
>This function does not exist anymore.
>Commit 2d84772bf858 ("crypto/qat: use contiguous allocation for DMA memory") 
>makes a similar required change in QAT.
>Take a look at it and see if it suits you.
>

Hi Pablo,

Sure, we will send updated patch for this.

Regards,
Ravi

>> 
>> Thanks a lot Pablo.
>> 
>> Regards,
>> Ravi
>


[dpdk-dev] [disscussion] A problem about dpdk backup-mode bond switching with mlx4 VF devices

2018-04-16 Thread chenchanghu

Hi,
 When using the mlx4 PMD, I ran into a problem with bond switching on
mlx4 VFs when the bonding mode is backup mode. The test is described in detail below.
1. Test environment information:
  a. Linux distribution: CentOS
  b. DPDK version: dpdk-16.04
  c. Ethernet device: mlx4 VF
  d. PMD info: mlx4 poll-mode driver

2. Test steps:
  a. We bond the mlx4 VF Ethernet devices eth7 and eth8 in backup mode in a DPDK
application. Eth7 and eth8 are both active, and eth7 is the primary device.
  b. As we know, the devices eth7 and eth8 are also visible to the kernel driver
mlx4_en.
  c. Then we bring the Ethernet device eth7 down with the command 'ifconfig
eth7 down'. The expected result is that the bond primary device does not switch.
  d. However, in 1 of 20 test runs, we found through the DPDK
maintenance interface that the bond primary device had switched to eth8.

3. Question:
   Is the up or down state of the VF's kernel interface related to the
user-space state? For example, after 'ifconfig eth7 down', will the user-space state
change to down too?

   Please send your reply to
chenchan...@huawei.com; any suggestions will be
gratefully appreciated.




Re: [dpdk-dev] [PATCH v2 07/15] net/mlx5: support tunnel RSS level

2018-04-16 Thread Nélio Laranjeiro
On Sat, Apr 14, 2018 at 12:25:12PM +, Xueming(Steven) Li wrote:
>[...]
> > > @@ -1211,23 +1322,23 @@ mlx5_flow_convert(struct rte_eth_dev *dev,
> > >   if (ret)
> > >   goto exit_free;
> > >   }
> > > - if (parser->mark)
> > > - mlx5_flow_create_flag_mark(parser, parser->mark_id);
> > > - if (parser->count && parser->create) {
> > > - mlx5_flow_create_count(dev, parser);
> > > - if (!parser->cs)
> > > - goto exit_count_error;
> > > - }
> > >   /*
> > >* Last step. Complete missing specification to reach the RSS
> > >* configuration.
> > >*/
> > >   if (!parser->drop)
> > > - ret = mlx5_flow_convert_rss(parser);
> > > + ret = mlx5_flow_convert_rss(dev, parser);
> > >   if (ret)
> > >   goto exit_free;
> > >   mlx5_flow_convert_finalise(parser);
> > >   mlx5_flow_update_priority(dev, parser, attr);
> > > + if (parser->mark)
> > > + mlx5_flow_create_flag_mark(parser, parser->mark_id);
> > > + if (parser->count && parser->create) {
> > > + mlx5_flow_create_count(dev, parser);
> > > + if (!parser->cs)
> > > + goto exit_count_error;
> > > + }
> > 
> > Why do you need to move this code?
> 
> To avoid counter resource missing if anything wrong in function 
> mlx5_flow_convert_rss().

Why is this modification addressed in this patch? Shouldn't it be in
the patch introducing mlx5_flow_convert_rss()?

>[...]
> > > @@ -1386,6 +1386,8 @@ mlx5_ind_table_ibv_verify(struct rte_eth_dev *dev)
> > >   *   Number of queues.
> > >   * @param tunnel
> > >   *   Tunnel type.
> > > + * @param rss_level
> > > + *   RSS hash on tunnel level.
> > >   *
> > >   * @return
> > >   *   The Verbs object initialised, NULL otherwise and rte_errno is set.
> > > @@ -1394,13 +1396,17 @@ struct mlx5_hrxq *  mlx5_hrxq_new(struct
> > > rte_eth_dev *dev,
> > > const uint8_t *rss_key, uint32_t rss_key_len,
> > > uint64_t hash_fields,
> > > -   const uint16_t *queues, uint32_t queues_n, uint32_t tunnel)
> > > +   const uint16_t *queues, uint32_t queues_n,
> > > +   uint32_t tunnel, uint32_t rss_level)
> > 
> > tunnel and rss_level seems to be redundant here.
> > 
> > rss_level > 1 is equivalent to tunnel, there is no need to have both.
> 
> There is a case of tunnel and outer rss(1).

Why cannot it be handled by a regular Hash Rx queue, i.e. what is the
benefit of creating a tunnel hash Rx queue to make the same job as a
legacy one?

See below,

> > >  {
> > >   struct priv *priv = dev->data->dev_private;
> > >   struct mlx5_hrxq *hrxq;
> > >   struct mlx5_ind_table_ibv *ind_tbl;
> > >   struct ibv_qp *qp;
> > >   int err;
> > > > +#ifdef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
> > > > + struct mlx5dv_qp_init_attr qp_init_attr = {0};
> > > > +#endif
> > > >
> > > >   queues_n = hash_fields ? queues_n : 1;
> > > >   ind_tbl = mlx5_ind_table_ibv_get(dev, queues, queues_n);
> > > > @@ -1410,6 +1416,33 @@ mlx5_hrxq_new(struct rte_eth_dev *dev,
> > >   rte_errno = ENOMEM;
> > >   return NULL;
> > >   }
> > > +#ifdef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
> > > + if (tunnel) {

Why not: if (rss_level > 1) ?

> > > + qp_init_attr.comp_mask =
> > > + MLX5DV_QP_INIT_ATTR_MASK_QP_CREATE_FLAGS;
> > > + qp_init_attr.create_flags = MLX5DV_QP_CREATE_TUNNEL_OFFLOADS;
> > > + }
> > > + qp = mlx5_glue->dv_create_qp(
> > > + priv->ctx,
> > > + &(struct ibv_qp_init_attr_ex){
> > > + .qp_type = IBV_QPT_RAW_PACKET,
> > > + .comp_mask =
> > > + IBV_QP_INIT_ATTR_PD |
> > > + IBV_QP_INIT_ATTR_IND_TABLE |
> > > + IBV_QP_INIT_ATTR_RX_HASH,
> > > + .rx_hash_conf = (struct ibv_rx_hash_conf){
> > > + .rx_hash_function = IBV_RX_HASH_FUNC_TOEPLITZ,
> > > + .rx_hash_key_len = rss_key_len,
> > > + .rx_hash_key = (void *)(uintptr_t)rss_key,
> > > + .rx_hash_fields_mask = hash_fields |
> > > + (tunnel && rss_level ?
> > > + (uint32_t)IBV_RX_HASH_INNER : 0),
>[...]

 .rx_hash_fields_mask = hash_fields |
 ((rss_level > 1) ?
 (uint32_t)IBV_RX_HASH_INNER : 0),
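
One caveat when writing such a mask expression: in C, `|` binds tighter than
`?:`, so the conditional needs its own parentheses or the whole OR becomes
the condition. The sketch below spells out both groupings. TOY_HASH_INNER is
a hypothetical stand-in for IBV_RX_HASH_INNER with an illustrative value,
and the function names are made up for this example.

```c
#include <stdint.h>

/* Hypothetical stand-in for IBV_RX_HASH_INNER; the value is illustrative. */
#define TOY_HASH_INNER (UINT32_C(1) << 31)

/*
 * How the compiler groups "hash_fields | (rss_level > 1) ? INNER : 0":
 * the OR is the condition, so hash_fields is lost from the result.
 */
uint32_t mask_as_parsed(uint32_t hash_fields, uint32_t rss_level)
{
	return (hash_fields | (rss_level > 1)) ? TOY_HASH_INNER : 0;
}

/* Intended meaning: keep hash_fields and add the inner bit when needed. */
uint32_t mask_intended(uint32_t hash_fields, uint32_t rss_level)
{
	return hash_fields | ((rss_level > 1) ? TOY_HASH_INNER : 0);
}
```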

Thanks,

-- 
Nélio Laranjeiro
6WIND


Re: [dpdk-dev] [PATCH v5 2/5] vhost: support selective datapath

2018-04-16 Thread Maxime Coquelin



On 04/15/2018 07:39 PM, Thomas Monjalon wrote:

03/04/2018 10:02, Maxime Coquelin:

On 04/02/2018 01:46 PM, Zhihong Wang wrote:

   lib/librte_vhost/Makefile  |   4 +-
   lib/librte_vhost/rte_vdpa.h|  87 +
   lib/librte_vhost/rte_vhost_version.map |   7 ++
   lib/librte_vhost/vdpa.c| 115 
+


With the fix you suggested:
Reviewed-by: Maxime Coquelin 


This patch is not OK. It is updating the Makefile but not meson.build.
I am fixing/amending it in master:

--- a/lib/librte_vhost/meson.build
+++ b/lib/librte_vhost/meson.build
@@ -9,7 +9,8 @@ if has_libnuma == 1
  endif
  version = 4
  allow_experimental_apis = true
-sources = files('fd_man.c', 'iotlb.c', 'socket.c', 'vhost.c', 'vhost_user.c',
+sources = files('fd_man.c', 'iotlb.c', 'socket.c', 'vdpa.c',
+   'vhost.c', 'vhost_user.c',
 'virtio_net.c', 'vhost_crypto.c')
-headers = files('rte_vhost.h', 'rte_vhost_crypto.h')
+headers = files('rte_vhost.h', 'rte_vdpa.h', 'rte_vhost_crypto.h')





Right, thanks Thomas.

Sorry for not catching this, I'm in the process of improving my build
scripts, but as you can see it is not yet ready...

Cheers,
Maxime


Re: [dpdk-dev] [PATCH v3 04/14] net/mlx5: support Rx tunnel type identification

2018-04-16 Thread Nélio Laranjeiro
On Sat, Apr 14, 2018 at 12:57:58PM +, Xueming(Steven) Li wrote:
> +Adrien
> 
> > -Original Message-
> > From: Nélio Laranjeiro 
> > Sent: Friday, April 13, 2018 9:03 PM
> > To: Xueming(Steven) Li 
> > Cc: Shahaf Shuler ; dev@dpdk.org; Olivier Matz
> > 
> > Subject: Re: [PATCH v3 04/14] net/mlx5: support Rx tunnel type
> > identification
> > 
> > +Olivier,
> > 
> > On Fri, Apr 13, 2018 at 07:20:13PM +0800, Xueming Li wrote:
> > > This patch introduced tunnel type identification based on flow rules.
> > > If flows of multiple tunnel types built on same queue,
> > > RTE_PTYPE_TUNNEL_MASK will be returned, user application could use
> > > bits in flow mark as tunnel type identifier.
> > 
> > For an application it will mean the packet embed all tunnel types defined
> > in DPDK, to make such thing you need a RTE_PTYPE_TUNNEL_UNKNOWN which does
> > not exists currently.
> 
> There was a RTE_PTYPE_TUNNEL_UNKNOWN definition, but removed due to 
> discussion.
> So I think it good to add it in the patchset of reviewed by Adrien.

Agreed,

> 
> > Even with it, the application still needs to parse the packet to discover
> > which tunnel the packet embed, is there any benefit having such bit?  Not
> > so sure.
> 
> With a tunnel flag, checksum status represent inner checksum.

Not sure this is generic enough, MLX5 behaves as this, but how behaves
other NICs?  It should have specific bits for inner checksum if all NIC
don't have the same behavior.

> Setting flow mark for different flow type could save time of parsing tunnel.

Thanks,

-- 
Nélio Laranjeiro
6WIND


Re: [dpdk-dev] [PATCH v7 7/8] examples/vhost_crypto: add vhost crypto sample application

2018-04-16 Thread Maxime Coquelin



On 04/15/2018 07:35 PM, Thomas Monjalon wrote:

15/04/2018 16:34, Thomas Monjalon:

Hi,

05/04/2018 18:01, Fan Zhang:

This patch adds vhost_crypto sample application to DPDK.

Signed-off-by: Fan Zhang 
---
  examples/vhost_crypto/Makefile|  32 +++
  examples/vhost_crypto/main.c  | 541 ++
  examples/vhost_crypto/meson.build |  14 +
  3 files changed, 587 insertions(+)


There are 3 misses:

- not in examples/Makefile
- not in MAINTAINERS
- no documentation in doc/guides/sample_app_ug/

It won't be accepted in master as-is.


The doc is (curiously) in the next patch. I will move it here.
I will also fix the Makefile and the MAINTAINERS file:

ifeq ($(CONFIG_RTE_LIBRTE_CRYPTODEV),y)
DIRS-$(CONFIG_RTE_LIBRTE_VHOST) += vhost_crypto
endif

Vhost-user
M: Maxime Coquelin 
M: Jianfeng Tan 
T: git://dpdk.org/next/dpdk-next-virtio
F: lib/librte_vhost/
[...]
F: examples/vhost_crypto/


Ferruh, Maxime, I hope you agree with this last-minute fix.




Agreed. thanks.

Maxime


Re: [dpdk-dev] [PATCH] ethdev: remove new to old offloads API helpers

2018-04-16 Thread Thomas Monjalon
16/04/2018 08:02, Shahaf Shuler:
> According to
> 
> commit 315ee8374e0e ("doc: reduce initial offload API rework scope
>to drivers")
> 
> All PMDs should have moved to the new offloads API. Therefore it is safe
> to remove the new->old convert helpers.
> 
> The old->new helpers will remain to support application which still use
> the old API.
> 
> Signed-off-by: Shahaf Shuler 
> ---
>  lib/librte_ether/rte_ethdev.c | 100 +-
>  1 file changed, 2 insertions(+), 98 deletions(-)

I think you missed removing the deprecation notice part.





Re: [dpdk-dev] [PATCH v2 07/15] net/mlx5: support tunnel RSS level

2018-04-16 Thread Xueming(Steven) Li


> -Original Message-
> From: Nélio Laranjeiro 
> Sent: Monday, April 16, 2018 3:14 PM
> To: Xueming(Steven) Li 
> Cc: Shahaf Shuler ; dev@dpdk.org
> Subject: Re: [PATCH v2 07/15] net/mlx5: support tunnel RSS level
> 
> On Sat, Apr 14, 2018 at 12:25:12PM +, Xueming(Steven) Li wrote:
> >[...]
> > > > @@ -1211,23 +1322,23 @@ mlx5_flow_convert(struct rte_eth_dev *dev,
> > > > if (ret)
> > > > goto exit_free;
> > > > }
> > > > -   if (parser->mark)
> > > > -   mlx5_flow_create_flag_mark(parser, parser->mark_id);
> > > > -   if (parser->count && parser->create) {
> > > > -   mlx5_flow_create_count(dev, parser);
> > > > -   if (!parser->cs)
> > > > -   goto exit_count_error;
> > > > -   }
> > > > /*
> > > >  * Last step. Complete missing specification to reach the RSS
> > > >  * configuration.
> > > >  */
> > > > if (!parser->drop)
> > > > -   ret = mlx5_flow_convert_rss(parser);
> > > > +   ret = mlx5_flow_convert_rss(dev, parser);
> > > > if (ret)
> > > > goto exit_free;
> > > > mlx5_flow_convert_finalise(parser);
> > > > mlx5_flow_update_priority(dev, parser, attr);
> > > > +   if (parser->mark)
> > > > +   mlx5_flow_create_flag_mark(parser, parser->mark_id);
> > > > +   if (parser->count && parser->create) {
> > > > +   mlx5_flow_create_count(dev, parser);
> > > > +   if (!parser->cs)
> > > > +   goto exit_count_error;
> > > > +   }
> > >
> > > Why do you need to move this code?
> >
> > To avoid counter resource missing if anything wrong in function
> > mlx5_flow_convert_rss().
> 
> Why is this modification addressed in this patch? Shouldn't it be in
> the patch introducing mlx5_flow_convert_rss()?

Good catch, I'll update.
> 
> >[...]
> > > > @@ -1386,6 +1386,8 @@ mlx5_ind_table_ibv_verify(struct rte_eth_dev
> *dev)
> > > >   *   Number of queues.
> > > >   * @param tunnel
> > > >   *   Tunnel type.
> > > > + * @param rss_level
> > > > + *   RSS hash on tunnel level.
> > > >   *
> > > >   * @return
> > > >   *   The Verbs object initialised, NULL otherwise and rte_errno is
> set.
> > > > @@ -1394,13 +1396,17 @@ struct mlx5_hrxq *  mlx5_hrxq_new(struct
> > > > rte_eth_dev *dev,
> > > >   const uint8_t *rss_key, uint32_t rss_key_len,
> > > >   uint64_t hash_fields,
> > > > - const uint16_t *queues, uint32_t queues_n, uint32_t
> tunnel)
> > > > + const uint16_t *queues, uint32_t queues_n,
> > > > + uint32_t tunnel, uint32_t rss_level)
> > >
> > > tunnel and rss_level seems to be redundant here.
> > >
> > > rss_level > 1 is equivalent to tunnel, there is no need to have both.
> >
> > There is a case of tunnel and outer rss(1).
> 
> Why cannot it be handled by a regular Hash Rx queue, i.e. what is the
> benefit of creating a tunnel hash Rx queue to make the same job as a
> legacy one?

Tunnel checksum, ptype and RSS offloading require a QP to be created by the DV API
with tunnel offload flags.

> 
> See below,
> 
> > > >  {
> > > > struct priv *priv = dev->data->dev_private;
> > > > struct mlx5_hrxq *hrxq;
> > > > struct mlx5_ind_table_ibv *ind_tbl;
> > > > struct ibv_qp *qp;
> > > > int err;
> > > > > +#ifdef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
> > > > > +   struct mlx5dv_qp_init_attr qp_init_attr = {0};
> > > > > +#endif
> > > > >
> > > > > queues_n = hash_fields ? queues_n : 1;
> > > > > ind_tbl = mlx5_ind_table_ibv_get(dev, queues, queues_n);
> > > > > @@ -1410,6 +1416,33 @@ mlx5_hrxq_new(struct rte_eth_dev *dev,
> > > > rte_errno = ENOMEM;
> > > > return NULL;
> > > > }
> > > > +#ifdef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
> > > > +   if (tunnel) {
> 
> Why not: if (rss_level > 1) ?

Besides RSS, ptype and checksum also have to take advantage of tunnel offloading.

> 
> > > > +   qp_init_attr.comp_mask =
> > > > +   
> > > > MLX5DV_QP_INIT_ATTR_MASK_QP_CREATE_FLAGS;
> > > > +   qp_init_attr.create_flags =
> MLX5DV_QP_CREATE_TUNNEL_OFFLOADS;
> > > > +   }
> > > > +   qp = mlx5_glue->dv_create_qp(
> > > > +   priv->ctx,
> > > > +   &(struct ibv_qp_init_attr_ex){
> > > > +   .qp_type = IBV_QPT_RAW_PACKET,
> > > > +   .comp_mask =
> > > > +   IBV_QP_INIT_ATTR_PD |
> > > > +   IBV_QP_INIT_ATTR_IND_TABLE |
> > > > +   IBV_QP_INIT_ATTR_RX_HASH,
> > > > +   .rx_hash_conf = (struct ibv_rx_hash_conf){
> > > > +   .rx_hash_function =
> IBV_RX_HASH_FUNC_TOEPLITZ,
> > > > +   .rx_hash_key_l

Re: [dpdk-dev] [PATCH v3 04/14] net/mlx5: support Rx tunnel type identification

2018-04-16 Thread Xueming(Steven) Li


> -Original Message-
> From: Nélio Laranjeiro 
> Sent: Monday, April 16, 2018 3:29 PM
> To: Xueming(Steven) Li 
> Cc: Shahaf Shuler ; dev@dpdk.org; Olivier Matz
> ; Adrien Mazarguil 
> Subject: Re: [PATCH v3 04/14] net/mlx5: support Rx tunnel type
> identification
> 
> On Sat, Apr 14, 2018 at 12:57:58PM +, Xueming(Steven) Li wrote:
> > +Adrien
> >
> > > -Original Message-
> > > From: Nélio Laranjeiro 
> > > Sent: Friday, April 13, 2018 9:03 PM
> > > To: Xueming(Steven) Li 
> > > Cc: Shahaf Shuler ; dev@dpdk.org; Olivier Matz
> > > 
> > > Subject: Re: [PATCH v3 04/14] net/mlx5: support Rx tunnel type
> > > identification
> > >
> > > +Olivier,
> > >
> > > On Fri, Apr 13, 2018 at 07:20:13PM +0800, Xueming Li wrote:
> > > > This patch introduced tunnel type identification based on flow rules.
> > > > If flows of multiple tunnel types built on same queue,
> > > > RTE_PTYPE_TUNNEL_MASK will be returned, user application could use
> > > > bits in flow mark as tunnel type identifier.
> > >
> > > For an application it will mean the packet embed all tunnel types
> > > defined in DPDK, to make such thing you need a
> > > RTE_PTYPE_TUNNEL_UNKNOWN which does not exists currently.
> >
> > There was a RTE_PTYPE_TUNNEL_UNKNOWN definition, but removed due to
> discussion.
> > So I think it good to add it in the patchset of reviewed by Adrien.
> 
> Agreed,
> 
> >
> > > Even with it, the application still needs to parse the packet to
> > > discover which tunnel the packet embed, is there any benefit having
> > > such bit?  Not so sure.
> >
> > With a tunnel flag, checksum status represent inner checksum.
> 
> Not sure this is generic enough, MLX5 behaves as this, but how behaves
> other NICs?  It should have specific bits for inner checksum if all NIC
> don't have the same behavior.

From my understanding, if the outer checksum is invalid, the packet can't be received 
as a tunneled packet, only as a normal packet; thus the checksum flags always reflect 
the inner checksum for a valid tunneled packet.

> 
> > Setting flow mark for different flow type could save time of parsing
> tunnel.
> 
> Thanks,
> 
> --
> Nélio Laranjeiro
> 6WIND


Re: [dpdk-dev] [PATCH] net/bonding: fix link properties with autoneg

2018-04-16 Thread Matan Azrad
Hi Chas

From: Chas Williams, Wednesday, February 14, 2018 12:55 AM
> If a link is carrier down and using autonegotiation, then the PMD may not
> have detected a speed yet.  In this case the best we can do is ignore the link
> speed and duplex since they aren't valid.

Ok for this.

>  To be completely correct, there
> should be additional checks to prevent a slave that negotiates a different
> speed from being activated.

It looks like every change in the link properties should cause an LSC interrupt.
In the bonding LSC interrupt handler you could handle it and deactivate the device.
You should also deal with the case of the first slave: what happens if the
first slave has invalid link properties?
How can you know that the speed/duplex_mode is invalid?
Are we sure LACP mode can run with auto-negotiation?
  

> 
> Signed-off-by: Chas Williams 
> ---
>  drivers/net/bonding/rte_eth_bond_pmd.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c
> b/drivers/net/bonding/rte_eth_bond_pmd.c
> index 92ad688..5559879 100644
> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
> @@ -1545,9 +1545,10 @@ link_properties_valid(struct rte_eth_dev
> *ethdev,
>   if (bond_ctx->mode == BONDING_MODE_8023AD) {
>   struct rte_eth_link *bond_link = &bond_ctx-
> >mode4.slave_link;
> 
> - if (bond_link->link_duplex != slave_link->link_duplex ||
> - bond_link->link_autoneg != slave_link->link_autoneg
> ||
> - bond_link->link_speed != slave_link->link_speed)
> + if (bond_link->link_autoneg != slave_link->link_autoneg ||
> + (bond_link->link_autoneg != ETH_LINK_AUTONEG &&
> +  (bond_link->link_duplex != slave_link->link_duplex ||
> +   bond_link->link_speed != slave_link->link_speed)))
>   return -1;
>   }
> 
> --
> 2.9.5
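
The condition in the hunk above can be restated as a standalone function. This is only a minimal sketch: `struct link_props` and `LINK_AUTONEG` are hypothetical stand-ins for `struct rte_eth_link` and `ETH_LINK_AUTONEG`, and the bonding context is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_eth_link; field names follow the patch. */
struct link_props {
	uint32_t link_speed;
	uint8_t link_duplex;
	uint8_t link_autoneg;
};

#define LINK_AUTONEG 1 /* stand-in for ETH_LINK_AUTONEG */

/* Return 0 when the slave link is compatible with the bond, -1 otherwise. */
static int
link_properties_valid(const struct link_props *bond,
		      const struct link_props *slave)
{
	assert(bond != NULL && slave != NULL);

	/* The autonegotiation setting must always match. */
	if (bond->link_autoneg != slave->link_autoneg)
		return -1;
	/* With autoneg enabled, speed and duplex may not be resolved yet. */
	if (bond->link_autoneg == LINK_AUTONEG)
		return 0;
	/* Fixed-speed links must agree on both speed and duplex. */
	if (bond->link_duplex != slave->link_duplex ||
	    bond->link_speed != slave->link_speed)
		return -1;
	return 0;
}
```

Written this way, the open question from the review is easier to see: once autoneg is on, nothing in this check rejects a slave that later negotiates a different speed.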



Re: [dpdk-dev] [PATCH v2 07/15] net/mlx5: support tunnel RSS level

2018-04-16 Thread Nélio Laranjeiro
On Mon, Apr 16, 2018 at 07:46:08AM +, Xueming(Steven) Li wrote:
>[...]
> > > > > @@ -1386,6 +1386,8 @@ mlx5_ind_table_ibv_verify(struct rte_eth_dev
> > *dev)
> > > > >   *   Number of queues.
> > > > >   * @param tunnel
> > > > >   *   Tunnel type.
> > > > > + * @param rss_level
> > > > > + *   RSS hash on tunnel level.
> > > > >   *
> > > > >   * @return
> > > > >   *   The Verbs object initialised, NULL otherwise and rte_errno is
> > set.
> > > > > @@ -1394,13 +1396,17 @@ struct mlx5_hrxq *  mlx5_hrxq_new(struct
> > > > > rte_eth_dev *dev,
> > > > > const uint8_t *rss_key, uint32_t rss_key_len,
> > > > > uint64_t hash_fields,
> > > > > -   const uint16_t *queues, uint32_t queues_n, uint32_t
> > tunnel)
> > > > > +   const uint16_t *queues, uint32_t queues_n,
> > > > > +   uint32_t tunnel, uint32_t rss_level)
> > > >
> > > > tunnel and rss_level seems to be redundant here.
> > > >
> > > > rss_level > 1 is equivalent to tunnel, there is no need to have both.
> > >
> > > There is a case of tunnel and outer rss(1).
> > 
> > Why cannot it be handled by a regular Hash Rx queue, i.e. what is the
> > benefit of creating a tunnel hash Rx queue to make the same job as a
> > legacy one?
> 
> Tunnel checksum, ptype and rss offloading demand a QP to be created by DV api 
> with
> tunnel offload flags.

I was expecting such an answer; this information should be present in the
function documentation. Can you add it?

Thanks,

-- 
Nélio Laranjeiro
6WIND


Re: [dpdk-dev] [PATCH v3 3/4] ethdev: add TTL change actions in flow API

2018-04-16 Thread Adrien Mazarguil
Hi Shahaf,

On Mon, Apr 16, 2018 at 05:48:19AM +, Shahaf Shuler wrote:
> Hi Qi,
> 
> Am wondering if we can make the below more generic and not tailored for 
> specific use cases. 

Regarding this, please see my previous answer [1] where I asked Qi to make
his changes more focused on the use case at hand when it became clear all
this work was targeting OpenFlow.

The OF specification [2] defines the behavior associated with each action,
for instance when a TTL is 0 or decrementing it would yield 0, the packet
must be dropped. Translating this to a generic decrement action for any
packet field is not so easy and not convenient.

Therefore my opinion is that if OF actions as defined by this specification
are supported as hardware capabilities, it makes sense to define dedicated
rte_flow actions for each of them (although "OF" should be part of their
name for clarity).

I'll comment on the patch proper in a separate message.

[1] http://dpdk.org/ml/archives/dev/2018-April/096857.html
[2] 
https://www.opennetworking.org/images/stories/downloads/sdn-resources/onf-specifications/openflow/openflow-spec-v1.3.0.pdf

-- 
Adrien Mazarguil
6WIND
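
To make the difference between the two semantics concrete, here is a minimal sketch. It assumes the OpenFlow behavior exactly as summarized in this message (drop when the TTL is 0 or would become 0); both function names are hypothetical and neither is part of rte_flow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * OpenFlow-style decrement as summarized above: if the TTL is 0, or if
 * decrementing it would yield 0, the packet must be dropped.
 * Returns true when the packet must be dropped; the TTL is then untouched.
 */
static bool
of_dec_ttl(uint8_t *ttl)
{
	assert(ttl != NULL);
	if (*ttl <= 1)
		return true;
	(*ttl)--;
	return false;
}

/*
 * Generic "decrement a field" action with no wrap-around: it has no notion
 * of dropping, so the drop has to be expressed by some other rule.
 */
static void
generic_dec_no_wrap(uint8_t *field)
{
	assert(field != NULL);
	if (*field > 0)
		(*field)--;
}
```

The first helper couples two outcomes (modify or drop) in a single action, which is the part that does not map cleanly onto a generic field decrement.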


Re: [dpdk-dev] [PATCH v3 11/14] net/mlx5: support MPLS-in-GRE and MPLS-in-UDP

2018-04-16 Thread Nélio Laranjeiro
On Fri, Apr 13, 2018 at 03:22:50PM +, Xueming(Steven) Li wrote:
>[...] 
> > @@
> > > > static
> > > > > const struct mlx5_flow_items mlx5_flow_items[] = {
> > > > >   .convert = mlx5_flow_create_vxlan_gpe,
> > > > >   .dst_sz = sizeof(struct ibv_flow_spec_tunnel),
> > > > >   },
> > > > > + [RTE_FLOW_ITEM_TYPE_MPLS] = {
> > > > > + .items = ITEMS(RTE_FLOW_ITEM_TYPE_ETH,
> > > > > +RTE_FLOW_ITEM_TYPE_IPV4,
> > > > > +RTE_FLOW_ITEM_TYPE_IPV6),
> > > > > + .actions = valid_actions,
> > > > > + .mask = &(const struct rte_flow_item_mpls){
> > > > > + .label_tc_s = "\xff\xff\xf0",
> > > > > + },
> > > > > + .default_mask = &rte_flow_item_mpls_mask,
> > > > > + .mask_sz = sizeof(struct rte_flow_item_mpls),
> > > > > > + .convert = mlx5_flow_create_mpls,
> > > > > > +#ifdef HAVE_IBV_DEVICE_MPLS_SUPPORT
> > > > > > + .dst_sz = sizeof(struct ibv_flow_spec_mpls),
> > > > > > +#endif
> > > > > + },
> > > >
> > > > Why the whole item is not under ifdef?
> > >
> > > If apply macro to whole item, there will be a null pointer if create
> > mpls flow.
> > > There is a macro in function mlx5_flow_create_mpls() to avoid using this
> > invalid data.
> > 
> > I think there is some kind of confusion here, what I mean is moving the
> > #ifdef to embrace the whole stuff i.e.:
> > 
> >  #ifdef HAVE_IBV_DEVICE_MPLS_SUPPORT
> >  [RTE_FLOW_ITEM_TYPE_MPLS] = {
> >   .items = ITEMS(RTE_FLOW_ITEM_TYPE_ETH,
> >RTE_FLOW_ITEM_TYPE_IPV4,
> >RTE_FLOW_ITEM_TYPE_IPV6),
> >   .actions = valid_actions,
> >   .mask = &(const struct rte_flow_item_mpls){
> > .label_tc_s = "\xff\xff\xf0",
> >   },
> >   .default_mask = &rte_flow_item_mpls_mask,
> >   .mask_sz = sizeof(struct rte_flow_item_mpls),
> >   .convert = mlx5_flow_create_mpls,
> > >   .dst_sz = sizeof(struct ibv_flow_spec_mpls)
> > >  #endif
> > 
> > Not having this item in this static array ends by not supporting it, this
> > is what I mean.
> 
> Yes, I know. There is a code using this array w/o NULL check:
>   cur_item = &mlx5_flow_items[items->type];
>   ret = cur_item->convert(items,
>   (cur_item->default_mask ?
>cur_item->default_mask :
>cur_item->mask),
>&data);
> 
> 

This code runs after mlx5_flow_convert_items_validate(), which refuses
unknown items. If you see an unknown item reaching the code above,
there is a bug somewhere and it should be fixed.  Unsupported items
should not be in the static array.

Regards,

-- 
Nélio Laranjeiro
6WIND
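
The argument above (validate first, convert second, keep unsupported items out of the table) can be sketched in isolation. Everything here is hypothetical: the enum, `HAVE_MPLS_SUPPORT` and the converter functions merely stand in for the mlx5 item table and `HAVE_IBV_DEVICE_MPLS_SUPPORT`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical item types; the real ones come from enum rte_flow_item_type. */
enum item_type { ITEM_END, ITEM_ETH, ITEM_MPLS, ITEM_MAX };

struct flow_item_desc {
	int (*convert)(int payload); /* NULL means "item not supported" */
};

static int convert_eth(int payload) { return payload + 1; }
#ifdef HAVE_MPLS_SUPPORT /* hypothetical build-time switch */
static int convert_mpls(int payload) { return payload + 2; }
#endif

/* Entries compiled out by the #ifdef stay zero-initialized. */
static const struct flow_item_desc flow_items[ITEM_MAX] = {
	[ITEM_ETH] = { .convert = convert_eth },
#ifdef HAVE_MPLS_SUPPORT
	[ITEM_MPLS] = { .convert = convert_mpls },
#endif
};

/* Validation pass: refuse any item that has no entry in the table. */
static int
items_validate(const enum item_type *items, size_t n)
{
	assert(items != NULL);
	for (size_t i = 0; i < n; i++) {
		if (items[i] >= ITEM_MAX ||
		    flow_items[items[i]].convert == NULL)
			return -1;
	}
	return 0;
}

/* Conversion pass: dereferences the table only after validation passed. */
static int
items_convert(const enum item_type *items, size_t n, int payload)
{
	if (items_validate(items, n) != 0)
		return -1;
	for (size_t i = 0; i < n; i++)
		payload = flow_items[items[i]].convert(payload);
	return payload;
}
```

With the whole entry under the `#ifdef`, a build without MPLS support rejects the item at validation time, so the unchecked `convert` call is never reached.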


Re: [dpdk-dev] DEV_RX_OFFLOAD_SCATTER for ixgbe and igb devices

2018-04-16 Thread Stokes, Ian
> On 4/13/2018 3:52 PM, Stokes, Ian wrote:
> > Hi all,
> >
> > Currently it's the case that for some NICs (e.g. igb driver or ixgbe
> driver based), scatter_rx needs to be enabled explicitly in the case where
> it was not configured before.
> >
> > A patch submitted for ovs-dpdk proposes to check that the
> DEV_RX_OFFLOAD_SCATTER flag is present in rte_ethdev_info.rx_offload_capa
> before setting scatter_rx.
> >
> > https://mail.openvswitch.org/pipermail/ovs-dev/2018-April/345901.html
> >
> > While testing igb and ixgbe devices I spotted the DEV_RX_OFFLOAD_SCATTER
> flag is not set in rx_offload_capa.
> >
> > As these devices require scatter_rx, should the scatter_rx flag be set
> for them as part of the eth_igb_infos_get() for igb and ixgbe_dev_info_get
> for ixgbe?
> 
> I agree, since that offload is supported by PMD, should be in capability
> flag.
> Would you mind sending a patch for it, we can continue to discuss on it?
> 
> Thanks,
> Ferruh

Thanks for the confirmation Ferruh, will send a patch.

Thanks
Ian
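
For reference, the capability check proposed for ovs-dpdk boils down to something like the sketch below. It is self-contained on purpose: `struct dev_info`, `struct rx_conf` and the `RX_OFFLOAD_SCATTER` bit are hypothetical stand-ins for the ethdev structures and `DEV_RX_OFFLOAD_SCATTER`, and the bit position is illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag; stands in for DEV_RX_OFFLOAD_SCATTER. */
#define RX_OFFLOAD_SCATTER (UINT64_C(1) << 13)

/* Hypothetical, reduced versions of the device info and Rx config. */
struct dev_info { uint64_t rx_offload_capa; };
struct rx_conf { uint64_t offloads; };

/*
 * Request scattered Rx only when the PMD advertises it.
 * Returns 1 if the offload was enabled, 0 if the device lacks the capability.
 */
static int
enable_scatter_if_supported(const struct dev_info *info, struct rx_conf *conf)
{
	assert(info != NULL && conf != NULL);
	if ((info->rx_offload_capa & RX_OFFLOAD_SCATTER) == 0)
		return 0;
	conf->offloads |= RX_OFFLOAD_SCATTER;
	return 1;
}
```

This is why the missing capability bit matters: with a check like this in the application, an igb or ixgbe port that does not report the flag never gets scatter enabled, even though the hardware needs it.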


Re: [dpdk-dev] kernel binding of devices + hotplug

2018-04-16 Thread Bruce Richardson
On Sat, Apr 14, 2018 at 08:10:28PM +, Matan Azrad wrote:
> Hi all
> 
> From: Burakov, Anatoly, Friday, April 13, 2018 8:41 PM
> > To: Bruce Richardson ; Thomas Monjalon
> > 
> > Cc: dev@dpdk.org; pmati...@redhat.com; david.march...@6wind.com;
> > jia@intel.com; Matan Azrad ;
> > konstantin.anan...@intel.com; step...@networkplumber.org;
> > f...@redhat.com
> > Subject: Re: kernel binding of devices + hotplug
> > 
> > On 13-Apr-18 5:40 PM, Bruce Richardson wrote:
> > > On Fri, Apr 13, 2018 at 06:31:21PM +0200, Thomas Monjalon wrote:
> > >> It's time to think (again) how we bind devices with kernel modules.
> > >> We need to decide how we want to manage hotplugged devices with
> > DPDK.
> > >>
> > >> A bit of history first.
> > >> There was some code in DPDK for bind/unbind, but it has been removed
> > >> in DPDK 1.7 -
> > >>
> > >> http://dpdk.org/commit/5d8751b83
> > >> Copy of the commit message (in 2014):
> > >> "
> > >>  The bind/unbind operations should not be handled by the eal.
> > >>  These operations should be either done outside of dpdk or
> > >>  inside the PMDs themselves as these are their problems.
> > >> "
> > >>
> > >> The question raised at this time (4 years ago) is still under discussion.
> > >> Should we manage binding inside or outside DPDK?
> > >> Should it be controlled in the application or in the OS base?
> > >>
> > >> As you know, we use dpdk-devbind.py.
> > >> This tool lacks two major features:
> > >>  - persistent configuration
> > >>  - hotplug
> > >>
> > >> If we consider that the DPDK applications should be able to apply its
> > >> own policy to choose the devices to bind, then we need to implement
> > >> binding in the PMD (with EAL helpers).
> > >>
> > >> On the other hand, if we consider that it is the system
> > >> responsibility, then we could choose systemd/udev and driverctl.
> > >>
> > >> The debate is launched!
> > >>
> > >
> > > Allow me to nail my colours to the mast early! :-)
> > >
> > > I believe it's system not application responsibility.
> > > I also believe I have previously explained my reasons for that choice
> > > in some of the previous email threads.
> > 
> > For what it's worth, I tend to agree, if only because writing code for what 
> > is
> > essentially a bunch of read/write/filesystem enumeration in C is extremely
> > fiddly and error prone :) IMO things like this are better handled either by
> > scripts, or by tools whose sole purpose is doing exactly that (or both).
> > 
> > I like having scripts like devbind in DPDK because we can tailor them to our
> > use cases better, and having them is amenable to automation, but while I
> > wouldn't be opposed to removing them altogether in favor of some external
> > tool (systemd/udev/driverctl/whatever), in my humble opinion moving them
> > back into EAL or even PMD's would be a mistake.
> > 
> 
> Since the application runs in the system by a command of the system user I 
> think the responsibility is for the user.
> The DPDK user forwards the control of some devices to the DPDK application 
> using the EAL whitelist\blacklist mode to specify the devices,
> Any DPDK PMD should know which binding it needs to probe\control the device 
> and can apply it,
> So, if the user asks to control on a device by DPDK application it makes 
> sense that the application will do the correct binding to the device since 
> the user wants to use it(no need to ask more operation of pre binding from 
> the user).

Completely agree that it is ultimately up to the user. However, what I
don't want to see is the case where the user always has to specify a big
long list of device whitelist and blacklist options to each run of an
application. Instead, if device management is done at the system level via
udev (for example) configured via driverctl, then the application
commandline can be vastly simplified. It also allows better usability
across systems, since the same commandline can be used on multiple systems
with different hardware, with the actual device management rules having
been already configured at system install/setup time in udev.

> 
> Regarding the conflict of system rules for a device, it is again the user 
> responsibility, whatever we will decide for the binding procedure of DPDK 
> application the user needs to take it into account and to solve such like 
> conflicts.
> One option is to remove any binding rules of a DPDK device in the DPDK 
> application initialization and adjust the new rules by the PMDs, then any 
> conflict should not disturb the user.

If the device management is only managed in one place, i.e. not in DPDK,
then there is no conflict to manage.

> 
> In current hot-plug case the appli

[dpdk-dev] [PATCH] net/ixgbe: fix segfault in configuring VF VLAN strip

2018-04-16 Thread Wei Dai
This patch fixes a segmentation fault in ixgbevf_vlan_offload_set()
when an Rx queue with index < max_rx_queues is not set up.
For such a queue, rxq = dev->data->rx_queues[i] is a null pointer.

Fixes: 860a94d3c692 ("net/ixgbe: support VLAN strip per queue offloading in VF")
Cc: sta...@dpdk.org

Signed-off-by: Wei Dai 
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index a5e2fc0..33ee52e 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -5184,15 +5184,13 @@ ixgbevf_vlan_strip_queue_set(struct rte_eth_dev *dev, uint16_t queue, int on)
 static int
 ixgbevf_vlan_offload_set(struct rte_eth_dev *dev, int mask)
 {
-   struct ixgbe_hw *hw =
-   IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
struct ixgbe_rx_queue *rxq;
uint16_t i;
int on = 0;
 
/* VF function only support hw strip feature, others are not support */
if (mask & ETH_VLAN_STRIP_MASK) {
-   for (i = 0; i < hw->mac.max_rx_queues; i++) {
+   for (i = 0; i < dev->data->nb_rx_queues; i++) {
rxq = dev->data->rx_queues[i];
on = !!(rxq->offloads & DEV_RX_OFFLOAD_VLAN_STRIP);
ixgbevf_vlan_strip_queue_set(dev, i, on);
-- 
2.7.5
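
The effect of the changed loop bound can be shown with a small standalone sketch. The structures below are hypothetical reductions of the ethdev data and the ixgbe Rx queue; only the iteration logic mirrors the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RX_QUEUES 8                   /* hardware limit (illustrative) */
#define RX_OFFLOAD_VLAN_STRIP UINT64_C(1) /* stand-in for the real flag */

/* Hypothetical, reduced Rx queue and device data. */
struct rx_queue {
	uint64_t offloads;
	int vlan_strip_on;
};

struct dev_data {
	struct rx_queue *rx_queues[MAX_RX_QUEUES]; /* NULL until set up */
	uint16_t nb_rx_queues;                     /* queues actually configured */
};

/*
 * Apply the per-queue VLAN strip setting. Iterating up to nb_rx_queues
 * touches only configured queues; iterating up to MAX_RX_QUEUES would
 * dereference the NULL slots of queues that were never set up.
 * Returns the number of queues visited.
 */
static uint16_t
vlan_strip_apply(struct dev_data *data)
{
	uint16_t i;

	assert(data != NULL && data->nb_rx_queues <= MAX_RX_QUEUES);
	for (i = 0; i < data->nb_rx_queues; i++) {
		struct rx_queue *rxq = data->rx_queues[i];

		rxq->vlan_strip_on = !!(rxq->offloads & RX_OFFLOAD_VLAN_STRIP);
	}
	return i;
}
```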



Re: [dpdk-dev] [PATCH] net/ixgbe: fix segfault in configuring VF VLAN strip

2018-04-16 Thread Lin, Xueqin
Tested-by: Xueqin Lin

1. Branch: master
2. NIC: Niantic
Steps:
1. Pull the latest code on the branch and apply the patch.
2. Create one VF port using the kernel PF:
echo 1 > /sys/bus/pci/devices/\:81\:00.0/sriov_numvfs
3. Bind the VF to igb_uio.
4. Start testpmd and check that it starts up successfully on the VF:
./x86_64-native-linuxapp-gcc/app/testpmd -c 0x6 -n 4  -- -i

Result: This patch fixes the failure to set up the VF on a Niantic NIC.

Best regards,
Xueqin

-Original Message-
From: Dai, Wei 
Sent: Monday, April 16, 2018 4:14 PM
To: Lu, Wenzhuo ; Ananyev, Konstantin 
; Zhang, Qi Z ; Lin, Xueqin 

Cc: dev@dpdk.org; Dai, Wei ; sta...@dpdk.org
Subject: [PATCH] net/ixgbe: fix segfault in configuring VF VLAN strip

This patch fixes a segment fault in ixgbevf_vlan_offload_set( ) when a Rx queue 
with index < max_rx_queues is not setup.
For such queue, rxq = dev->data->rx_queues[i] is null pointer.

Fixes: 860a94d3c692 ("net/ixgbe: support VLAN strip per queue offloading in VF")
Cc: sta...@dpdk.org

Signed-off-by: Wei Dai 
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index a5e2fc0..33ee52e 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -5184,15 +5184,13 @@ ixgbevf_vlan_strip_queue_set(struct rte_eth_dev *dev, uint16_t queue, int on)
 static int
 ixgbevf_vlan_offload_set(struct rte_eth_dev *dev, int mask)
 {
-   struct ixgbe_hw *hw =
-   IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
struct ixgbe_rx_queue *rxq;
uint16_t i;
int on = 0;
 
/* VF function only support hw strip feature, others are not support */
if (mask & ETH_VLAN_STRIP_MASK) {
-   for (i = 0; i < hw->mac.max_rx_queues; i++) {
+   for (i = 0; i < dev->data->nb_rx_queues; i++) {
rxq = dev->data->rx_queues[i];
on = !!(rxq->offloads & DEV_RX_OFFLOAD_VLAN_STRIP);
ixgbevf_vlan_strip_queue_set(dev, i, on);
--
2.7.5



Re: [dpdk-dev] [PATCH v3 3/4] ethdev: add TTL change actions in flow API

2018-04-16 Thread Shahaf Shuler
Monday, April 16, 2018 11:12 AM, Adrien Mazarguil:
> Subject: Re: [dpdk-dev] [PATCH v3 3/4] ethdev: add TTL change actions in
> flow API
> 
> Hi Shahaf,
> 
> On Mon, Apr 16, 2018 at 05:48:19AM +, Shahaf Shuler wrote:
> > Hi Qi,
> >
> > Am wondering if we can make the below more generic and not tailored for
> specific use cases.
> 
> Regarding this, please see my previous answer [1] where I asked Qi to make
> his changes more focused on the use case at hand when it became clear all
> this work was targeting OpenFlow.

OK,
I missed that. Sorry for jumping in late.

> 
> The OF specification [2] defines the behavior associated with each action, for
> instance when a TTL is 0 or decrementing it would yield 0, the packet must be
> dropped. Translating this to a generic decrement action for any packet field 
> is
> not so easy and not convenient.

I am not sure I understand why. It amounts to setting -1 in the TTL field of the
generic action.
We can define the corner cases more carefully as part of the actions, for
example, no wrap-around.
I did not understand that the drop when TTL is 0 is part of the action (it is not
described in the action description [1]).
Is this the case?

I think it is the wrong approach to introduce "combo" actions (ones that both
decrement and drop depending on the value) in rte_flow.
I would model such an operation as a set of (pseudo code):
1. ACTION_FIELD_DEC_INC, ACTION_GO_TO_GROUP
2. (in next group) matching on the TTL, ACTION_DROP

> 
> Therefore my opinion is that if OF actions as defined by this specification 
> are
> supported as hardware capabilities, it makes sense to define dedicated
> rte_flow actions for each of them (although "OF" should be part of their
> name for clarity).

I still think we may need, in the future, to support copy/increment/decrement of
fields not specifically related to OF.
It is better to have APIs which will not change or carry a double meaning.


[1]
+Action: ``IP_TTL_DEC``
+^^
+
+Decrement IPv4 TTL or IPv6 hop limit field and update the IP checksum, 
+only applies to packets that contain specific MPLS headers.
+
+.. _table_rte_flow_action_ip_ttl_dec:
+
+.. table:: IP_TTL_DEC


> 
> I'll comment the patch proper in a separate message.
> 
> [1] http://dpdk.org/ml/archives/dev/2018-April/096857.html
> [2]
> https://www.opennetworking.org/images/stories/downloads/sdn-resources/onf-specifications/openflow/openflow-spec-v1.3.0.pdf
> 
> --
> Adrien Mazarguil
> 6WIND


Re: [dpdk-dev] [PATCH] net/ixgbe: fix segfault in configuring VF VLAN strip

2018-04-16 Thread Lin, Xueqin
> -Original Message-
> From: Dai, Wei
> Sent: Monday, April 16, 2018 4:14 PM
> To: Lu, Wenzhuo ; Ananyev, Konstantin
> ; Zhang, Qi Z ; Lin,
> Xueqin 
> Cc: dev@dpdk.org; Dai, Wei ; sta...@dpdk.org
> Subject: [PATCH] net/ixgbe: fix segfault in configuring VF VLAN strip
> 
> This patch fixes a segment fault in ixgbevf_vlan_offload_set( ) when a Rx
> queue with index < max_rx_queues is not setup.
> For such queue, rxq = dev->data->rx_queues[i] is null pointer.
> 
> Fixes: 860a94d3c692 ("net/ixgbe: support VLAN strip per queue offloading in
> VF")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Wei Dai 
Tested-by: Xueqin Lin 
> ---
>  drivers/net/ixgbe/ixgbe_ethdev.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c
> b/drivers/net/ixgbe/ixgbe_ethdev.c
> index a5e2fc0..33ee52e 100644
> --- a/drivers/net/ixgbe/ixgbe_ethdev.c
> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
> @@ -5184,15 +5184,13 @@ ixgbevf_vlan_strip_queue_set(struct rte_eth_dev *dev, uint16_t queue, int on)
>  static int
>  ixgbevf_vlan_offload_set(struct rte_eth_dev *dev, int mask)
>  {
> - struct ixgbe_hw *hw =
> - IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
>   struct ixgbe_rx_queue *rxq;
>   uint16_t i;
>   int on = 0;
> 
>   /* VF function only support hw strip feature, others are not support */
>   if (mask & ETH_VLAN_STRIP_MASK) {
> - for (i = 0; i < hw->mac.max_rx_queues; i++) {
> + for (i = 0; i < dev->data->nb_rx_queues; i++) {
>   rxq = dev->data->rx_queues[i];
>   on = !!(rxq->offloads &
>   DEV_RX_OFFLOAD_VLAN_STRIP);
>   ixgbevf_vlan_strip_queue_set(dev, i, on);
> --
> 2.7.5



Re: [dpdk-dev] [PATCH] examples/l2fwd-crypto: fix the default aead assignments

2018-04-16 Thread Akhil Goyal

On 4/11/2018 2:45 PM, Hemant Agrawal wrote:

The code is incorrectly updating the authxform instead of
aead xforms.

Fixes: b79e4c00af0e ("cryptodev: use AES-GCM/CCM as AEAD algorithms")
Cc: sta...@dpdk.org

Signed-off-by: Hemant Agrawal 
---
 examples/l2fwd-crypto/main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/examples/l2fwd-crypto/main.c b/examples/l2fwd-crypto/main.c
index 4d8341e..38e0c7e 100644
--- a/examples/l2fwd-crypto/main.c
+++ b/examples/l2fwd-crypto/main.c
@@ -1474,8 +1474,8 @@ l2fwd_crypto_default_options(struct l2fwd_crypto_options 
*options)
options->aead_iv_random_size = -1;
options->aead_iv.length = 0;

-   options->auth_xform.aead.algo = RTE_CRYPTO_AEAD_AES_GCM;
-   options->auth_xform.aead.op = RTE_CRYPTO_AEAD_OP_ENCRYPT;
+   options->aead_xform.aead.algo = RTE_CRYPTO_AEAD_AES_GCM;
+   options->aead_xform.aead.op = RTE_CRYPTO_AEAD_OP_ENCRYPT;

options->aad_param = 0;
options->aad_random_size = -1;


Acked-by: Akhil Goyal 


Re: [dpdk-dev] [PATCH 1/2] crypto/dpaa_sec: improve the error checking

2018-04-16 Thread Akhil Goyal

On 4/5/2018 2:05 PM, Hemant Agrawal wrote:

From: Sunil Kumar Kori 

Reported by NXP's internal coverity

Signed-off-by: Sunil Kumar Kori 
---
 drivers/crypto/dpaa_sec/dpaa_sec.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)


Acked-by: Akhil Goyal 



Re: [dpdk-dev] [PATCH 2/2] crypto/dpaa2_sec: improve error handling

2018-04-16 Thread Akhil Goyal

On 4/5/2018 2:05 PM, Hemant Agrawal wrote:

From: Sunil Kumar Kori 

Fixed as reported by NXP's internal coverity.
Also part of dpdk coverity.

Coverity issue: 268331
Coverity issue: 268333

Signed-off-by: Sunil Kumar Kori 
---

Acked-by: Akhil Goyal 



Re: [dpdk-dev] [v3,3/3] doc: add private data info in crypto guide

2018-04-16 Thread Akhil Goyal

Hi Abhinandan,

On 4/16/2018 12:24 PM, Abhinandan Gujjar wrote:

Signed-off-by: Abhinandan Gujjar 
Acked-by: Akhil Goyal 
I think I acked the complete series, and this patch is v2, not v3. You
should also include the changelog.


Thanks,
Akhil


---
 doc/guides/prog_guide/cryptodev_lib.rst | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/doc/guides/prog_guide/cryptodev_lib.rst 
b/doc/guides/prog_guide/cryptodev_lib.rst
index 066fe2d..b279a20 100644
--- a/doc/guides/prog_guide/cryptodev_lib.rst
+++ b/doc/guides/prog_guide/cryptodev_lib.rst
@@ -299,6 +299,33 @@ directly from the devices processed queue, and for virtual 
device's from a
 enqueue call.


+Private data
+
+For session-based operations, the set and get API provides a mechanism for an
+application to store and retrieve the private data information stored along 
with
+the crypto session.
+
+For example, suppose an application is submitting a crypto operation with a 
session
+associated and wants to indicate private data information which is required to 
be
+used after completion of the crypto operation. In this case, the application 
can use
+the set API to set the private data and retrieve it using get API.
+
+.. code-block:: c
+
+   int rte_cryptodev_sym_session_set_private_data(
+   struct rte_cryptodev_sym_session *sess, void *data, uint16_t 
size);
+
+   void * rte_cryptodev_sym_session_get_private_data(
+   struct rte_cryptodev_sym_session *sess);
+
+
+For session-less mode, the private data information can be placed along with 
the
+``struct rte_crypto_op``. The ``rte_crypto_op::private_data_offset`` indicates 
the
+start of private data information. The offset is counted from the start of the
+rte_crypto_op including other crypto information such as the IVs (since there 
can
+be an IV also for authentication).
+
+
 Enqueue / Dequeue Burst APIs
 






Re: [dpdk-dev] kernel binding of devices + hotplug

2018-04-16 Thread Guo, Jia

hi, all


On 4/15/2018 4:10 AM, Matan Azrad wrote:

Hi all

From: Burakov, Anatoly, Friday, April 13, 2018 8:41 PM

To: Bruce Richardson ; Thomas Monjalon

Cc: dev@dpdk.org; pmati...@redhat.com; david.march...@6wind.com;
jia@intel.com; Matan Azrad ;
konstantin.anan...@intel.com; step...@networkplumber.org;
f...@redhat.com
Subject: Re: kernel binding of devices + hotplug

On 13-Apr-18 5:40 PM, Bruce Richardson wrote:

On Fri, Apr 13, 2018 at 06:31:21PM +0200, Thomas Monjalon wrote:

It's time to think (again) how we bind devices with kernel modules.
We need to decide how we want to manage hotplugged devices with

DPDK.

A bit of history first.
There was some code in DPDK for bind/unbind, but it has been removed
in DPDK 1.7 -


http://dpdk.org/commit/5d8751b83

Copy of the commit message (in 2014):
"
The bind/unbind operations should not be handled by the eal.
These operations should be either done outside of dpdk or
inside the PMDs themselves as these are their problems.
"

The question raised at this time (4 years ago) is still under discussion.
Should we manage binding inside or outside DPDK?
Should it be controlled in the application or in the OS base?

As you know, we use dpdk-devbind.py.
This tool lacks two major features:
- persistent configuration
- hotplug

If we consider that the DPDK applications should be able to apply its
own policy to choose the devices to bind, then we need to implement
binding in the PMD (with EAL helpers).

On the other hand, if we consider that it is the system
responsibility, then we could choose systemd/udev and driverctl.

The debate is launched!


Allow me to nail my colours to the mast early! :-)

I believe it's system not application responsibility.
I also believe I have previously explained my reasons for that choice
in some of the previous email threads.

For what it's worth, I tend to agree, if only because writing code for what is
essentially a bunch of read/write/filesystem enumeration in C is extremely
fiddly and error prone :) IMO things like this are better handled either by
scripts, or by tools whose sole purpose is doing exactly that (or both).

I like having scripts like devbind in DPDK because we can tailor them to our
use cases better, and having them is amenable to automation, but while I
wouldn't be opposed to removing them altogether in favor of some external
tool (systemd/udev/driverctl/whatever), in my humble opinion moving them
back into EAL or even PMD's would be a mistake.


Since the application runs in the system by a command of the system user I 
think the responsibility is for the user.
The DPDK user forwards the control of some devices to the DPDK application 
using the EAL whitelist\blacklist mode to specify the devices,
Any DPDK PMD should know which binding it needs to probe\control the device and 
can apply it,
So, if the user asks to control on a device by DPDK application it makes sense 
that the application will do the correct binding to the device since the user 
wants to use it(no need to ask more operation of pre binding from the user).

Regarding the conflict of system rules for a device, it is again the user 
responsibility, whatever we will decide for the binding procedure of DPDK 
application the user needs to take it into account and to solve such like 
conflicts.
One option is to remove any binding rules of a DPDK device in the DPDK 
application initialization and adjust the new rules by the PMDs, then any 
conflict should not disturb the user.

In current hot-plug case the application will need to do a lot of work to 
bind\remap devices in plug-in\plug-out events while the PMD could have all the 
knowledge to do it.

One more issue with the script is that the user should do different bind per 
device, in case of PMD responsibility the user can forget it:
Think about that, any time the user wants to switch\add new supported nic it 
should update the script usage and to do per nic operation contrary to the DPDK 
principles.

Matan.

Thanks,
Anatoly
When a device appears, whether DPDK is running or not, it is bound to the
kernel driver by default. The user (or system admin) can then use the
script or tools to rebind it to a specific driver according to their needs.
So I think user space tools provide the functionality and the user has the
binding responsibility, rather than the app or PMD. I don't understand why
overriding to another driver is in the scope of a specific PMD. And if there
is a conflict between rules, the user can override it and take control.


As for DPDK hotplug, its purpose is app failsafe and VM live
migration; driverctl focuses on driver control and udev focuses on
device hotplug but has no userspace failure handling, 

Re: [dpdk-dev] [PATCH v3 04/14] net/mlx5: support Rx tunnel type identification

2018-04-16 Thread Adrien Mazarguil
On Mon, Apr 16, 2018 at 08:05:13AM +, Xueming(Steven) Li wrote:
> 
> 
> > -Original Message-
> > From: Nélio Laranjeiro 
> > Sent: Monday, April 16, 2018 3:29 PM
> > To: Xueming(Steven) Li 
> > Cc: Shahaf Shuler ; dev@dpdk.org; Olivier Matz
> > ; Adrien Mazarguil 
> > Subject: Re: [PATCH v3 04/14] net/mlx5: support Rx tunnel type
> > identification
> > 
> > On Sat, Apr 14, 2018 at 12:57:58PM +, Xueming(Steven) Li wrote:
> > > +Adrien
> > >
> > > > -Original Message-
> > > > From: Nélio Laranjeiro 
> > > > Sent: Friday, April 13, 2018 9:03 PM
> > > > To: Xueming(Steven) Li 
> > > > Cc: Shahaf Shuler ; dev@dpdk.org; Olivier Matz
> > > > 
> > > > Subject: Re: [PATCH v3 04/14] net/mlx5: support Rx tunnel type
> > > > identification
> > > >
> > > > +Olivier,
> > > >
> > > > On Fri, Apr 13, 2018 at 07:20:13PM +0800, Xueming Li wrote:
> > > > > This patch introduced tunnel type identification based on flow rules.
> > > > > If flows of multiple tunnel types built on same queue,
> > > > > RTE_PTYPE_TUNNEL_MASK will be returned, user application could use
> > > > > bits in flow mark as tunnel type identifier.
> > > >
> > > > For an application it will mean the packet embed all tunnel types
> > > > defined in DPDK, to make such thing you need a
> > > > RTE_PTYPE_TUNNEL_UNKNOWN which does not exists currently.
> > >
> > > There was a RTE_PTYPE_TUNNEL_UNKNOWN definition, but removed due to
> > discussion.
> > > So I think it good to add it in the patchset of reviewed by Adrien.
> > 
> > Agreed,
> > 
> > >
> > > > Even with it, the application still needs to parse the packet to
> > > > discover which tunnel the packet embed, is there any benefit having
> > > > such bit?  Not so sure.
> > >
> > > With a tunnel flag, checksum status represent inner checksum.
> > 
> > Not sure this is generic enough, MLX5 behaves as this, but how behaves
> > other NICs?  It should have specific bits for inner checksum if all NIC
> > don't have the same behavior.
> 
> From my understanding, if the outer checksum is invalid, the packet can't be
> received as a tunneled packet, only as a normal packet; thus the checksum
> flags always reflect the inner headers for a valid tunneled packet.

Yes, since checksum validation information covers all layers at once
(outermost to the innermost recognized), the presence of an "unknown tunnel"
bit implicitly means outer headers are OK.

Now regarding the addition of RTE_PTYPE_TUNNEL_UNKNOWN, the main issue I see
is that it's implicit, as in getting 0 after and'ing packet types with
RTE_PTYPE_TUNNEL_MASK means either not present or unknown type.

How about not setting any tunnel bit and letting applications rely on the
presence of RTE_PTYPE_INNER_* to determine that there is a tunnel of unknown
type? The rationale being that a tunneled packet without an inner payload is
kind of pointless anyway.
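
A minimal sketch of the check suggested above, assuming the packet type
layout of rte_mbuf_ptype.h (tunnel type in bits 12-15, inner L2/L3/L4 in the
three nibbles above it). The constants are redefined locally with a TOY_
prefix so the snippet builds without DPDK, and the helper names are
hypothetical, not part of any existing API:

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the mask layout; assumed to mirror rte_mbuf_ptype.h. */
#define TOY_PTYPE_TUNNEL_MASK   0x0000f000u
#define TOY_PTYPE_INNER_L2_MASK 0x000f0000u
#define TOY_PTYPE_INNER_L3_MASK 0x00f00000u
#define TOY_PTYPE_INNER_L4_MASK 0x0f000000u
#define TOY_PTYPE_INNER_MASK \
	(TOY_PTYPE_INNER_L2_MASK | TOY_PTYPE_INNER_L3_MASK | \
	 TOY_PTYPE_INNER_L4_MASK)

/* Example values, also assumed to match the DPDK definitions. */
#define TOY_PTYPE_L3_IPV4       0x00000010u
#define TOY_PTYPE_TUNNEL_VXLAN  0x00003000u
#define TOY_PTYPE_INNER_L3_IPV4 0x00100000u

/* Any tunnel bit or any inner layer bit means the packet is tunneled. */
static bool
toy_is_tunneled(uint32_t ptype)
{
	return (ptype & (TOY_PTYPE_TUNNEL_MASK | TOY_PTYPE_INNER_MASK)) != 0;
}

/* Inner layers reported but no tunnel type: tunnel of unknown type. */
static bool
toy_is_unknown_tunnel(uint32_t ptype)
{
	return (ptype & TOY_PTYPE_TUNNEL_MASK) == 0 &&
	       (ptype & TOY_PTYPE_INNER_MASK) != 0;
}
```

With this convention no new RTE_PTYPE_TUNNEL_UNKNOWN value is needed; an
application that cares about the exact tunnel type still has to parse the
packet or rely on a flow mark.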

> > > Setting flow mark for different flow type could save time of parsing
> > > tunnel.
> > 
> > Thanks,
> > 
> > --
> > Nélio Laranjeiro
> > 6WIND

-- 
Adrien Mazarguil
6WIND


Re: [dpdk-dev] [v3,3/3] doc: add private data info in crypto guide

2018-04-16 Thread Gujjar, Abhinandan S
Hi Akhil,

I missed the *series* and thought the ack was only on the doc patch.
Regarding the change log: just before posting I looked at recent patches.
Most of them didn't have a change log, so I didn't add one.
Yes, there is a typo in the subject of this patch. Does this require a newer
version of the patches?

Regards
Abhinandan

> -Original Message-
> From: Akhil Goyal [mailto:akhil.go...@nxp.com]
> Sent: Monday, April 16, 2018 2:47 PM
> To: Gujjar, Abhinandan S ; De Lara Guarch, Pablo
> ; Doherty, Declan
> ; jerin.ja...@caviumnetworks.com;
> hemant.agra...@nxp.com; dev@dpdk.org
> Cc: Vangati, Narender ; Rao, Nikhil
> 
> Subject: Re: [v3,3/3] doc: add private data info in crypto guide
> 
> Hi Abhinandan,
> 
> On 4/16/2018 12:24 PM, Abhinandan Gujjar wrote:
> > Signed-off-by: Abhinandan Gujjar 
> > Acked-by: Akhil Goyal 
> I think I acked this complete series. And this patch is v2 not v3. You should
> also mention the changelog.
> 
> Thanks,
> Akhil
> 
> > ---
> >  doc/guides/prog_guide/cryptodev_lib.rst | 27
> > +++
> >  1 file changed, 27 insertions(+)
> >
> > diff --git a/doc/guides/prog_guide/cryptodev_lib.rst
> > b/doc/guides/prog_guide/cryptodev_lib.rst
> > index 066fe2d..b279a20 100644
> > --- a/doc/guides/prog_guide/cryptodev_lib.rst
> > +++ b/doc/guides/prog_guide/cryptodev_lib.rst
> > @@ -299,6 +299,33 @@ directly from the devices processed queue, and
> > for virtual device's from a  enqueue call.
> >
> >
> > +Private data
> > +
> > +For session-based operations, the set and get API provides a
> > +mechanism for an application to store and retrieve the private data
> > +information stored along with the crypto session.
> > +
> > +For example, suppose an application is submitting a crypto operation
> > +with a session associated and wants to indicate private data
> > +information which is required to be used after completion of the
> > +crypto operation. In this case, the application can use the set API to set the private data and retrieve it using get API.
> > +
> > +.. code-block:: c
> > +
> > +   int rte_cryptodev_sym_session_set_private_data(
> > +   struct rte_cryptodev_sym_session *sess, void *data, uint16_t size);
> > +
> > +   void * rte_cryptodev_sym_session_get_private_data(
> > +   struct rte_cryptodev_sym_session *sess);
> > +
> > +
> > +For session-less mode, the private data information can be placed
> > +along with the ``struct rte_crypto_op``. The
> > +``rte_crypto_op::private_data_offset`` indicates the start of private
> > +data information. The offset is counted from the start of the
> > +rte_crypto_op including other crypto information such as the IVs (since there can be an IV also for authentication).
> > +
> > +
> >  Enqueue / Dequeue Burst APIs
> >  
> >
> >
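
For illustration only, the session-less layout described in the quoted
documentation (private data located at an offset counted from the start of
the operation, after the IV) can be sketched as follows. The struct and
helpers are hypothetical stand-ins, not the real ``struct rte_crypto_op``;
only the ``private_data_offset`` field name is taken from the patch:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a crypto op: header, then IV, then user data. */
struct toy_crypto_op {
	uint16_t private_data_offset; /* bytes from the start of the op */
	uint16_t iv_len;              /* size of the IV that follows the header */
};

/* Allocate header + IV + private data in one zeroed block. */
static struct toy_crypto_op *
toy_op_alloc(uint16_t iv_len, uint16_t priv_len)
{
	struct toy_crypto_op *op = calloc(1, sizeof(*op) + iv_len + priv_len);

	if (op == NULL)
		return NULL;
	op->iv_len = iv_len;
	/* The offset is counted from the start of the op and skips the IV. */
	op->private_data_offset = (uint16_t)(sizeof(*op) + iv_len);
	return op;
}

/* Resolve the private data area from the stored offset. */
static void *
toy_op_private_data(struct toy_crypto_op *op)
{
	return (uint8_t *)op + op->private_data_offset;
}
```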



[dpdk-dev] [PATCH v2] net/enic: add primary mac address handler

2018-04-16 Thread David Marchand
Modified enic_del_mac_address() to get a return value from the vnic layer.
Reused the .mac_addr_add and .mac_addr_del callbacks code to implement
primary mac address handler.

Signed-off-by: David Marchand 
---

Changes since v1:
- rebased on dpdk-next-net following mac_addr_set rework,
- since enicpmd_remove_mac_addr() does not return an error code, I chose to
  expose the return value from enic_del_mac_address() so that an error
  can be detected in the mac_addr_set callback. The log message in
  enicpmd_remove_mac_addr() has been preserved.

---
 drivers/net/enic/enic.h|  2 +-
 drivers/net/enic/enic_ethdev.c | 20 +++-
 drivers/net/enic/enic_main.c   |  5 ++---
 3 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/net/enic/enic.h b/drivers/net/enic/enic.h
index 751ddc7..5f15e44 100644
--- a/drivers/net/enic/enic.h
+++ b/drivers/net/enic/enic.h
@@ -284,7 +284,7 @@ int enic_dev_stats_get(struct enic *enic,
 void enic_dev_stats_clear(struct enic *enic);
 void enic_add_packet_filter(struct enic *enic);
 int enic_set_mac_address(struct enic *enic, uint8_t *mac_addr);
-void enic_del_mac_address(struct enic *enic, int mac_index);
+int enic_del_mac_address(struct enic *enic, int mac_index);
 unsigned int enic_cleanup_wq(struct enic *enic, struct vnic_wq *wq);
 void enic_send_pkt(struct enic *enic, struct vnic_wq *wq,
   struct rte_mbuf *tx_pkt, unsigned short len,
diff --git a/drivers/net/enic/enic_ethdev.c b/drivers/net/enic/enic_ethdev.c
index 801f470..f503398 100644
--- a/drivers/net/enic/enic_ethdev.c
+++ b/drivers/net/enic/enic_ethdev.c
@@ -583,7 +583,24 @@ static void enicpmd_remove_mac_addr(struct rte_eth_dev *eth_dev, uint32_t index)
return;
 
ENICPMD_FUNC_TRACE();
-   enic_del_mac_address(enic, index);
+   if (enic_del_mac_address(enic, index))
+   dev_err(enic, "del mac addr failed\n");
+}
+
+static int enicpmd_set_mac_addr(struct rte_eth_dev *eth_dev,
+   struct ether_addr *addr)
+{
+   struct enic *enic = pmd_priv(eth_dev);
+   int ret;
+
+   if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+   return -E_RTE_SECONDARY;
+
+   ENICPMD_FUNC_TRACE();
+   ret = enic_del_mac_address(enic, 0);
+   if (ret)
+   return ret;
+   return enic_set_mac_address(enic, addr->addr_bytes);
 }
 
 static int enicpmd_mtu_set(struct rte_eth_dev *eth_dev, uint16_t mtu)
@@ -799,6 +816,7 @@ static const struct eth_dev_ops enicpmd_eth_dev_ops = {
.priority_flow_ctrl_set = NULL,
.mac_addr_add = enicpmd_add_mac_addr,
.mac_addr_remove  = enicpmd_remove_mac_addr,
+   .mac_addr_set = enicpmd_set_mac_addr,
.filter_ctrl  = enicpmd_dev_filter_ctrl,
.reta_query   = enicpmd_dev_rss_reta_query,
.reta_update  = enicpmd_dev_rss_reta_update,
diff --git a/drivers/net/enic/enic_main.c b/drivers/net/enic/enic_main.c
index 98d4775..d9bc7fd 100644
--- a/drivers/net/enic/enic_main.c
+++ b/drivers/net/enic/enic_main.c
@@ -162,13 +162,12 @@ int enic_dev_stats_get(struct enic *enic, struct rte_eth_stats *r_stats)
return 0;
 }
 
-void enic_del_mac_address(struct enic *enic, int mac_index)
+int enic_del_mac_address(struct enic *enic, int mac_index)
 {
struct rte_eth_dev *eth_dev = enic->rte_dev;
uint8_t *mac_addr = eth_dev->data->mac_addrs[mac_index].addr_bytes;
 
-   if (vnic_dev_del_addr(enic->vdev, mac_addr))
-   dev_err(enic, "del mac addr failed\n");
+   return vnic_dev_del_addr(enic->vdev, mac_addr);
 }
 
 int enic_set_mac_address(struct enic *enic, uint8_t *mac_addr)
-- 
2.7.4



Re: [dpdk-dev] [PATCH v3 3/4] ethdev: add TTL change actions in flow API

2018-04-16 Thread Adrien Mazarguil
On Mon, Apr 16, 2018 at 08:56:37AM +, Shahaf Shuler wrote:
> Monday, April 16, 2018 11:12 AM, Adrien Mazarguil:
> > Subject: Re: [dpdk-dev] [PATCH v3 3/4] ethdev: add TTL change actions in
> > flow API
> > 
> > Hi Shahaf,
> > 
> > On Mon, Apr 16, 2018 at 05:48:19AM +, Shahaf Shuler wrote:
> > > Hi Qi,
> > >
> > > Am wondering if we can make the below more generic and not tailored for
> > specific use cases.
> > 
> > Regarding this, please see my previous answer [1] where I asked Qi to make
> > his changes more focused on the use case at hand when it became clear all
> > this work was targeting OpenFlow.
> 
> OK,
> I missed that. Sorry for jumping in late.
> 
> > 
> > The OF specification [2] defines the behavior associated with each action,
> > for instance when a TTL is 0 or decrementing it would yield 0, the packet
> > must be dropped. Translating this to a generic decrement action for any
> > packet field is not so easy and not convenient.
> 
> I am not sure I understand why. It is to set -1 in the TTL field of the
> generic action.
> We can define the corner cases more carefully as part of the actions. For
> example - no wrap around.
> I did not understand that the drop if TTL is 0 is part of the action (it is
> not described in the action description [1]).
> Is this the case?

I still need to comment on the original patch :)

Basically I would like to make all these actions point to the OpenFlow
action documentation describing them with a disclaimer such as "These are
OpenFlow actions, here's a summary of what they do, see linked OF
documentation for details".

> I think it is the wrong approach to introduce "combo" actions (ones that both
> decrement and drop depending on the value) in rte_flow.
> I would model such an operation by a set of (pseudo code)
> 1. ACTION_FIELD_DEC_INC , ACTION_GO_TO_GROUP
> 2. (in next group) matching on the TTL , ACTION_DROP

If a device really implements something that does "check TTL on protocol $FOO,
decrement it, re-check TTL, update checksum, drop packet if any of the
previous steps failed", then by all means I think a dedicated action is
justified. It's also easier to document as "does what OpenFlow specifies"
and much more convenient to applications.
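
To make the "check, decrement, update checksum, drop on failure" sequence
concrete, here is a rough software sketch for IPv4 only. It assumes the drop
rule quoted earlier in this thread (TTL is 0, or decrementing would yield 0)
and uses the incremental checksum update from RFC 1624. The header is modeled
as ten 16-bit words already converted to host order, and all toy_* names are
hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_IPV4_WORDS    10 /* 20-byte header without options */
#define TOY_IPV4_TTL_WORD  4 /* TTL in the high byte, protocol in the low */
#define TOY_IPV4_CSUM_WORD 5

/* Fold the carries of a one's complement sum back into 16 bits. */
static uint16_t
toy_csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Full header checksum, skipping the checksum word itself. */
static uint16_t
toy_ipv4_checksum(const uint16_t w[TOY_IPV4_WORDS])
{
	uint32_t sum = 0;
	int i;

	for (i = 0; i < TOY_IPV4_WORDS; i++)
		if (i != TOY_IPV4_CSUM_WORD)
			sum += w[i];
	return (uint16_t)~toy_csum_fold(sum);
}

/* Return true to forward the packet, false to drop it (header untouched). */
static bool
toy_of_dec_ttl(uint16_t w[TOY_IPV4_WORDS])
{
	uint16_t old = w[TOY_IPV4_TTL_WORD];
	uint16_t updated;
	uint32_t sum;

	if ((old >> 8) <= 1)
		return false; /* TTL is 0 or would become 0 */

	updated = (uint16_t)(old - 0x0100);
	/* RFC 1624: HC' = ~(~HC + ~m + m') */
	sum = (uint16_t)~w[TOY_IPV4_CSUM_WORD];
	sum += (uint16_t)~old;
	sum += updated;
	w[TOY_IPV4_CSUM_WORD] = (uint16_t)~toy_csum_fold(sum);
	w[TOY_IPV4_TTL_WORD] = updated;
	return true;
}
```

A dedicated action hides all of this behind one rule; the generic alternative
discussed above would need a decrement action plus a second rule in another
group matching on the resulting TTL.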

Another set of actions can be added for devices (or PMDs) with partial
support (e.g. to expose a "dumb" decrement capability as in the original
series).

The real question is are there devices that fully implement OF actions as
described by the linked spec? Can any missing bits be handled by PMDs
without a noticeable performance impact?

> > Therefore my opinion is that if OF actions as defined by this specification
> > are supported as hardware capabilities, it makes sense to define dedicated
> > rte_flow actions for each of them (although "OF" should be part of their
> > name for clarity).
> 
> I still think we may need in the future to support copy/increment/decrement
> of fields not specifically related to OF.
> It is better to have APIs which will not change or have double meaning.

We could add them later on an as-needed basis. Correct me if I'm wrong, but
right now OF is the only use case everyone has in mind. Application
developers will always favor a set of explicit OF actions over more
convoluted means of achieving the expected behavior.

> [1]
> +Action: ``IP_TTL_DEC``
> +^^
> +
> +Decrement IPv4 TTL or IPv6 hop limit field and update the IP checksum, 
> +only applies to packets that contain specific MPLS headers.
> +
> +.. _table_rte_flow_action_ip_ttl_dec:
> +
> +.. table:: IP_TTL_DEC
> 
> 
> > 
> > I'll comment the patch proper in a separate message.
> > 
> > [1] http://dpdk.org/ml/archives/dev/2018-April/096857.html
> > [2] https://www.opennetworking.org/images/stories/downloads/sdn-resources/onf-specifications/openflow/openflow-spec-v1.3.0.pdf
> > 
> > --
> > Adrien Mazarguil
> > 6WIND

-- 
Adrien Mazarguil
6WIND


Re: [dpdk-dev] [pull-request] next-eventdev 18.05 RC1

2018-04-16 Thread Thomas Monjalon
14/04/2018 09:25, Jerin Jacob:
>   http://dpdk.org/git/next/dpdk-next-eventdev 

Pulled, thanks




Re: [dpdk-dev] [PATCH v7 1/5] vfio: extend data structure for multi container

2018-04-16 Thread Burakov, Anatoly

On 15-Apr-18 4:33 PM, Xiao Wang wrote:

Currently eal vfio framework binds vfio group fd to the default
container fd during rte_vfio_setup_device, while in some cases,
e.g. vDPA (vhost data path acceleration), we want to put vfio group
to a separate container and program IOMMU via this container.

This patch extends the vfio_config structure to contain per-container
user_mem_maps and defines an array of vfio_config. The next patch will
base on this to add container API.

Signed-off-by: Junjie Chen 
Signed-off-by: Xiao Wang 
Reviewed-by: Maxime Coquelin 
Reviewed-by: Ferruh Yigit 
---
  config/common_base |   1 +
  lib/librte_eal/linuxapp/eal/eal_vfio.c | 407 ++---
  lib/librte_eal/linuxapp/eal/eal_vfio.h |  19 +-
  3 files changed, 275 insertions(+), 152 deletions(-)

diff --git a/config/common_base b/config/common_base
index c4236fd1f..4a76d2f14 100644
--- a/config/common_base
+++ b/config/common_base
@@ -87,6 +87,7 @@ CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
  CONFIG_RTE_EAL_IGB_UIO=n
  CONFIG_RTE_EAL_VFIO=n
  CONFIG_RTE_MAX_VFIO_GROUPS=64
+CONFIG_RTE_MAX_VFIO_CONTAINERS=64
  CONFIG_RTE_MALLOC_DEBUG=n
  CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES=n
  CONFIG_RTE_USE_LIBBSD=n
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index 589d7d478..46fba2d8d 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -22,8 +22,46 @@
  
  #define VFIO_MEM_EVENT_CLB_NAME "vfio_mem_event_clb"
  
+/*

+ * we don't need to store device fd's anywhere since they can be obtained from
+ * the group fd via an ioctl() call.
+ */
+struct vfio_group {
+   int group_no;
+   int fd;
+   int devices;
+};


What is the purpose of moving this into .c file? Seems like an 
unnecessary change.



+
+/* hot plug/unplug of VFIO groups may cause all DMA maps to be dropped. we can
+ * recreate the mappings for DPDK segments, but we cannot do so for memory that
+ * was registered by the user themselves, so we need to store the user mappings
+ * somewhere, to recreate them later.
+ */
+#define VFIO_MAX_USER_MEM_MAPS 256
+struct user_mem_map {
+   uint64_t addr;
+   uint64_t iova;
+   uint64_t len;
+};
+


<...>


+static struct vfio_config *
+get_vfio_cfg_by_group_no(int iommu_group_no)
+{
+   struct vfio_config *vfio_cfg;
+   int i, j;
+
+   for (i = 0; i < VFIO_MAX_CONTAINERS; i++) {
+   vfio_cfg = &vfio_cfgs[i];
+   for (j = 0; j < VFIO_MAX_GROUPS; j++) {
+   if (vfio_cfg->vfio_groups[j].group_no ==
+   iommu_group_no)
+   return vfio_cfg;
+   }
+   }
+
+   return default_vfio_cfg;


Here and in other places: I'm not sure returning the default vfio config if
the group is not found is such a good idea. It would be better if the calling
code explicitly handled the case of the group not existing yet.



+}
+
+static struct vfio_config *
+get_vfio_cfg_by_group_fd(int vfio_group_fd)
+{
+   struct vfio_config *vfio_cfg;
+   int i, j;
+
+   for (i = 0; i < VFIO_MAX_CONTAINERS; i++) {
+   vfio_cfg = &vfio_cfgs[i];
+   for (j = 0; j < VFIO_MAX_GROUPS; j++)
+   if (vfio_cfg->vfio_groups[j].fd == vfio_group_fd)
+   return vfio_cfg;
+   }
  


<...>


-   for (i = 0; i < VFIO_MAX_GROUPS; i++) {
-   vfio_cfg.vfio_groups[i].fd = -1;
-   vfio_cfg.vfio_groups[i].group_no = -1;
-   vfio_cfg.vfio_groups[i].devices = 0;
+   rte_spinlock_recursive_t lock = RTE_SPINLOCK_RECURSIVE_INITIALIZER;
+
+   for (i = 0; i < VFIO_MAX_CONTAINERS; i++) {
+   vfio_cfgs[i].vfio_container_fd = -1;
+   vfio_cfgs[i].vfio_active_groups = 0;
+   vfio_cfgs[i].vfio_iommu_type = NULL;
+   vfio_cfgs[i].mem_maps.lock = lock;


Nitpick - why copy, instead of straight up initializing with 
RTE_SPINLOCK_RECURSIVE_INITIALIZER?



+
+   for (j = 0; j < VFIO_MAX_GROUPS; j++) {
+   vfio_cfgs[i].vfio_groups[j].fd = -1;
+   vfio_cfgs[i].vfio_groups[j].group_no = -1;
+   vfio_cfgs[i].vfio_groups[j].devices = 0;
+   }
}
  
  	/* inform the user that we are probing for VFIO */

@@ -841,12 +971,12 @@ rte_vfio_enable(const char *modname)
return 0;
}


<...>

--
Thanks,
Anatoly


Re: [dpdk-dev] [PATCH v7 2/5] vfio: add multi container support

2018-04-16 Thread Burakov, Anatoly

On 15-Apr-18 4:33 PM, Xiao Wang wrote:

This patch adds APIs to support container create/destroy and device
bind/unbind with a container. It also provides API for IOMMU programing
on a specified container.

A driver could use "rte_vfio_create_container" helper to create a


^^ wrong API name in commit message :)


new container from eal, use "rte_vfio_bind_group" to bind a device
to the newly created container. During rte_vfio_setup_device the
container bound with the device will be used for IOMMU setup.

Signed-off-by: Junjie Chen 
Signed-off-by: Xiao Wang 
Reviewed-by: Maxime Coquelin 
Reviewed-by: Ferruh Yigit 
---
  lib/librte_eal/bsdapp/eal/eal.c  |  52 +
  lib/librte_eal/common/include/rte_vfio.h | 119 
  lib/librte_eal/linuxapp/eal/eal_vfio.c   | 316 +++
  lib/librte_eal/rte_eal_version.map   |   6 +
  4 files changed, 493 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 727adc5d2..c5106d0d6 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -769,6 +769,14 @@ int rte_vfio_noiommu_is_enabled(void);
  int rte_vfio_clear_group(int vfio_group_fd);
  int rte_vfio_dma_map(uint64_t vaddr, uint64_t iova, uint64_t len);
  int rte_vfio_dma_unmap(uint64_t vaddr, uint64_t iova, uint64_t len);
+int rte_vfio_container_create(void);
+int rte_vfio_container_destroy(int container_fd);
+int rte_vfio_bind_group(int container_fd, int iommu_group_no);
+int rte_vfio_unbind_group(int container_fd, int iommu_group_no);


Maybe have these under "container" too? e.g. 
rte_vfio_container_group_bind/unbind? Seems like it would be more 
consistent that way - anything to do with custom containers would be 
under rte_vfio_container_* namespace.



+int rte_vfio_container_dma_map(int container_fd, uint64_t vaddr,
+   uint64_t iova, uint64_t len);
+int rte_vfio_container_dma_unmap(int container_fd, uint64_t vaddr,
+   uint64_t iova, uint64_t len);
  
  int rte_vfio_setup_device(__rte_unused const char *sysfs_base,

  __rte_unused const char *dev_addr,
@@ -818,3 +826,47 @@ rte_vfio_dma_unmap(uint64_t __rte_unused vaddr, uint64_t __rte_unused iova,
  {
return -1;
  }
+


<...>


diff --git a/lib/librte_eal/common/include/rte_vfio.h b/lib/librte_eal/common/include/rte_vfio.h
index d26ab01cb..0c1509b29 100644
--- a/lib/librte_eal/common/include/rte_vfio.h
+++ b/lib/librte_eal/common/include/rte_vfio.h
@@ -168,6 +168,125 @@ rte_vfio_dma_map(uint64_t vaddr, uint64_t iova, uint64_t len);
  int __rte_experimental
  rte_vfio_dma_unmap(uint64_t vaddr, uint64_t iova, uint64_t len);
  
+/**

+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Create a new container for device binding.


I would add a note that any newly allocated DPDK memory will not be 
mapped into these containers by default.



+ *
+ * @return
+ *   the container fd if successful
+ *   <0 if failed
+ */
+int __rte_experimental
+rte_vfio_container_create(void);
+


<...>


+ *0 if successful
+ *   <0 if failed
+ */
+int __rte_experimental
+rte_vfio_unbind_group(int container_fd, int iommu_group_no);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Perform dma mapping for devices in a conainer.


Here and in other places: "dma" should be DMA, and typo: "conainer" :)

I think you should also add a note to the original API (not this one, 
but the old one) that DMA maps done via that API will only apply to 
default container and will not apply to any of the containers created 
via container_create(). IOW, documentation should make it clear that if 
you use this functionality, you're on your own and you have to manage 
your own DMA mappings for any containers you create.



+ *
+ * @param container_fd
+ *   the specified container fd
+ *
+ * @param vaddr
+ *   Starting virtual address of memory to be mapped.
+ *


<...>


+
+int __rte_experimental
+rte_vfio_container_dma_map(int container_fd, uint64_t vaddr, uint64_t iova,
+   uint64_t len)
+{
+   struct user_mem_map *new_map;
+   struct vfio_config *vfio_cfg;
+   struct user_mem_maps *user_mem_maps;
+   int ret = 0;
+
+   if (len == 0) {
+   rte_errno = EINVAL;
+   return -1;
+   }
+
+   vfio_cfg = get_vfio_cfg_by_container_fd(container_fd);
+   if (vfio_cfg == NULL) {
+   RTE_LOG(ERR, EAL, "Invalid container fd\n");
+   return -1;
+   }
+
+   user_mem_maps = &vfio_cfg->mem_maps;
+   rte_spinlock_recursive_lock(&user_mem_maps->lock);
+   if (user_mem_maps->n_maps == VFIO_MAX_USER_MEM_MAPS) {
+   RTE_LOG(ERR, EAL, "No more space for user mem maps\n");
+   rte_errno = ENOMEM;
+   ret = -1;
+   goto out;
+   }
+   /* map the entry */
+   if (vfio_dma_mem_map(vfio_cfg

[dpdk-dev] [PATCH] app/eventdev: fix typos in timer adapter options

2018-04-16 Thread Thomas Monjalon
The options names in code and doc are not the same.

Fixes: 98c6292105d4 ("app/eventdev: add options for event timer adapter")
Cc: pbhagavat...@caviumnetworks.com

Signed-off-by: Thomas Monjalon 
---
 app/test-eventdev/evt_options.c   |  2 +-
 doc/guides/tools/testeventdev.rst | 12 ++--
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/app/test-eventdev/evt_options.c b/app/test-eventdev/evt_options.c
index 5f311a570..701cd4e96 100644
--- a/app/test-eventdev/evt_options.c
+++ b/app/test-eventdev/evt_options.c
@@ -248,7 +248,7 @@ usage(char *program)
"\t burst mode.\n"
"\t--nb_timers: number of timers to arm.\n"
"\t--nb_timer_adptrs  : number of timer adapters to use.\n"
-   "\t--timer_tick_nsec  : timer tick interval in ns.\n"
+   "\t--timer_tick_ns: timer tick interval in ns.\n"
"\t--max_tmo_nsec : max timeout interval in ns.\n"
"\t--expiry_ns: event timer expiry ns.\n"
);
diff --git a/doc/guides/tools/testeventdev.rst b/doc/guides/tools/testeventdev.rst
index 46effd871..b03c4a17b 100644
--- a/doc/guides/tools/testeventdev.rst
+++ b/doc/guides/tools/testeventdev.rst
@@ -131,7 +131,7 @@ The following are the application command-line options:
 
 Use burst mode event timer adapter as producer.
 
- * ``--timer_tick_nsec``
+ * ``--timer_tick_ns``
 
 Used to dictate number of nano seconds between bucket traversal of the
 event timer adapter. Refer `rte_event_timer_adapter_conf`.
@@ -140,7 +140,7 @@ The following are the application command-line options:
 
 Used to configure event timer adapter max arm timeout in nano seconds.
 
- * ``--expiry_nsec``
+ * ``--expiry_ns``
 
 Dictate the number of nano seconds after which the event timer expires.
 
@@ -379,9 +379,9 @@ Supported application command line options are following::
 --prod_type_ethdev
 --prod_type_timerdev_burst
 --prod_type_timerdev
---timer_tick_nsec
+--timer_tick_ns
 --max_tmo_nsec
---expiry_nsec
+--expiry_ns
 --nb_timers
 --nb_timer_adptrs
 
@@ -478,9 +478,9 @@ Supported application command line options are following::
 --prod_type_ethdev
 --prod_type_timerdev_burst
 --prod_type_timerdev
---timer_tick_nsec
+--timer_tick_ns
 --max_tmo_nsec
---expiry_nsec
+--expiry_ns
 --nb_timers
 --nb_timer_adptrs
 
-- 
2.16.2



Re: [dpdk-dev] [PATCH v2 07/15] net/mlx5: support tunnel RSS level

2018-04-16 Thread Xueming(Steven) Li


> -Original Message-
> From: Nélio Laranjeiro 
> Sent: Monday, April 16, 2018 4:09 PM
> To: Xueming(Steven) Li 
> Cc: Shahaf Shuler ; dev@dpdk.org
> Subject: Re: [PATCH v2 07/15] net/mlx5: support tunnel RSS level
> 
> On Mon, Apr 16, 2018 at 07:46:08AM +, Xueming(Steven) Li wrote:
> >[...]
> > > > > > @@ -1386,6 +1386,8 @@ mlx5_ind_table_ibv_verify(struct rte_eth_dev *dev)
> > > > > >   *   Number of queues.
> > > > > >   * @param tunnel
> > > > > >   *   Tunnel type.
> > > > > > + * @param rss_level
> > > > > > + *   RSS hash on tunnel level.
> > > > > >   *
> > > > > >   * @return
> > > > > >   *   The Verbs object initialised, NULL otherwise and rte_errno is set.
> > > > > > @@ -1394,13 +1396,17 @@ struct mlx5_hrxq *
> > > > > > mlx5_hrxq_new(struct rte_eth_dev *dev,
> > > > > >   const uint8_t *rss_key, uint32_t rss_key_len,
> > > > > >   uint64_t hash_fields,
> > > > > > - const uint16_t *queues, uint32_t queues_n, uint32_t tunnel)
> > > > > > + const uint16_t *queues, uint32_t queues_n,
> > > > > > + uint32_t tunnel, uint32_t rss_level)
> > > > >
> > > > > tunnel and rss_level seems to be redundant here.
> > > > >
> > > > > > rss_level > 1 is equivalent to tunnel, there is no need to have both.
> > > >
> > > > There is a case of tunnel and outer rss(1).
> > >
> > > Why cannot it be handled by a regular Hash Rx queue, i.e. what is
> > > the benefit of creating a tunnel hash Rx queue to make the same job
> > > as a legacy one?
> >
> > Tunnel checksum, ptype and rss offloading demand a QP to be created by
> > DV api with tunnel offload flags.
> 
> I was expecting such answer, such information should be present in the
> function documentation, can you add it?

You mean https://dpdk.org/doc/guides/nics/overview.html?
"Inner L3 checksum" and "Inner L4 checksum" are defined there.
I added "Inner RSS" per your suggestion. The only thing missing is
"Inner packet type"; does that make sense?

> 
> Thanks,
> 
> --
> Nélio Laranjeiro
> 6WIND


Re: [dpdk-dev] [PATCH] app/eventdev: fix typos in timer adapter options

2018-04-16 Thread Pavan Nikhilesh
On Mon, Apr 16, 2018 at 12:03:53PM +0200, Thomas Monjalon wrote:
> The options names in code and doc are not the same.
>
> Fixes: 98c6292105d4 ("app/eventdev: add options for event timer adapter")
> Cc: pbhagavat...@caviumnetworks.com
>
> Signed-off-by: Thomas Monjalon 
> ---
>  app/test-eventdev/evt_options.c   |  2 +-
>  doc/guides/tools/testeventdev.rst | 12 ++--
>  2 files changed, 7 insertions(+), 7 deletions(-)
>
Acked-by: Pavan Nikhilesh 


Re: [dpdk-dev] [PATCH] eal: fix compilation without VFIO

2018-04-16 Thread Burakov, Anatoly

On 16-Apr-18 6:50 AM, Shahaf Shuler wrote:

Friday, April 13, 2018 4:59 PM, Thomas Monjalon:


OK. Shahaf, will you submit a v2 with this, or should I do it? I think
it should be just a matter of #ifndef VFIO_PRESENT //define
vfio_device_info struct #endif - this should take care of the problem
of hiding the function definitions.

FreeBSD will also need to be adjusted to remove dummy prototypes.


I think you are more familiar with VFIO than any of us.
It is better to let you do, think about the implications and do the tests.
Thanks :)



I don't mind which of us does it, as long as it is done quickly.
Currently there are some tests which cannot run on our regression due to it.




Yes, I'll be submitting the patch shortly.

--
Thanks,
Anatoly


Re: [dpdk-dev] [PATCH] app/eventdev: fix typos in timer adapter options

2018-04-16 Thread Pavan Nikhilesh
On Mon, Apr 16, 2018 at 12:03:53PM +0200, Thomas Monjalon wrote:
> The options names in code and doc are not the same.
>
> Fixes: 98c6292105d4 ("app/eventdev: add options for event timer adapter")
> Cc: pbhagavat...@caviumnetworks.com
>
> Signed-off-by: Thomas Monjalon 
> ---
>  app/test-eventdev/evt_options.c   |  2 +-
>  doc/guides/tools/testeventdev.rst | 12 ++--
>  2 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/app/test-eventdev/evt_options.c b/app/test-eventdev/evt_options.c
> index 5f311a570..701cd4e96 100644
> --- a/app/test-eventdev/evt_options.c
> +++ b/app/test-eventdev/evt_options.c
> @@ -248,7 +248,7 @@ usage(char *program)
>   "\t burst mode.\n"
>   "\t--nb_timers: number of timers to arm.\n"
>   "\t--nb_timer_adptrs  : number of timer adapters to use.\n"
> - "\t--timer_tick_nsec  : timer tick interval in ns.\n"
> + "\t--timer_tick_ns: timer tick interval in ns.\n"
>   "\t--max_tmo_nsec : max timeout interval in ns.\n"
>   "\t--expiry_ns: event timer expiry ns.\n"
>   );

I think it would be better to maintain consistency across options; I will send
a patch to fix it.

Thanks,
Pavan


[dpdk-dev] [PATCH] drivers/net: remove duplicated includes

2018-04-16 Thread Thomas Monjalon
Duplicated includes are found with devtools/check-dup-includes.sh

Signed-off-by: Thomas Monjalon 
---
 drivers/net/axgbe/axgbe_common.h   | 1 -
 drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c | 1 -
 2 files changed, 2 deletions(-)

diff --git a/drivers/net/axgbe/axgbe_common.h b/drivers/net/axgbe/axgbe_common.h
index 97a80f595..d25d54cac 100644
--- a/drivers/net/axgbe/axgbe_common.h
+++ b/drivers/net/axgbe/axgbe_common.h
@@ -34,7 +34,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
diff --git a/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c b/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c
index ad6ce72fe..d26e782bb 100644
--- a/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c
+++ b/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c
@@ -23,7 +23,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
-- 
2.16.2



[dpdk-dev] [PATCH] examples/ip_pipeline: fix buffer not null terminated.

2018-04-16 Thread Fan Zhang
Coverity issue: 272563
Fixes: 8245472c58c8 ("examples/ip_pipeline: add sw queue object")

Signed-off-by: Fan Zhang 
---
 examples/ip_pipeline/swq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/ip_pipeline/swq.c b/examples/ip_pipeline/swq.c
index c11bbf27e..be78704c1 100644
--- a/examples/ip_pipeline/swq.c
+++ b/examples/ip_pipeline/swq.c
@@ -64,7 +64,7 @@ swq_create(const char *name, struct swq_params *params)
}
 
/* Node fill in */
-   strncpy(swq->name, name, sizeof(swq->name));
+   strncpy(swq->name, name, sizeof(swq->name) - 1);
swq->r = r;
 
/* Node add to list */
-- 
2.13.6
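
Note that copying at most sizeof(dst) - 1 bytes only yields a terminated
string if the last byte of the destination is already zero (for instance
because the node came from a zeroing allocator; that is an assumption about
the surrounding code, not something shown in this patch). A defensive
variant that does not rely on it, using a hypothetical toy structure:

```c
#include <string.h>

#define TOY_NAME_SIZE 8

/* Hypothetical stand-in for a list node with a fixed-size name field. */
struct toy_swq {
	char name[TOY_NAME_SIZE];
};

/* Copy with truncation and terminate explicitly, whatever dst held before. */
static void
toy_swq_set_name(struct toy_swq *swq, const char *name)
{
	strncpy(swq->name, name, sizeof(swq->name) - 1);
	swq->name[sizeof(swq->name) - 1] = '\0';
}
```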



[dpdk-dev] [PATCH] net/thunderx: fix MTU configuration for jumbo pkts

2018-04-16 Thread Nitin Saxena
thunderx pmd driver passes dev_info.max_rx_pktlen as
9200 (via rte_eth_dev_info_get()) to application.
But, when application tries to set MTU as
(9200 - sizeof(ethernet_header_t)) the operation fails
because of missing CRC and VLAN additions.

This patch fixes the following for thunderx pmd driver:
 - Sets NIC_HW_MAX_FRS to 9216 (instead of 9200)
 - Sets NIC_HW_MAX_MTU to 9190 (NIC_HW_MAX_FRS - ETH_HLEN
   - ETHER_CRC_LEN - 2*VLAN_HLEN)
 - Sets dev_info->max_rx_pkt_len to NIC_HW_MAX_MTU +
   ETH_HLEN (instead of 9200)
 - Allows rte_eth_dev_set_mtu() to pass if application
   (like VPP) calls rte_eth_dev_set_mtu() before
   rte_eth_dev_start() by putting appropriate check for
   dev->data->dev_started

Fixes: 65d9804edc05 ("net/thunderx: support MTU configuration")

Signed-off-by: Nitin Saxena 
---
 drivers/net/thunderx/base/nicvf_hw_defs.h |  5 -
 drivers/net/thunderx/nicvf_ethdev.c   | 15 +++
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/net/thunderx/base/nicvf_hw_defs.h b/drivers/net/thunderx/base/nicvf_hw_defs.h
index b13c21f..b12c8ec 100644
--- a/drivers/net/thunderx/base/nicvf_hw_defs.h
+++ b/drivers/net/thunderx/base/nicvf_hw_defs.h
@@ -171,7 +171,10 @@
 
 /* Min/Max packet size */
 #define NIC_HW_MIN_FRS  (64)
-#define NIC_HW_MAX_FRS  (9200) /* 9216 max pkt including FCS */
+/* ETH_HLEN+ETH_FCS_LEN+2*VLAN_HLEN */
+#define NIC_HW_L2_OVERHEAD  (26)
+#define NIC_HW_MAX_MTU  (9190)
+#define NIC_HW_MAX_FRS  (NIC_HW_MAX_MTU + NIC_HW_L2_OVERHEAD)
 #define NIC_HW_MAX_SEGS (12)
 
 /* Descriptor alignments */
diff --git a/drivers/net/thunderx/nicvf_ethdev.c b/drivers/net/thunderx/nicvf_ethdev.c
index 75e9d16..a7931af 100644
--- a/drivers/net/thunderx/nicvf_ethdev.c
+++ b/drivers/net/thunderx/nicvf_ethdev.c
@@ -162,7 +162,7 @@ static int
 nicvf_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
 {
struct nicvf *nic = nicvf_pmd_priv(dev);
-   uint32_t buffsz, frame_size = mtu + ETHER_HDR_LEN + ETHER_CRC_LEN;
+   uint32_t buffsz, frame_size = mtu + NIC_HW_L2_OVERHEAD;
size_t i;
struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
 
@@ -180,7 +180,7 @@ nicvf_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
 * Refuse mtu that requires the support of scattered packets
 * when this feature has not been enabled before.
 */
-   if (!dev->data->scattered_rx &&
+   if (dev->data->dev_started && !dev->data->scattered_rx &&
(frame_size + 2 * VLAN_TAG_SIZE > buffsz))
return -EINVAL;
 
@@ -194,11 +194,11 @@ nicvf_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
else
rxmode->offloads &= ~DEV_RX_OFFLOAD_JUMBO_FRAME;
 
-   if (nicvf_mbox_update_hw_max_frs(nic, frame_size))
+   if (nicvf_mbox_update_hw_max_frs(nic, mtu))
return -EINVAL;
 
-   /* Update max frame size */
-   rxmode->max_rx_pkt_len = (uint32_t)frame_size;
+   /* Update max_rx_pkt_len */
+   rxmode->max_rx_pkt_len = mtu + ETHER_HDR_LEN;
nic->mtu = mtu;
 
for (i = 0; i < nic->sqs_count; i++)
@@ -1408,7 +1408,7 @@ nicvf_dev_info_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)
dev_info->speed_capa |= ETH_LINK_SPEED_40G;
 
dev_info->min_rx_bufsize = ETHER_MIN_MTU;
-   dev_info->max_rx_pktlen = NIC_HW_MAX_FRS;
+   dev_info->max_rx_pktlen = NIC_HW_MAX_MTU + ETHER_HDR_LEN;
dev_info->max_rx_queues =
(uint16_t)MAX_RCV_QUEUES_PER_QS * (MAX_SQS_PER_VF + 1);
dev_info->max_tx_queues =
@@ -1741,8 +1741,7 @@ nicvf_dev_start(struct rte_eth_dev *dev)
/* Setup MTU based on max_rx_pkt_len or default */
mtu = dev->data->dev_conf.rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME ?
dev->data->dev_conf.rxmode.max_rx_pkt_len
-   -  ETHER_HDR_LEN - ETHER_CRC_LEN
-   : ETHER_MTU;
+   -  ETHER_HDR_LEN : ETHER_MTU;
 
if (nicvf_dev_set_mtu(dev, mtu)) {
PMD_INIT_LOG(ERR, "Failed to set default mtu size");
-- 
2.7.4



[dpdk-dev] [PATCH] app/eventdev: fix typos in timer adapter options

2018-04-16 Thread Pavan Nikhilesh
The option names in the code and in the documentation do not match.

Fixes: 98c6292105d4 ("app/eventdev: add options for event timer adapter")

Suggested-by: Thomas Monjalon 
Signed-off-by: Pavan Nikhilesh 
---
 app/test-eventdev/evt_options.c | 2 +-
 app/test-eventdev/evt_options.h | 8 
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/app/test-eventdev/evt_options.c b/app/test-eventdev/evt_options.c
index 5f311a570..cfa43a165 100644
--- a/app/test-eventdev/evt_options.c
+++ b/app/test-eventdev/evt_options.c
@@ -250,7 +250,7 @@ usage(char *program)
"\t--nb_timer_adptrs  : number of timer adapters to use.\n"
"\t--timer_tick_nsec  : timer tick interval in ns.\n"
"\t--max_tmo_nsec : max timeout interval in ns.\n"
-   "\t--expiry_ns: event timer expiry ns.\n"
+   "\t--expiry_nsec: event timer expiry ns.\n"
);
printf("available tests:\n");
evt_test_dump_names();
diff --git a/app/test-eventdev/evt_options.h b/app/test-eventdev/evt_options.h
index c059f7084..1bc7ea0f6 100644
--- a/app/test-eventdev/evt_options.h
+++ b/app/test-eventdev/evt_options.h
@@ -36,9 +36,9 @@
 #define EVT_PROD_TIMERDEV_BURST  ("prod_type_timerdev_burst")
 #define EVT_NB_TIMERS("nb_timers")
 #define EVT_NB_TIMER_ADPTRS  ("nb_timer_adptrs")
-#define EVT_TIMER_TICK_NSEC  ("timer_tick_ns")
+#define EVT_TIMER_TICK_NSEC  ("timer_tick_nsec")
 #define EVT_MAX_TMO_NSEC ("max_tmo_nsec")
-#define EVT_EXPIRY_NSEC  ("expiry_ns")
+#define EVT_EXPIRY_NSEC  ("expiry_nsec")
 #define EVT_HELP ("help")
 
 enum evt_prod_type {
@@ -292,10 +292,10 @@ evt_dump_producer_type(struct evt_options *opt)
evt_dump("max_tmo_nsec", "%"PRIu64"", opt->max_tmo_nsec);
evt_dump("expiry_nsec", "%"PRIu64"", opt->expiry_nsec);
if (opt->optm_timer_tick_nsec)
-   evt_dump("optm_timer_tick_ns", "%"PRIu64"",
+   evt_dump("optm_timer_tick_nsec", "%"PRIu64"",
opt->optm_timer_tick_nsec);
else
-   evt_dump("timer_tick_ns", "%"PRIu64"",
+   evt_dump("timer_tick_nsec", "%"PRIu64"",
opt->timer_tick_nsec);
break;
}
-- 
2.17.0



[dpdk-dev] [PATCH] examples/ip_pipeline: fix uninitialized scalar variable

2018-04-16 Thread Fan Zhang
Coverity issue: 272575
Fixes: 133c2c6565d6 ("examples/ip_pipeline: add link object")

Signed-off-by: Fan Zhang 
---
 examples/ip_pipeline/cli.c  | 2 +-
 examples/ip_pipeline/link.c | 8 +---
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/examples/ip_pipeline/cli.c b/examples/ip_pipeline/cli.c
index 199a31ff8..ec43a2308 100644
--- a/examples/ip_pipeline/cli.c
+++ b/examples/ip_pipeline/cli.c
@@ -133,7 +133,7 @@ cmd_link(char **tokens,
char *out,
size_t out_size)
 {
-   struct link_params p;
+   struct link_params p = {0};
struct link_params_rss rss;
struct link *link;
char *name;
diff --git a/examples/ip_pipeline/link.c b/examples/ip_pipeline/link.c
index 26ff41ba9..25717808c 100644
--- a/examples/ip_pipeline/link.c
+++ b/examples/ip_pipeline/link.c
@@ -121,17 +121,19 @@ link_create(const char *name, struct link_params *params)
(params->tx.queue_size == 0))
return NULL;
 
-   port_id = params->port_id;
if (params->dev_name) {
status = rte_eth_dev_get_port_by_name(params->dev_name,
&port_id);
 
if (status)
return NULL;
-   } else
-   if (!rte_eth_dev_is_valid_port(port_id))
+   } else {
+   if (!rte_eth_dev_is_valid_port(params->port_id))
return NULL;
 
+   port_id = params->port_id;
+   }
+
rte_eth_dev_info_get(port_id, &port_info);
 
mempool = mempool_find(params->rx.mempool_name);
-- 
2.13.6



[dpdk-dev] [PATCH] eal/vfio: export all VFIO functions when not compiling VFIO

2018-04-16 Thread Anatoly Burakov
Previously, VFIO functions were not compiled in and exported if
VFIO compilation was disabled. Fix this by actually compiling
all of the functions unconditionally, and provide missing
prototypes on Linux.

Fixes: 279b581c897d ("vfio: expose functions")
Fixes: 73a639085938 ("vfio: allow to map other memory regions")
Fixes: 964b2f3bfb07 ("vfio: export some internal functions")
Cc: hemant.agra...@nxp.com
Cc: gaetan.ri...@6wind.com
Cc: anatoly.bura...@intel.com

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/bsdapp/eal/eal.c  | 15 +---
 lib/librte_eal/common/include/rte_vfio.h | 17 +
 lib/librte_eal/linuxapp/eal/eal_vfio.c   | 60 
 3 files changed, 72 insertions(+), 20 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index bfbec0d..d996190 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -756,20 +757,6 @@ rte_eal_vfio_intr_mode(void)
return RTE_INTR_MODE_NONE;
 }
 
-/* dummy forward declaration. */
-struct vfio_device_info;
-
-/* dummy prototypes. */
-int rte_vfio_setup_device(const char *sysfs_base, const char *dev_addr,
-   int *vfio_dev_fd, struct vfio_device_info *device_info);
-int rte_vfio_release_device(const char *sysfs_base, const char *dev_addr, int 
fd);
-int rte_vfio_enable(const char *modname);
-int rte_vfio_is_enabled(const char *modname);
-int rte_vfio_noiommu_is_enabled(void);
-int rte_vfio_clear_group(int vfio_group_fd);
-int rte_vfio_dma_map(uint64_t vaddr, uint64_t iova, uint64_t len);
-int rte_vfio_dma_unmap(uint64_t vaddr, uint64_t iova, uint64_t len);
-
 int rte_vfio_setup_device(__rte_unused const char *sysfs_base,
  __rte_unused const char *dev_addr,
  __rte_unused int *vfio_dev_fd,
diff --git a/lib/librte_eal/common/include/rte_vfio.h 
b/lib/librte_eal/common/include/rte_vfio.h
index c4a2e60..899baef 100644
--- a/lib/librte_eal/common/include/rte_vfio.h
+++ b/lib/librte_eal/common/include/rte_vfio.h
@@ -33,10 +33,6 @@
 #define VFIO_NOIOMMU_MODE  \
"/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /* NOIOMMU is defined from kernel version 4.5 onwards */
 #ifdef VFIO_NOIOMMU_IOMMU
 #define RTE_VFIO_NOIOMMU VFIO_NOIOMMU_IOMMU
@@ -44,6 +40,17 @@ extern "C" {
 #define RTE_VFIO_NOIOMMU 8
 #endif
 
+#else /* not VFIO_PRESENT */
+
+/* we don't need an actual definition, only pointer is used */
+struct vfio_device_info;
+
+#endif /* VFIO_PRESENT */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Setup vfio_cfg for the device identified by its address.
  * It discovers the configured I/O MMU groups or sets a new one for the device.
@@ -249,6 +256,4 @@ rte_vfio_get_group_fd(int iommu_group_num);
 }
 #endif
 
-#endif /* VFIO_PRESENT */
-
 #endif /* _RTE_VFIO_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c 
b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index 16ee730..19b3841 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1547,4 +1547,64 @@ rte_vfio_dma_unmap(uint64_t __rte_unused vaddr, uint64_t 
__rte_unused iova,
return -1;
 }
 
+int
+rte_vfio_setup_device(__rte_unused const char *sysfs_base,
+   __rte_unused const char *dev_addr,
+   __rte_unused int *vfio_dev_fd,
+   __rte_unused struct vfio_device_info *device_info)
+{
+   return -1;
+}
+
+int
+rte_vfio_release_device(__rte_unused const char *sysfs_base,
+   __rte_unused const char *dev_addr, __rte_unused int fd)
+{
+   return -1;
+}
+
+int
+rte_vfio_enable(__rte_unused const char *modname)
+{
+   return -1;
+}
+
+int
+rte_vfio_is_enabled(__rte_unused const char *modname)
+{
+   return -1;
+}
+
+int
+rte_vfio_noiommu_is_enabled(void)
+{
+   return -1;
+}
+
+int
+rte_vfio_clear_group(__rte_unused int vfio_group_fd)
+{
+   return -1;
+}
+
+int __rte_experimental
+rte_vfio_get_group_num(__rte_unused const char *sysfs_base,
+   __rte_unused const char *dev_addr,
+   __rte_unused int *iommu_group_num)
+{
+   return -1;
+}
+
+int __rte_experimental
+rte_vfio_get_container_fd(void)
+{
+   return -1;
+}
+
+int __rte_experimental
+rte_vfio_get_group_fd(__rte_unused int iommu_group_num)
+{
+   return -1;
+}
+
 #endif
-- 
2.7.4


[dpdk-dev] [PATCH] examples/ip_pipeline: fix logically dead node

2018-04-16 Thread Fan Zhang
Coverity issue: 272567
Fixes: d75c371e9b46 ("examples/ip_pipeline: add pipeline object")

Signed-off-by: Fan Zhang 
---
 examples/ip_pipeline/cli.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/examples/ip_pipeline/cli.c b/examples/ip_pipeline/cli.c
index 199a31ff8..6c07ca223 100644
--- a/examples/ip_pipeline/cli.c
+++ b/examples/ip_pipeline/cli.c
@@ -1890,12 +1890,6 @@ cmd_pipeline_table(char **tokens,
 
t0 += 6;
} else if (strcmp(tokens[t0], "stub") == 0) {
-   if (n_tokens < t0 + 1) {
-   snprintf(out, out_size, MSG_ARG_MISMATCH,
-   "pipeline table stub");
-   return;
-   }
-
p.match_type = TABLE_STUB;
 
t0 += 1;
-- 
2.13.6



[dpdk-dev] [PATCH] examples/ip_pipeline: fix buffer not null terminated

2018-04-16 Thread Fan Zhang
Coverity issue: 272572
Fixes: 719374345cee ("examples/ip_pipeline: add action profile objects")

Signed-off-by: Fan Zhang 
---
 examples/ip_pipeline/action.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/ip_pipeline/action.c b/examples/ip_pipeline/action.c
index 77a04fe19..91011ebe8 100644
--- a/examples/ip_pipeline/action.c
+++ b/examples/ip_pipeline/action.c
@@ -133,7 +133,7 @@ port_in_action_profile_create(const char *name,
}
 
/* Node fill in */
-   strncpy(profile->name, name, sizeof(profile->name));
+   strncpy(profile->name, name, sizeof(profile->name) - 1);
memcpy(&profile->params, params, sizeof(*params));
profile->ap = ap;
 
-- 
2.13.6



Re: [dpdk-dev] [PATCH v3 00/13] eal: replace calls to rte_panic and refrain from new instances

2018-04-16 Thread Burakov, Anatoly

On 13-Apr-18 7:30 PM, Arnon Warshavsky wrote:

The purpose of this patch series is to clean up the library code
by removing paths that end up aborting the process,
and to move to checking error values, in order to allow the running process
to perform an orderly teardown or other mitigation of the event.

This patch modifies the majority of rte_panic calls
under lib and drivers, and replaces them with a log message
and an error return code according to context,
that can be propagated up the call stack.

- Focus was given to the dpdk initialization path
- Some of the panic calls within drivers were left in place where
   the call is from within an interrupt or calls that are
   on the data path, where there is no simple applicative
   route to propagate the error to termination.
   These should be handled by the driver maintainers.
- In order to avoid breaking ABI where panic was called from public
   void functions, a panic state variable was introduced so that
   it can be queried after calling these void functions.
   This took place for a single function call.
- local void functions with no api were changed to return a value
   where needed
- No change took place in example and test files
- No change took place for debug assertions calling panic
- A new function was added to devtools/checkpatches.sh
   in order to prevent new additions of calls to rte_panic
   under lib and drivers.

Keep calm and don't panic

---

v2:
- reformat error messages so that literal strings are on the same line
- fix typo in commit message
- add new return code to doxygen of rte_memzone_free()

v3:
- submit all 13 patches, changed and unchanged, in the same patchset



This patchset needs to be rebased. There were a few changes that make 
some of the patches unnecessary.


Changes in patch 7 and 9 were addressed in earlier memory hotplug 
patchset, and are no longer applicable. Some things may have changed for 
patch 12 as well.


--
Thanks,
Anatoly


Re: [dpdk-dev] [PATCH] compat: add virtio crypto header file

2018-04-16 Thread Zhang, Roy Fan


> -Original Message-
> From: Jay Zhou [mailto:jianjay.z...@huawei.com]
> Sent: Saturday, April 14, 2018 10:27 AM
> To: dev@dpdk.org
> Cc: maxime.coque...@redhat.com; De Lara Guarch, Pablo
> ; Yigit, Ferruh ;
> Tan, Jianfeng ; Zhang, Roy Fan
> ; arei.gong...@huawei.com;
> weidong.hu...@huawei.com; wangxinxin.w...@huawei.com;
> jianjay.z...@huawei.com
> Subject: [PATCH] compat: add virtio crypto header file
> 
> Moving the virtio crypto header file from vhost lib to compat lib, then this
> header file can be shared between vhost crypto backend and virtio crypto
> PMD.
> 
> Signed-off-by: Jay Zhou 
> ---

Acked-by: Fan Zhang 


Re: [dpdk-dev] [PATCH v3 08/13] eal: replace rte_panic instances in hugepage_info

2018-04-16 Thread Burakov, Anatoly

On 13-Apr-18 7:30 PM, Arnon Warshavsky wrote:

replace panic calls with log and return value.

Signed-off-by: Arnon Warshavsky 
---
  lib/librte_eal/linuxapp/eal/eal_hugepage_info.c | 21 +++--
  1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c 
b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
index 8bbf771..43af5b5 100644
--- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
+++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
@@ -80,8 +80,11 @@
unsigned long long size = 0;
  
  	FILE *fd = fopen(proc_meminfo, "r");

-   if (fd == NULL)
-   rte_panic("Cannot open %s\n", proc_meminfo);
+   if (fd == NULL) {
+   RTE_LOG(CRIT, EAL, "%s(): Cannot open %s\n",
+   __func__, proc_meminfo);
+   return 0;
+   }
while(fgets(buffer, sizeof(buffer), fd)){
if (strncmp(buffer, str_hugepagesz, hugepagesz_len) == 0){
size = rte_str_to_size(&buffer[hugepagesz_len]);
@@ -89,8 +92,11 @@
}
}
fclose(fd);
-   if (size == 0)
-   rte_panic("Cannot get default hugepage size from %s\n", 
proc_meminfo);
+   if (size == 0) {
+   RTE_LOG(CRIT, EAL, "%s(): Cannot get default hugepage size from 
%s\n",
+__func__, proc_meminfo);
+   return 0;
+   }
return size;


If returning default hugepage size of 0 is now a possibility, the 
calling code needs to be able to handle that. Perhaps rewrite it as 
returning int, and accepting pointer to pagesz? e.g.


static int get_default_hp_size(uint64_t *page_sz)

and fix the code below to handle error in reading default page size?
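
A minimal sketch of what such a signature could look like, assuming a
standalone helper outside of EAL: the meminfo_path parameter, the chosen
error codes and the use of strtoull() in place of rte_str_to_size() are
assumptions made for illustration, not what the patch does.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sketch only: returns 0 on success and a negative errno-style value on
 * failure, storing the default hugepage size (in bytes) in *page_sz.
 * meminfo_path is a hypothetical parameter added so the helper can be
 * exercised against a test file instead of /proc/meminfo.
 */
static int
get_default_hp_size(const char *meminfo_path, uint64_t *page_sz)
{
	const char str_hugepagesz[] = "Hugepagesize:";
	const size_t hugepagesz_len = sizeof(str_hugepagesz) - 1;
	char buffer[256];
	uint64_t size = 0;
	FILE *fd;

	if (meminfo_path == NULL || page_sz == NULL)
		return -EINVAL;

	fd = fopen(meminfo_path, "r");
	if (fd == NULL)
		return -ENOENT; /* assumed error code for "cannot open" */

	while (fgets(buffer, sizeof(buffer), fd) != NULL) {
		if (strncmp(buffer, str_hugepagesz, hugepagesz_len) == 0) {
			/* the value is reported in kB, e.g. "2048 kB" */
			size = strtoull(&buffer[hugepagesz_len], NULL, 10) * 1024;
			break;
		}
	}
	fclose(fd);

	if (size == 0)
		return -EINVAL; /* assumed error code for "entry not found" */

	*page_sz = size;
	return 0;
}
```

The caller can then tell a real size apart from a failure by checking the
return value, instead of treating a size of 0 as an error marker.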


  }
  
@@ -116,8 +122,11 @@

char *retval = NULL;
  
  	FILE *fd = fopen(proc_mounts, "r");

-   if (fd == NULL)
-   rte_panic("Cannot open %s\n", proc_mounts);
+   if (fd == NULL) {
+   RTE_LOG(CRIT, EAL, "%s(): Cannot open %s\n",
+   __func__, proc_mounts);
+   return NULL;
+   }
  
  	if (default_size == 0)

default_size = get_default_hp_size();



--
Thanks,
Anatoly


Re: [dpdk-dev] [PATCH 1/8] bus/fslmc: support MC DPDMAI object

2018-04-16 Thread Shreyansh Jain

On Saturday 07 April 2018 08:46 PM, Nipun Gupta wrote:

This patch adds the DPDMAI (Data Path DMA Interface)
object support in MC driver.

Signed-off-by: Cristian Sovaiala 
Signed-off-by: Nipun Gupta 
---
  drivers/bus/fslmc/Makefile  |   3 +-
  drivers/bus/fslmc/mc/dpdmai.c   | 429 
  drivers/bus/fslmc/mc/fsl_dpdmai.h   | 189 
  drivers/bus/fslmc/mc/fsl_dpdmai_cmd.h   | 107 +++
  drivers/bus/fslmc/rte_bus_fslmc_version.map |   9 +
  5 files changed, 736 insertions(+), 1 deletion(-)
  create mode 100644 drivers/bus/fslmc/mc/dpdmai.c
  create mode 100644 drivers/bus/fslmc/mc/fsl_dpdmai.h
  create mode 100644 drivers/bus/fslmc/mc/fsl_dpdmai_cmd.h



Once rebased on master:

Acked-by: Shreyansh Jain 


Re: [dpdk-dev] [PATCH v5 02/21] eal: list acceptable init priorities

2018-04-16 Thread Neil Horman
On Sun, Apr 15, 2018 at 05:13:13PM +0200, Gaëtan Rivet wrote:
> Hello Neil,
> 
> On Sat, Apr 14, 2018 at 02:45:45PM -0400, Neil Horman wrote:
> > On Fri, Apr 13, 2018 at 02:55:11PM +0200, Gaëtan Rivet wrote:
> > > Hi Shreyansh,
> > > 
> > > On Fri, Apr 13, 2018 at 06:22:43PM +0530, Shreyansh Jain wrote:
> > > > On Friday 13 April 2018 05:12 PM, Neil Horman wrote:
> > > > > On Thu, Apr 12, 2018 at 11:57:47PM +0200, Gaëtan Rivet wrote:
> > > > > > Hello Neil,
> > > > > > 
> > > > > > On Thu, Apr 12, 2018 at 07:28:26AM -0400, Neil Horman wrote:
> > > > > > > On Wed, Apr 11, 2018 at 02:04:03AM +0200, Gaetan Rivet wrote:
> > > > > > > > Build a central list to quickly see each used priorities for
> > > > > > > > constructors, allowing to verify that they are both above 100 
> > > > > > > > and in the
> > > > > > > > proper order.
> > > > > > > > 
> > > > > > > > Signed-off-by: Gaetan Rivet 
> > > > > > > > Acked-by: Neil Horman 
> > > > > > > > Acked-by: Shreyansh Jain 
> > > > > > > > ---
> > > > > > > >   lib/librte_eal/common/eal_common_log.c | 2 +-
> > > > > > > >   lib/librte_eal/common/include/rte_bus.h| 2 +-
> > > > > > > >   lib/librte_eal/common/include/rte_common.h | 8 +++-
> > > > > > > >   3 files changed, 9 insertions(+), 3 deletions(-)
> > > > > > > > 
> > > > > > > > diff --git a/lib/librte_eal/common/eal_common_log.c 
> > > > > > > > b/lib/librte_eal/common/eal_common_log.c
> > > > > > > > index a27192620..36b9d6e08 100644
> > > > > > > > --- a/lib/librte_eal/common/eal_common_log.c
> > > > > > > > +++ b/lib/librte_eal/common/eal_common_log.c
> > > > > > > > @@ -260,7 +260,7 @@ static const struct logtype 
> > > > > > > > logtype_strings[] = {
> > > > > > > >   };
> > > > > > > >   /* Logging should be first initializer (before drivers and 
> > > > > > > > bus) */
> > > > > > > > -RTE_INIT_PRIO(rte_log_init, 101);
> > > > > > > > +RTE_INIT_PRIO(rte_log_init, LOG);
> > > > > > > >   static void
> > > > > > > >   rte_log_init(void)
> > > > > > > >   {
> > > > > > > > diff --git a/lib/librte_eal/common/include/rte_bus.h 
> > > > > > > > b/lib/librte_eal/common/include/rte_bus.h
> > > > > > > > index 6fb08341a..eb9eded4e 100644
> > > > > > > > --- a/lib/librte_eal/common/include/rte_bus.h
> > > > > > > > +++ b/lib/librte_eal/common/include/rte_bus.h
> > > > > > > > @@ -325,7 +325,7 @@ enum rte_iova_mode 
> > > > > > > > rte_bus_get_iommu_class(void);
> > > > > > > >* The constructor has higher priority than PMD constructors.
> > > > > > > >*/
> > > > > > > >   #define RTE_REGISTER_BUS(nm, bus) \
> > > > > > > > -RTE_INIT_PRIO(businitfn_ ##nm, 110); \
> > > > > > > > +RTE_INIT_PRIO(businitfn_ ##nm, BUS); \
> > > > > > > >   static void businitfn_ ##nm(void) \
> > > > > > > >   {\
> > > > > > > > (bus).name = RTE_STR(nm);\
> > > > > > > > diff --git a/lib/librte_eal/common/include/rte_common.h 
> > > > > > > > b/lib/librte_eal/common/include/rte_common.h
> > > > > > > > index 6c5bc5a76..8f04518f7 100644
> > > > > > > > --- a/lib/librte_eal/common/include/rte_common.h
> > > > > > > > +++ b/lib/librte_eal/common/include/rte_common.h
> > > > > > > > @@ -81,6 +81,12 @@ typedef uint16_t unaligned_uint16_t;
> > > > > > > >*/
> > > > > > > >   #define RTE_SET_USED(x) (void)(x)
> > > > > > > > +#define RTE_PRIORITY_LOG 101
> > > > > > > > +#define RTE_PRIORITY_BUS 110
> > > > > > > > +
> > > > > > > > +#define RTE_PRIO(prio) \
> > > > > > > > +   RTE_PRIORITY_ ## prio
> > > > > > > > +
> > > > > > > >   /**
> > > > > > > >* Run function before main() with low priority.
> > > > > > > >*
> > > > > > > > @@ -102,7 +108,7 @@ static void __attribute__((constructor, 
> > > > > > > > used)) func(void)
> > > > > > > >*   Lowest number is the first to run.
> > > > > > > >*/
> > > > > > > >   #define RTE_INIT_PRIO(func, prio) \
> > > > > > > > -static void __attribute__((constructor(prio), used)) func(void)
> > > > > > > > +static void __attribute__((constructor(RTE_PRIO(prio)), used)) 
> > > > > > > > func(void)
> > > > > > > > It just occurred to me, that perhaps you should add a 
> > > > > > > > RTE_PRIORITY_LAST priority,
> > > > > > > and redefine RTE_INIT to RTE_INIT_PRIO(func, RTE_PRIORITY_LAST) 
> > > > > > > for clarity.  I
> > > > > > > presume that constructors with no explicit priority run last, but 
> > > > > > > the gcc
> > > > > > > manual doesn't explicitly say that.  It would be a heck of a bug 
> > > > > > > to track down
> > > > > > > if somehow unprioritized constructors ran early.
> > > > > > > 
> > > > > > > Neil
> > > > > > > 
> > > > > > 
> > > > > > While certainly poorly documented, the behavior is well-defined. I 
> > > > > > don't see
> > > > > > a situation where the bug you describe could arise.
> > > > > > 
> > > > > > Adding RTE_PRIORITY_LAST is pretty harmless, but I'm not sure it's
> > > > > > justified to add it. If you still think it is useful, I will do it.
> > > > > > 
> > > > > It was more just a way to unify the macros is all, probabl

Re: [dpdk-dev] [PATCH 0/4] support for write combining

2018-04-16 Thread Rafał Kozik
Hello Bruce,

Thank you for your advice.

> 1. Why not always have igb_uio support write-combining since it can be
> controlled thereafter via userspace mapping one file or another?

I added a parameter to igb_uio because it currently performs ioremap
and fails if that returns NULL.
However, performing ioremap makes it impossible to use WC, so I removed it.
The ENA driver works well after this change, but I cannot test it on all
drivers and all platforms.
It seems to me that making it configurable prevents breaking
other drivers that could use the internal_addr returned by ioremap.

> 2. Why not always map both resource and resource_wc files at the PCI level,
> and make them available via different pointers to the driver? Then the
> driver can choose at the per-access level which it wants to use. For
> example, for init of a device, a driver may do all register access via
> uncachable memory, and only use the write-combined support for
> performance-critical parts. [I have a draft patch lying around here
> somewhere that does something similar to that.]

I tried to implement this idea, but without good results. I get a mapping
with or without WC depending on the mapping order.
While trying to find a solution I came across this paper:
https://www.kernel.org/doc/ols/2008/ols2008v2-pages-135-144.pdf
Sections 5.3 and 5.4 discuss access to PCI resources.
According to it:

A request to uncached access can fail if there is already
an existing write-combine mapping for that region. A
request for write-combine access can succeed with
uncached mapping instead, in the case of already existing
uncached mapping for this region.

We cannot use WC all the time, because it does not guarantee write ordering.
On this basis, I suppose the better option is to map each resource
only once, depending on a parameter provided by the PMD.
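
A minimal sketch of that "map once, chosen by the driver" idea, assuming
the mapping goes through the sysfs resource files discussed in this thread.
The helper name pci_bar_sysfs_path() and the want_wc flag are hypothetical;
the actual patches may select the file differently.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: build the path of the single sysfs file to mmap()
 * for a BAR, either the regular "resourceN" file or the write-combining
 * "resourceN_wc" one, depending on what the driver asked for.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int
pci_bar_sysfs_path(char *buf, size_t len, const char *pci_addr,
		   int bar, bool want_wc)
{
	int n = snprintf(buf, len, "/sys/bus/pci/devices/%s/resource%d%s",
			 pci_addr, bar, want_wc ? "_wc" : "");

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Because only one of the two files is ever opened for a given BAR, the
ordering problem described in the paper should not arise.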

> One last question - if using vfio-pci kernel module, do the resource_wc
> files present the bars as write-combined memory type, or are they
> uncachable?

I tried to use VFIO to map WC memory, but without success.

Best regards,
Rafal Kozik

2018-04-11 16:42 GMT+02:00 Bruce Richardson :
> On Wed, Apr 11, 2018 at 04:07:13PM +0200, Rafal Kozik wrote:
>> Support for write combining.
>>
>> Rafal Kozik (4):
>>   igb_uio: add wc option
>>   bus/pci: reference driver structure
>>   eal: enable WC during resources mapping
>>   net/ena: enable WC
>>
>>  drivers/bus/pci/linux/pci_uio.c | 39 ---
>>  drivers/bus/pci/pci_common.c| 13 -
>>  drivers/bus/pci/rte_bus_pci.h   |  2 ++
>>  drivers/net/ena/ena_ethdev.c|  3 ++-
>>  kernel/linux/igb_uio/igb_uio.c  | 17 ++---
>>  5 files changed, 54 insertions(+), 20 deletions(-)
>>
> Couple of thoughts on this set.
>
> You add an option to the kernel module to allow wc to be supported on a
> device, but when we go to do PCI mapping, we either map the regular
> resource file or the _wc one. Therefore:
>
> 1. Why not always have igb_uio support write-combining since it can be
> controlled thereafter via userspace mapping one file or another?
>
> 2. Why not always map both resource and resource_wc files at the PCI level,
> and make them available via different pointers to the driver? Then the
> driver can choose at the per-access level which it wants to use. For
> example, for init of a device, a driver may do all register access via
> uncachable memory, and only use the write-combined support for
> performance-critical parts. [I have a draft patch lying around here
> somewhere that does something similar to that.]
>
> One last question - if using vfio-pci kernel module, do the resource_wc
> files present the bars as write-combined memory type, or are they
> uncachable?
>
> Regards,
> /Bruce


Re: [dpdk-dev] [PATCH 2/8] bus/fslmc: support scanning and probing of QDMA devices

2018-04-16 Thread Shreyansh Jain

On Saturday 07 April 2018 08:46 PM, Nipun Gupta wrote:

Signed-off-by: Nipun Gupta 
---
  drivers/bus/fslmc/fslmc_bus.c  | 2 ++
  drivers/bus/fslmc/fslmc_vfio.c | 1 +
  drivers/bus/fslmc/rte_fslmc.h  | 2 ++
  3 files changed, 5 insertions(+)



[...]
Can you please explain, in the commit message, the relation between 
'DPDMAI' and 'QDMA' devices? Your patch has both these tokens.



diff --git a/drivers/bus/fslmc/rte_fslmc.h b/drivers/bus/fslmc/rte_fslmc.h
index 69d0fec..a454ef5 100644
--- a/drivers/bus/fslmc/rte_fslmc.h
+++ b/drivers/bus/fslmc/rte_fslmc.h
@@ -61,6 +61,7 @@ enum rte_dpaa2_dev_type {
DPAA2_IO,   /**< DPIO type device */
DPAA2_CI,   /**< DPCI type device */
DPAA2_MPORTAL,  /**< DPMCP type device */
+   DPAA2_QDMA, /**< DPDMAI type device */
/* Unknown device placeholder */
DPAA2_UNKNOWN,
DPAA2_DEVTYPE_MAX,
@@ -91,6 +92,7 @@ struct rte_dpaa2_device {
union {
struct rte_eth_dev *eth_dev;/**< ethernet device */
struct rte_cryptodev *cryptodev;/**< Crypto Device */
+   struct rte_rawdev *rawdev;  /**< DPAA2 raw Device */

Just 'Raw device' here, please.


};
enum rte_dpaa2_dev_type dev_type;   /**< Device Type */
uint16_t object_id; /**< DPAA2 Object ID */



Just the trivial issues noted above and after rebasing on master, please 
use:


Acked-by: Shreyansh Jain 


Re: [dpdk-dev] [PATCH v3 01/13] crypto: replace rte_panic instances in crypto driver

2018-04-16 Thread Neil Horman
On Fri, Apr 13, 2018 at 09:30:32PM +0300, Arnon Warshavsky wrote:
> replace panic calls with log and return value.
> 
> --
> v2:
> - reformat error message to include literal string in a single line
> 
> Signed-off-by: Arnon Warshavsky 
> ---
>  drivers/crypto/dpaa2_sec/dpaa2_sec_dpseci.c | 8 +---
>  drivers/crypto/dpaa_sec/dpaa_sec.c  | 8 +---
>  2 files changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/crypto/dpaa2_sec/dpaa2_sec_dpseci.c 
> b/drivers/crypto/dpaa2_sec/dpaa2_sec_dpseci.c
> index 784b96d..9e0ca7f 100644
> --- a/drivers/crypto/dpaa2_sec/dpaa2_sec_dpseci.c
> +++ b/drivers/crypto/dpaa2_sec/dpaa2_sec_dpseci.c
> @@ -2861,9 +2861,11 @@ struct rte_security_ops dpaa2_sec_security_ops = {
>   RTE_CACHE_LINE_SIZE,
>   rte_socket_id());
>  
> - if (cryptodev->data->dev_private == NULL)
> - rte_panic("Cannot allocate memzone for private "
> -   "device data");
> + if (cryptodev->data->dev_private == NULL) {
> + RTE_LOG(ERR, PMD, "%s() Cannot allocate memzone for 
> private device data",
> + __func__);
> + return -1;
> + }
>   }
>  
>   dpaa2_dev->cryptodev = cryptodev;
> diff --git a/drivers/crypto/dpaa_sec/dpaa_sec.c 
> b/drivers/crypto/dpaa_sec/dpaa_sec.c
> index c5191ce..793891a 100644
> --- a/drivers/crypto/dpaa_sec/dpaa_sec.c
> +++ b/drivers/crypto/dpaa_sec/dpaa_sec.c
> @@ -2374,9 +2374,11 @@ struct rte_security_ops dpaa_sec_security_ops = {
>   RTE_CACHE_LINE_SIZE,
>   rte_socket_id());
>  
> - if (cryptodev->data->dev_private == NULL)
> - rte_panic("Cannot allocate memzone for private "
> - "device data");
> + if (cryptodev->data->dev_private == NULL) {
> + RTE_LOG(ERR, PMD, "%s() Cannot allocate memzone for 
> private device data",
> + __func__);
> + return -1;
> + }
>   }
>  
This function is only called from locations that return a -errno code, not just
-1.

Neil
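
A small sketch of the pattern being asked for, under stated assumptions:
struct dev_private and alloc_private() are hypothetical stand-ins, and
calloc() replaces the DPDK allocator used by the real driver.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for the driver's private data; the real structure differs. */
struct dev_private {
	int dummy;
};

/*
 * Hypothetical init helper: on allocation failure it logs a message and
 * returns -ENOMEM rather than a bare -1, so that callers which already
 * return -errno codes can pass the value up unchanged.
 */
static int
alloc_private(struct dev_private **priv, size_t count)
{
	*priv = calloc(count, sizeof(**priv));
	if (*priv == NULL) {
		fprintf(stderr, "%s(): Cannot allocate private device data\n",
			__func__);
		return -ENOMEM;
	}
	return 0;
}
```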

>   dpaa_dev->crypto_dev = cryptodev;
> -- 
> 1.8.3.1
> 
> 


Re: [dpdk-dev] [PATCH] eal/vfio: export all VFIO functions when not compiling VFIO

2018-04-16 Thread Thomas Monjalon
16/04/2018 12:59, Anatoly Burakov:
> --- a/lib/librte_eal/common/include/rte_vfio.h
> +++ b/lib/librte_eal/common/include/rte_vfio.h
> @@ -33,10 +33,6 @@
>  #define VFIO_NOIOMMU_MODE  \
>   "/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"
>  
> -#ifdef __cplusplus
> -extern "C" {
> -#endif
> -
>  /* NOIOMMU is defined from kernel version 4.5 onwards */
>  #ifdef VFIO_NOIOMMU_IOMMU
>  #define RTE_VFIO_NOIOMMU VFIO_NOIOMMU_IOMMU
> @@ -44,6 +40,17 @@ extern "C" {
>  #define RTE_VFIO_NOIOMMU 8
>  #endif
>  
> +#else /* not VFIO_PRESENT */
> +
> +/* we don't need an actual definition, only pointer is used */
> +struct vfio_device_info;
> +
> +#endif /* VFIO_PRESENT */
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif

Why moving this extern "C"?
Could it be at the top of the file?

[...]
> +int __rte_experimental
> +rte_vfio_get_group_fd(__rte_unused int iommu_group_num)
> +{
> + return -1;
> +}
> +
>  #endif

This #endif needs a comment.




Re: [dpdk-dev] [PATCH] eal/vfio: export all VFIO functions when not compiling VFIO

2018-04-16 Thread Burakov, Anatoly

On 16-Apr-18 12:55 PM, Thomas Monjalon wrote:

16/04/2018 12:59, Anatoly Burakov:

--- a/lib/librte_eal/common/include/rte_vfio.h
+++ b/lib/librte_eal/common/include/rte_vfio.h
@@ -33,10 +33,6 @@
  #define VFIO_NOIOMMU_MODE  \
"/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"
  
-#ifdef __cplusplus

-extern "C" {
-#endif
-
  /* NOIOMMU is defined from kernel version 4.5 onwards */
  #ifdef VFIO_NOIOMMU_IOMMU
  #define RTE_VFIO_NOIOMMU VFIO_NOIOMMU_IOMMU
@@ -44,6 +40,17 @@ extern "C" {
  #define RTE_VFIO_NOIOMMU 8
  #endif
  
+#else /* not VFIO_PRESENT */

+
+/* we don't need an actual definition, only pointer is used */
+struct vfio_device_info;
+
+#endif /* VFIO_PRESENT */
+
+#ifdef __cplusplus
+extern "C" {
+#endif


Why moving this extern "C"?
Could it be at the top of the file?


As it was, it was inside #ifdef VFIO_PRESENT. It can be at the top, or 
it can be where it is in this patch, not much difference.




[...]

+int __rte_experimental
+rte_vfio_get_group_fd(__rte_unused int iommu_group_num)
+{
+   return -1;
+}
+
  #endif


This #endif needs a comment.


Will do.


--
Thanks,
Anatoly


Re: [dpdk-dev] [PATCH 3/8] bus/fslmc: add macros required by QDMA for FLE and FD

2018-04-16 Thread Shreyansh Jain

On Saturday 07 April 2018 08:46 PM, Nipun Gupta wrote:

Signed-off-by: Nipun Gupta 
---
  drivers/bus/fslmc/portal/dpaa2_hw_pvt.h | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/bus/fslmc/portal/dpaa2_hw_pvt.h 
b/drivers/bus/fslmc/portal/dpaa2_hw_pvt.h
index 1ef9502..b7b98d1 100644
--- a/drivers/bus/fslmc/portal/dpaa2_hw_pvt.h
+++ b/drivers/bus/fslmc/portal/dpaa2_hw_pvt.h
@@ -212,10 +212,12 @@ enum qbman_fd_format {


Acked-by: Shreyansh Jain 


[dpdk-dev] [PATCH v2] eal/vfio: export all VFIO functions when not compiling VFIO

2018-04-16 Thread Anatoly Burakov
Previously, VFIO functions were not compiled in and exported if
VFIO compilation was disabled. Fix this by actually compiling
all of the functions unconditionally, and provide missing
prototypes on Linux.

Fixes: 279b581c897d ("vfio: expose functions")
Fixes: 73a639085938 ("vfio: allow to map other memory regions")
Fixes: 964b2f3bfb07 ("vfio: export some internal functions")
Cc: hemant.agra...@nxp.com
Cc: gaetan.ri...@6wind.com
Cc: anatoly.bura...@intel.com

Signed-off-by: Anatoly Burakov 
---

Notes:
v2:
- Move "extern C" declaration to top of the file
- Add comment for closing #endif in .c file

 lib/librte_eal/bsdapp/eal/eal.c  | 15 +---
 lib/librte_eal/common/include/rte_vfio.h | 17 +
 lib/librte_eal/linuxapp/eal/eal_vfio.c   | 62 +++-
 3 files changed, 73 insertions(+), 21 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index bfbec0d..d996190 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -756,20 +757,6 @@ rte_eal_vfio_intr_mode(void)
return RTE_INTR_MODE_NONE;
 }
 
-/* dummy forward declaration. */
-struct vfio_device_info;
-
-/* dummy prototypes. */
-int rte_vfio_setup_device(const char *sysfs_base, const char *dev_addr,
-   int *vfio_dev_fd, struct vfio_device_info *device_info);
-int rte_vfio_release_device(const char *sysfs_base, const char *dev_addr, int 
fd);
-int rte_vfio_enable(const char *modname);
-int rte_vfio_is_enabled(const char *modname);
-int rte_vfio_noiommu_is_enabled(void);
-int rte_vfio_clear_group(int vfio_group_fd);
-int rte_vfio_dma_map(uint64_t vaddr, uint64_t iova, uint64_t len);
-int rte_vfio_dma_unmap(uint64_t vaddr, uint64_t iova, uint64_t len);
-
 int rte_vfio_setup_device(__rte_unused const char *sysfs_base,
  __rte_unused const char *dev_addr,
  __rte_unused int *vfio_dev_fd,
diff --git a/lib/librte_eal/common/include/rte_vfio.h 
b/lib/librte_eal/common/include/rte_vfio.h
index c4a2e60..8900064 100644
--- a/lib/librte_eal/common/include/rte_vfio.h
+++ b/lib/librte_eal/common/include/rte_vfio.h
@@ -10,6 +10,10 @@
  * RTE VFIO. This library provides various VFIO related utility functions.
  */
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /*
  * determine if VFIO is present on the system
  */
@@ -33,10 +37,6 @@
 #define VFIO_NOIOMMU_MODE  \
"/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /* NOIOMMU is defined from kernel version 4.5 onwards */
 #ifdef VFIO_NOIOMMU_IOMMU
 #define RTE_VFIO_NOIOMMU VFIO_NOIOMMU_IOMMU
@@ -44,6 +44,13 @@ extern "C" {
 #define RTE_VFIO_NOIOMMU 8
 #endif
 
+#else /* not VFIO_PRESENT */
+
+/* we don't need an actual definition, only pointer is used */
+struct vfio_device_info;
+
+#endif /* VFIO_PRESENT */
+
 /**
  * Setup vfio_cfg for the device identified by its address.
  * It discovers the configured I/O MMU groups or sets a new one for the device.
@@ -249,6 +256,4 @@ rte_vfio_get_group_fd(int iommu_group_num);
 }
 #endif
 
-#endif /* VFIO_PRESENT */
-
 #endif /* _RTE_VFIO_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c 
b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index 16ee730..def71a6 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1547,4 +1547,64 @@ rte_vfio_dma_unmap(uint64_t __rte_unused vaddr, uint64_t 
__rte_unused iova,
return -1;
 }
 
-#endif
+int
+rte_vfio_setup_device(__rte_unused const char *sysfs_base,
+   __rte_unused const char *dev_addr,
+   __rte_unused int *vfio_dev_fd,
+   __rte_unused struct vfio_device_info *device_info)
+{
+   return -1;
+}
+
+int
+rte_vfio_release_device(__rte_unused const char *sysfs_base,
+   __rte_unused const char *dev_addr, __rte_unused int fd)
+{
+   return -1;
+}
+
+int
+rte_vfio_enable(__rte_unused const char *modname)
+{
+   return -1;
+}
+
+int
+rte_vfio_is_enabled(__rte_unused const char *modname)
+{
+   return -1;
+}
+
+int
+rte_vfio_noiommu_is_enabled(void)
+{
+   return -1;
+}
+
+int
+rte_vfio_clear_group(__rte_unused int vfio_group_fd)
+{
+   return -1;
+}
+
+int __rte_experimental
+rte_vfio_get_group_num(__rte_unused const char *sysfs_base,
+   __rte_unused const char *dev_addr,
+   __rte_unused int *iommu_group_num)
+{
+   return -1;
+}
+
+int __rte_experimental
+rte_vfio_get_container_fd(void)
+{
+   return -1;
+}
+
+int __rte_experimental
+rte_vfio_get_group_fd(__rte_unused int iommu_group_num)
+{
+   return -1;
+}
+
+#endif /* VFIO_PRESENT */
-- 
2.7.4


Re: [dpdk-dev] [PATCH v7 1/5] vfio: extend data structure for multi container

2018-04-16 Thread Wang, Xiao W
Hi Anatoly,

> -Original Message-
> From: Burakov, Anatoly
> Sent: Monday, April 16, 2018 6:03 PM
> To: Wang, Xiao W ; Yigit, Ferruh
> 
> Cc: dev@dpdk.org; maxime.coque...@redhat.com; Wang, Zhihong
> ; Bie, Tiwei ; Tan, Jianfeng
> ; Liang, Cunming ; Daly,
> Dan ; tho...@monjalon.net; Chen, Junjie J
> 
> Subject: Re: [PATCH v7 1/5] vfio: extend data structure for multi container
> 
> On 15-Apr-18 4:33 PM, Xiao Wang wrote:
> > Currently eal vfio framework binds vfio group fd to the default
> > container fd during rte_vfio_setup_device, while in some cases,
> > e.g. vDPA (vhost data path acceleration), we want to put vfio group
> > to a separate container and program IOMMU via this container.
> >
> > This patch extends the vfio_config structure to contain per-container
> > user_mem_maps and defines an array of vfio_config. The next patch will
> > base on this to add container API.
> >
> > Signed-off-by: Junjie Chen 
> > Signed-off-by: Xiao Wang 
> > Reviewed-by: Maxime Coquelin 
> > Reviewed-by: Ferruh Yigit 
> > ---
> >   config/common_base |   1 +
> >   lib/librte_eal/linuxapp/eal/eal_vfio.c | 407 ++---
> 
> >   lib/librte_eal/linuxapp/eal/eal_vfio.h |  19 +-
> >   3 files changed, 275 insertions(+), 152 deletions(-)
> >
> > diff --git a/config/common_base b/config/common_base
> > index c4236fd1f..4a76d2f14 100644
> > --- a/config/common_base
> > +++ b/config/common_base
> > @@ -87,6 +87,7 @@ CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
> >   CONFIG_RTE_EAL_IGB_UIO=n
> >   CONFIG_RTE_EAL_VFIO=n
> >   CONFIG_RTE_MAX_VFIO_GROUPS=64
> > +CONFIG_RTE_MAX_VFIO_CONTAINERS=64
> >   CONFIG_RTE_MALLOC_DEBUG=n
> >   CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES=n
> >   CONFIG_RTE_USE_LIBBSD=n
> > diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > index 589d7d478..46fba2d8d 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > @@ -22,8 +22,46 @@
> >
> >   #define VFIO_MEM_EVENT_CLB_NAME "vfio_mem_event_clb"
> >
> > +/*
> > + * we don't need to store device fd's anywhere since they can be obtained
> from
> > + * the group fd via an ioctl() call.
> > + */
> > +struct vfio_group {
> > +   int group_no;
> > +   int fd;
> > +   int devices;
> > +};
> 
> What is the purpose of moving this into .c file? Seems like an
> unnecessary change.

Yes, we can let vfio_group stay at .h, and move vfio_config into .c

> 
> > +
> > +/* hot plug/unplug of VFIO groups may cause all DMA maps to be dropped.
> we can
> > + * recreate the mappings for DPDK segments, but we cannot do so for
> memory that
> > + * was registered by the user themselves, so we need to store the user
> mappings
> > + * somewhere, to recreate them later.
> > + */
> > +#define VFIO_MAX_USER_MEM_MAPS 256
> > +struct user_mem_map {
> > +   uint64_t addr;
> > +   uint64_t iova;
> > +   uint64_t len;
> > +};
> > +
> 
> <...>
> 
> > +static struct vfio_config *
> > +get_vfio_cfg_by_group_no(int iommu_group_no)
> > +{
> > +   struct vfio_config *vfio_cfg;
> > +   int i, j;
> > +
> > +   for (i = 0; i < VFIO_MAX_CONTAINERS; i++) {
> > +   vfio_cfg = &vfio_cfgs[i];
> > +   for (j = 0; j < VFIO_MAX_GROUPS; j++) {
> > +   if (vfio_cfg->vfio_groups[j].group_no ==
> > +   iommu_group_no)
> > +   return vfio_cfg;
> > +   }
> > +   }
> > +
> > +   return default_vfio_cfg;
> 
> Here and in other places: i'm not sure returning default vfio config if
> group not found is such a good idea. It would be better if calling code
> explicitly handled case of group not existing yet.

Agree. It would be explicit.
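
A minimal sketch of the explicit variant, assuming simplified stand-in types (every sketch_ name below is hypothetical and not part of the patch): the lookup returns NULL when no container owns the group, and the caller decides whether to fall back to the default config or to fail.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_CONTAINERS 4
#define SKETCH_MAX_GROUPS 4

/* hypothetical, cut-down stand-ins for vfio_group / vfio_config */
struct sketch_group {
	int group_no;
	int fd;
};

struct sketch_cfg {
	struct sketch_group groups[SKETCH_MAX_GROUPS];
};

static struct sketch_cfg sketch_cfgs[SKETCH_MAX_CONTAINERS];

/* mark every group slot as unused (-1), as the real init code does */
static void
sketch_init(void)
{
	int i, j;

	for (i = 0; i < SKETCH_MAX_CONTAINERS; i++) {
		for (j = 0; j < SKETCH_MAX_GROUPS; j++) {
			sketch_cfgs[i].groups[j].group_no = -1;
			sketch_cfgs[i].groups[j].fd = -1;
		}
	}
}

/* return the config owning the group, or NULL if no config owns it */
static struct sketch_cfg *
sketch_cfg_by_group_no(int group_no)
{
	int i, j;

	assert(group_no >= 0);
	for (i = 0; i < SKETCH_MAX_CONTAINERS; i++) {
		for (j = 0; j < SKETCH_MAX_GROUPS; j++) {
			if (sketch_cfgs[i].groups[j].group_no == group_no)
				return &sketch_cfgs[i];
		}
	}

	/* not found: let the caller handle it instead of guessing */
	return NULL;
}
```

A caller that wants the old behaviour can still write "if (cfg == NULL) cfg = default_cfg;", but that choice is then visible at the call site.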

> 
> > +}
> > +
> > +static struct vfio_config *
> > +get_vfio_cfg_by_group_fd(int vfio_group_fd)
> > +{
> > +   struct vfio_config *vfio_cfg;
> > +   int i, j;
> > +
> > +   for (i = 0; i < VFIO_MAX_CONTAINERS; i++) {
> > +   vfio_cfg = &vfio_cfgs[i];
> > +   for (j = 0; j < VFIO_MAX_GROUPS; j++)
> > +   if (vfio_cfg->vfio_groups[j].fd == vfio_group_fd)
> > +   return vfio_cfg;
> > +   }
> >
> 
> <...>
> 
> > -   for (i = 0; i < VFIO_MAX_GROUPS; i++) {
> > -   vfio_cfg.vfio_groups[i].fd = -1;
> > -   vfio_cfg.vfio_groups[i].group_no = -1;
> > -   vfio_cfg.vfio_groups[i].devices = 0;
> > +   rte_spinlock_recursive_t lock =
> RTE_SPINLOCK_RECURSIVE_INITIALIZER;
> > +
> > +   for (i = 0; i < VFIO_MAX_CONTAINERS; i++) {
> > +   vfio_cfgs[i].vfio_container_fd = -1;
> > +   vfio_cfgs[i].vfio_active_groups = 0;
> > +   vfio_cfgs[i].vfio_iommu_type = NULL;
> > +   vfio_cfgs[i].mem_maps.lock = lock;
> 
> Nitpick - why copy, instead of straight up initializing with
> RTE_SPINLOCK_RECURSIVE_INITIALIZER?

I tried but compiler doesn't allow this assignment.
RTE_SPINLOCK_RECURSIVE_INITIALIZER could only be used for initialization.
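
To illustrate the restriction, here is a small sketch with a hypothetical stand-in lock type (sketch_rlock and SKETCH_RLOCK_INITIALIZER are not DPDK names). A brace-enclosed initializer is not an expression, so it cannot appear on the right-hand side of an assignment; copying an initialized local works, and so does a C99 compound literal.

```c
#include <assert.h>
#include <stddef.h>

/* hypothetical stand-in for a recursive spinlock and its initializer */
struct sketch_rlock {
	int locked;
	int user;
	int count;
};
#define SKETCH_RLOCK_INITIALIZER { 0, -1, 0 }

struct sketch_maps {
	struct sketch_rlock lock;
	int n_maps;
};

/* reset n entries, showing the two forms the compiler accepts */
static void
sketch_reset(struct sketch_maps *maps, int n)
{
	/* legal: the macro is used as an initializer in a declaration */
	struct sketch_rlock lock = SKETCH_RLOCK_INITIALIZER;
	int i;

	assert(maps != NULL);
	for (i = 0; i < n; i++) {
		/*
		 * maps[i].lock = SKETCH_RLOCK_INITIALIZER;
		 * would not compile: a brace list is not an expression.
		 */
		if (i % 2 == 0)
			/* option 1: copy an already initialized object */
			maps[i].lock = lock;
		else
			/* option 2: C99 compound literal built from the macro */
			maps[i].lock =
				(struct sketch_rlock)SKETCH_RLOCK_INITIALIZER;
		maps[i].n_maps = 0;
	}
}
```

Both branches leave the lock in the same state; the compound literal just avoids the extra local variable.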

Thanks for the comme

[dpdk-dev] [PATCH] examples/ipsec-secgw: fix usage print

2018-04-16 Thread Anoob Joseph
The usage print was not updated when jumbo frame and cryptodev mask
support were added. Fix that. Also, the optional arguments were not
properly marked in the usage header; mark them with brackets.

Clean up the usage print in general so that it is cleaner and closer
to what other applications such as l3fwd print.

Fixes: bbabfe6e4ee4 ("examples/ipsec_secgw: support jumbo frames")
Fixes: 2c68fe791538 ("examples/ipsec-secgw: add cryptodev mask option")
Fixes: d299106e8e31 ("examples/ipsec-secgw: add IPsec sample application")

Signed-off-by: Anoob Joseph 
---
 examples/ipsec-secgw/ipsec-secgw.c | 36 ++--
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/examples/ipsec-secgw/ipsec-secgw.c 
b/examples/ipsec-secgw/ipsec-secgw.c
index 18330fe..1494b02 100644
--- a/examples/ipsec-secgw/ipsec-secgw.c
+++ b/examples/ipsec-secgw/ipsec-secgw.c
@@ -945,20 +945,28 @@ init_lcore_rx_queues(void)
 static void
 print_usage(const char *prgname)
 {
-   printf("%s [EAL options] -- -p PORTMASK -P -u PORTMASK"
-   "  --"CMD_LINE_OPT_CONFIG" 
(port,queue,lcore)[,(port,queue,lcore]"
-   " --single-sa SAIDX -f CONFIG_FILE\n"
-   "  -p PORTMASK: hexadecimal bitmask of ports to configure\n"
-   "  -P : enable promiscuous mode\n"
-   "  -u PORTMASK: hexadecimal bitmask of unprotected ports\n"
-   "  -j FRAMESIZE: jumbo frame maximum size\n"
-   "  --"CMD_LINE_OPT_CONFIG": (port,queue,lcore): "
-   "rx queues configuration\n"
-   "  --single-sa SAIDX: use single SA index for outbound, "
-   "bypassing the SP\n"
-   "  --cryptodev_mask MASK: hexadecimal bitmask of the "
-   "crypto devices to configure\n"
-   "  -f CONFIG_FILE: Configuration file path\n",
+   fprintf(stderr, "%s [EAL options] --"
+   " -p PORTMASK"
+   " [-P]"
+   " [-u PORTMASK]"
+   " [-j FRAMESIZE]"
+   " -f CONFIG_FILE"
+   " --config (port,queue,lcore)[,(port,queue,lcore)]"
+   " [--single-sa SAIDX]"
+   " [--cryptodev_mask MASK]"
+   "\n\n"
+   "  -p PORTMASK: Hexadecimal bitmask of ports to configure\n"
+   "  -P : Enable promiscuous mode\n"
+   "  -u PORTMASK: Hexadecimal bitmask of unprotected ports\n"
+   "  -j FRAMESIZE: Enable jumbo frame with 'FRAMESIZE' as 
maximum\n"
+   "packet size\n"
+   "  -f CONFIG_FILE: Configuration file\n"
+   "  --config (port,queue,lcore): Rx queue configuration\n"
+   "  --single-sa SAIDX: Use single SA index for outbound 
traffic,\n"
+   " bypassing the SP\n"
+   "  --cryptodev_mask MASK: Hexadecimal bitmask of the crypto\n"
+   " devices to configure\n"
+   "\n",
prgname);
 }
 
-- 
2.7.4



[dpdk-dev] dpdk and dpdk-next-net build is broken on RHEL 7.4

2018-04-16 Thread Andrew Rybchenko

Hi,

dpdk and dpdk-next-net build is broken on RHEL 7.4.
It looks like after pull from next-eventdev.

== Build app/test-eventdev
  CC test_perf_common.o
/home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.c: In function ‘perf_event_timer_producer’:
/home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.c:99:3: error: missing initializer for field ‘priority’ of ‘struct ’ [-Werror=missing-field-initializers]
   .ev.sched_type = t->opt->sched_type_list[0],
   ^
In file included from /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.h:14:0,
                 from /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.c:5:
/home/arybchik/build/dpdk-next-net.hpdl160g9b.x86_64-native-linuxapp-gcc/include/rte_eventdev.h:1049:12: note: ‘priority’ declared here
    uint8_t priority;
    ^
/home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.c:100:3: error: missing initializer for field ‘priority’ of ‘struct ’ [-Werror=missing-field-initializers]
   .ev.priority = RTE_EVENT_DEV_PRIORITY_NORMAL,
   ^
In file included from /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.h:14:0,
                 from /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.c:5:
/home/arybchik/build/dpdk-next-net.hpdl160g9b.x86_64-native-linuxapp-gcc/include/rte_eventdev.h:1049:12: note: ‘priority’ declared here
    uint8_t priority;
    ^
/home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.c:101:3: error: missing initializer for field ‘impl_opaque’ of ‘struct ’ [-Werror=missing-field-initializers]
   .ev.event_type =  RTE_EVENT_TYPE_TIMER,
   ^
In file included from /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.h:14:0,
                 from /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.c:5:
/home/arybchik/build/dpdk-next-net.hpdl160g9b.x86_64-native-linuxapp-gcc/include/rte_eventdev.h:1059:12: note: ‘impl_opaque’ declared here
    uint8_t impl_opaque;
    ^
/home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.c:102:3: error: missing initializer for field ‘impl_opaque’ of ‘struct ’ [-Werror=missing-field-initializers]
   .state = RTE_EVENT_TIMER_NOT_ARMED,
   ^
In file included from /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.h:14:0,
                 from /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.c:5:
/home/arybchik/build/dpdk-next-net.hpdl160g9b.x86_64-native-linuxapp-gcc/include/rte_eventdev.h:1059:12: note: ‘impl_opaque’ declared here
    uint8_t impl_opaque;
    ^
/home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.c: In function ‘perf_event_timer_producer_burst’:
/home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.c:161:3: error: missing initializer for field ‘priority’ of ‘struct ’ [-Werror=missing-field-initializers]
   .ev.sched_type = t->opt->sched_type_list[0],
   ^
In file included from /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.h:14:0,
                 from /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.c:5:
/home/arybchik/build/dpdk-next-net.hpdl160g9b.x86_64-native-linuxapp-gcc/include/rte_eventdev.h:1049:12: note: ‘priority’ declared here
    uint8_t priority;
    ^
/home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.c:162:3: error: missing initializer for field ‘priority’ of ‘struct ’ [-Werror=missing-field-initializers]
   .ev.priority = RTE_EVENT_DEV_PRIORITY_NORMAL,
   ^
In file included from /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.h:14:0,
                 from /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.c:5:
/home/arybchik/build/dpdk-next-net.hpdl160g9b.x86_64-native-linuxapp-gcc/include/rte_eventdev.h:1049:12: note: ‘priority’ declared here
    uint8_t priority;
    ^
/home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.c:163:3: error: missing initializer for field ‘impl_opaque’ of ‘struct ’ [-Werror=missing-field-initializers]
   .ev.event_type =  RTE_EVENT_TYPE_TIMER,
   ^
In file included from /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.h:14:0,
                 from /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.c:5:
/home/arybchik/build/dpdk-next-net.hpdl160g9b.x86_64-native-linuxapp-gcc/include/rte_eventdev.h:1059:12: note: ‘impl_opaque’ declared here
    uint8_t impl_opaque;
    ^
/home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.c:164:3: error: missing initializer for field ‘impl_opaque’ of ‘struct ’ [-Werror=missing-field-initializers]
   .state = RTE_EVENT_TIMER_NOT_ARMED,
   ^
In file included from /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.h:14:0,
                 from /home/arybchik/src/dpdk-next-net/app/test

Re: [dpdk-dev] [PATCH v3 07/14] net/mlx5: support tunnel RSS level

2018-04-16 Thread Nélio Laranjeiro
On Sat, Apr 14, 2018 at 10:12:58AM +, Xueming(Steven) Li wrote:
> Hi Nelio,
>[...]
> > > + if (!found)
> > > + DRV_LOG(WARNING,
> > > + "port %u rss hash function doesn't match "
> > > + "pattern", dev->data->port_id);
> > 
> > The hash function is toeplitz, xor, it is not applied on the pattern but
> > used to compute an hash result using some information from the packet.
> > This comment is totally wrong.
> 
> Thanks, I'll replace "hash function" to "hash fields".
> 
> > 
> > Another point, such log will trigger on an application using MLX5 PMD but
> > not on MLX4 PMD and this specifically because on how the NIC using the
> > MLX5 PMD are made internally (MLX4 can use a single Hash RX queue whereas
> > MLX5 needs an Hash Rx queue per kind of protocol).
> > The fact being it will have the exact same behavior I'll *strongly*
> > suggest to remove such annoying warning.
> 
> After some test on mlx5 current code, the behavior in previous code doesn't
> seem to be consistent, not sure whether it same in mlx4 PMD:
> - Pattern: eth/ipv4/tcp, RSS: UDP, creation success.
> - Pattern: eth/ipv4,RSS: IPv6, creation failed.

Seems there is a bug.

> This patch support the 2nd case w/o hash, and warn upon the first case.
> Take example of first case, a packet that matches the pattern must be TCP,
> no reason to hash it as TCP, same to the 2nd case. They are totally
> wrong configuration, but to be robust, warning is used here, and users 
> have to learn that NO hash result because HF configuration mismatch through 
> this warning message.
> 
> Please note that below cases are valid and no warning:
> - Pattern: eth/ipv4, RSS: UDP
> - Pattern: eth/ipv4/udp, RSS: IPv4

This log will not raise for non IP protocols defined by the user, or it
will raise when the user already expects it to not make RSS.
It will be more annoying than helpful.

Example: 

 flow create 0 ingress eth ethertype is 0x0806 / end actions rss 

won't raise such log, whereas ARP is not an IP protocol and thus can be
RSS'ed.

 flow create 0 ingress eth / ipv4 / end actions rss type ipv6...

will raise the log, but it is obvious the user won't have RSS.

Regards,

-- 
Nélio Laranjeiro
6WIND


Re: [dpdk-dev] [PATCH v2 07/15] net/mlx5: support tunnel RSS level

2018-04-16 Thread Nélio Laranjeiro
On Mon, Apr 16, 2018 at 10:06:06AM +, Xueming(Steven) Li wrote:
> 
> 
> > -Original Message-
> > From: Nélio Laranjeiro 
> > Sent: Monday, April 16, 2018 4:09 PM
> > To: Xueming(Steven) Li 
> > Cc: Shahaf Shuler ; dev@dpdk.org
> > Subject: Re: [PATCH v2 07/15] net/mlx5: support tunnel RSS level
> > 
> > On Mon, Apr 16, 2018 at 07:46:08AM +, Xueming(Steven) Li wrote:
> > >[...]
> > > > > > > @@ -1386,6 +1386,8 @@ mlx5_ind_table_ibv_verify(struct
> > > > > > > rte_eth_dev
> > > > *dev)
> > > > > > >   *   Number of queues.
> > > > > > >   * @param tunnel
> > > > > > >   *   Tunnel type.
> > > > > > > + * @param rss_level
> > > > > > > + *   RSS hash on tunnel level.
> > > > > > >   *
> > > > > > >   * @return
> > > > > > >   *   The Verbs object initialised, NULL otherwise and rte_errno
> > is
> > > > set.
> > > > > > > @@ -1394,13 +1396,17 @@ struct mlx5_hrxq *
> > > > > > > mlx5_hrxq_new(struct rte_eth_dev *dev,
> > > > > > > const uint8_t *rss_key, uint32_t rss_key_len,
> > > > > > > uint64_t hash_fields,
> > > > > > > -   const uint16_t *queues, uint32_t queues_n, uint32_t
> > > > tunnel)
> > > > > > > +   const uint16_t *queues, uint32_t queues_n,
> > > > > > > +   uint32_t tunnel, uint32_t rss_level)
> > > > > >
> > > > > > tunnel and rss_level seems to be redundant here.
> > > > > >
> > > > > > rss_level > 1 is equivalent to tunnel, there is no need to have
> > both.
> > > > >
> > > > > There is a case of tunnel and outer rss(1).
> > > >
> > > > Why cannot it be handled by a regular Hash Rx queue, i.e. what is
> > > > the benefit of creating a tunnel hash Rx queue to make the same job
> > > > as a legacy one?
> > >
> > > Tunnel checksum, ptype and rss offloading demand a QP to be created by
> > > DV api with tunnel offload flags.
> > 
> > I was expecting such answer, such information should be present in the
> > function documentation, can you add it?
> 
> You mean https://dpdk.org/doc/guides/nics/overview.html?
> "Inner L3 checksum" and "Inner L4 checksum" defined. 
> I added "Inner RSS" per your suggestion, The only thing missing is 
> "Innner packet type", make sense?

No, I mean stating in this function's doxygen documentation that
tunnel is there to enable the checksum offload, whereas rss_level is
there to enable RSS on the inner packet.

Thanks,

-- 
Nélio Laranjeiro
6WIND


Re: [dpdk-dev] [PATCH v2 8/9] doc: add DPAA2 CMDIF rawdev guide

2018-04-16 Thread Hemant Agrawal



On 4/7/2018 8:04 PM, Nipun Gupta wrote:

Signed-off-by: Nipun Gupta 
---
  MAINTAINERS|   1 +
  doc/guides/rawdevs/dpaa2_cmdif.rst | 132 +


You also need to add an entry in guides/index for these. Also create a 
rawdevs/index.rst file with a dpaa2_cmdif.rst entry.




Re: [dpdk-dev] [PATCH v7 2/5] vfio: add multi container support

2018-04-16 Thread Wang, Xiao W
Hi Anatoly,

> -Original Message-
> From: Burakov, Anatoly
> Sent: Monday, April 16, 2018 6:03 PM
> To: Wang, Xiao W ; Yigit, Ferruh
> 
> Cc: dev@dpdk.org; maxime.coque...@redhat.com; Wang, Zhihong
> ; Bie, Tiwei ; Tan, Jianfeng
> ; Liang, Cunming ; Daly,
> Dan ; tho...@monjalon.net; Chen, Junjie J
> 
> Subject: Re: [PATCH v7 2/5] vfio: add multi container support
> 
> On 15-Apr-18 4:33 PM, Xiao Wang wrote:
> > This patch adds APIs to support container create/destroy and device
> > bind/unbind with a container. It also provides API for IOMMU programing
> > on a specified container.
> >
> > A driver could use "rte_vfio_create_container" helper to create a
> 
> ^^ wrong API name in commit message :)

Thanks for the catch. Will fix it.

> 
> > new container from eal, use "rte_vfio_bind_group" to bind a device
> > to the newly created container. During rte_vfio_setup_device the
> > container bound with the device will be used for IOMMU setup.
> >
> > Signed-off-by: Junjie Chen 
> > Signed-off-by: Xiao Wang 
> > Reviewed-by: Maxime Coquelin 
> > Reviewed-by: Ferruh Yigit 
> > ---
> >   lib/librte_eal/bsdapp/eal/eal.c  |  52 +
> >   lib/librte_eal/common/include/rte_vfio.h | 119 
> >   lib/librte_eal/linuxapp/eal/eal_vfio.c   | 316
> +++
> >   lib/librte_eal/rte_eal_version.map   |   6 +
> >   4 files changed, 493 insertions(+)
> >
> > diff --git a/lib/librte_eal/bsdapp/eal/eal.c 
> > b/lib/librte_eal/bsdapp/eal/eal.c
> > index 727adc5d2..c5106d0d6 100644
> > --- a/lib/librte_eal/bsdapp/eal/eal.c
> > +++ b/lib/librte_eal/bsdapp/eal/eal.c
> > @@ -769,6 +769,14 @@ int rte_vfio_noiommu_is_enabled(void);
> >   int rte_vfio_clear_group(int vfio_group_fd);
> >   int rte_vfio_dma_map(uint64_t vaddr, uint64_t iova, uint64_t len);
> >   int rte_vfio_dma_unmap(uint64_t vaddr, uint64_t iova, uint64_t len);
> > +int rte_vfio_container_create(void);
> > +int rte_vfio_container_destroy(int container_fd);
> > +int rte_vfio_bind_group(int container_fd, int iommu_group_no);
> > +int rte_vfio_unbind_group(int container_fd, int iommu_group_no);
> 
> Maybe have these under "container" too? e.g.
> rte_vfio_container_group_bind/unbind? Seems like it would be more
> consistent that way - anything to do with custom containers would be
> under rte_vfio_container_* namespace.

Agree.

> 
> > +int rte_vfio_container_dma_map(int container_fd, uint64_t vaddr,
> > +   uint64_t iova, uint64_t len);
> > +int rte_vfio_container_dma_unmap(int container_fd, uint64_t vaddr,
> > +   uint64_t iova, uint64_t len);
> >
> >   int rte_vfio_setup_device(__rte_unused const char *sysfs_base,
> >   __rte_unused const char *dev_addr,
> > @@ -818,3 +826,47 @@ rte_vfio_dma_unmap(uint64_t __rte_unused vaddr,
> uint64_t __rte_unused iova,
> >   {
> > return -1;
> >   }
> > +
> 
> <...>
> 
> > diff --git a/lib/librte_eal/common/include/rte_vfio.h
> b/lib/librte_eal/common/include/rte_vfio.h
> > index d26ab01cb..0c1509b29 100644
> > --- a/lib/librte_eal/common/include/rte_vfio.h
> > +++ b/lib/librte_eal/common/include/rte_vfio.h
> > @@ -168,6 +168,125 @@ rte_vfio_dma_map(uint64_t vaddr, uint64_t iova,
> uint64_t len);
> >   int __rte_experimental
> >   rte_vfio_dma_unmap(uint64_t vaddr, uint64_t iova, uint64_t len);
> >
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> notice
> > + *
> > + * Create a new container for device binding.
> 
> I would add a note that any newly allocated DPDK memory will not be
> mapped into these containers by default.

Will add it.

> 
> > + *
> > + * @return
> > + *   the container fd if successful
> > + *   <0 if failed
> > + */
> > +int __rte_experimental
> > +rte_vfio_container_create(void);
> > +
> 
> <...>
> 
> > + *0 if successful
> > + *   <0 if failed
> > + */
> > +int __rte_experimental
> > +rte_vfio_unbind_group(int container_fd, int iommu_group_no);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> notice
> > + *
> > + * Perform dma mapping for devices in a conainer.
> 
> Here and in other places: "dma" should be DMA, and typo: "conainer" :)
> 
> I think you should also add a note to the original API (not this one,
> but the old one) that DMA maps done via that API will only apply to
> default container and will not apply to any of the containers created
> via container_create(). IOW, documentation should make it clear that if
> you use this functionality, you're on your own and you have to manage
> your own DMA mappings for any containers you create.

OK, will add note to clearly describe it.

> 
> > + *
> > + * @param container_fd
> > + *   the specified container fd
> > + *
> > + * @param vaddr
> > + *   Starting virtual address of memory to be mapped.
> > + *
> 
> <...>
> 
> > +
> > +int __rte_experimental
> > +rte_vfio_container_dma_map(int container_fd, uint64_t vaddr, uint64_t
> iova,
> > +   uin

Re: [dpdk-dev] dpdk and dpdk-next-net build is broken on RHEL 7.4

2018-04-16 Thread Pavan Nikhilesh
Hi Andrew,

Thanks for reporting the issue, will fix it and send out a patch soon.
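
For reference, a minimal sketch of one way to avoid this class of error, assuming hypothetical cut-down types (the sketch_ names are not DPDK structures). Older GCC releases such as the 4.8 series appear to report -Wmissing-field-initializers when nested designators like .ev.op leave sibling fields unnamed, so the sketch zeroes the whole object first and then assigns the fields one by one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* hypothetical, cut-down stand-ins for the event and timer structures */
struct sketch_event {
	uint8_t op;
	uint8_t queue_id;
	uint8_t priority;
	uint8_t impl_opaque;
};

struct sketch_timer {
	struct sketch_event ev;
	uint64_t timeout_ticks;
	int state;
};

/* fill a timer without a designated initializer */
static void
sketch_timer_prepare(struct sketch_timer *tim, uint8_t queue_id,
		uint64_t timeout_ticks)
{
	assert(tim != NULL);

	/* zero everything, so fields not assigned below are well defined */
	memset(tim, 0, sizeof(*tim));
	tim->ev.op = 1;			/* placeholder for a "new event" op */
	tim->ev.queue_id = queue_id;
	tim->timeout_ticks = timeout_ticks;
}
```

The cost is that the object can no longer be declared const, which is the trade-off this approach accepts.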

Pavan

On Mon, Apr 16, 2018 at 03:24:21PM +0300, Andrew Rybchenko wrote:
> Hi,
>
> dpdk and dpdk-next-net build is broken on RHEL 7.4.
> It looks like after pull from next-eventdev.
>
> == Build app/test-eventdev
>   CC test_perf_common.o
> /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.c: In function ‘perf_event_timer_producer’:
> /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.c:99:3: error: missing initializer for field ‘priority’ of ‘struct ’ [-Werror=missing-field-initializers]
>    .ev.sched_type = t->opt->sched_type_list[0],
>    ^
> In file included from /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.h:14:0,
>                  from /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.c:5:
> /home/arybchik/build/dpdk-next-net.hpdl160g9b.x86_64-native-linuxapp-gcc/include/rte_eventdev.h:1049:12: note: ‘priority’ declared here
>     uint8_t priority;
>     ^
> /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.c:100:3: error: missing initializer for field ‘priority’ of ‘struct ’ [-Werror=missing-field-initializers]
>    .ev.priority = RTE_EVENT_DEV_PRIORITY_NORMAL,
>    ^
> In file included from /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.h:14:0,
>                  from /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.c:5:
> /home/arybchik/build/dpdk-next-net.hpdl160g9b.x86_64-native-linuxapp-gcc/include/rte_eventdev.h:1049:12: note: ‘priority’ declared here
>     uint8_t priority;
>     ^
> /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.c:101:3: error: missing initializer for field ‘impl_opaque’ of ‘struct ’ [-Werror=missing-field-initializers]
>    .ev.event_type =  RTE_EVENT_TYPE_TIMER,
>    ^
> In file included from /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.h:14:0,
>                  from /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.c:5:
> /home/arybchik/build/dpdk-next-net.hpdl160g9b.x86_64-native-linuxapp-gcc/include/rte_eventdev.h:1059:12: note: ‘impl_opaque’ declared here
>     uint8_t impl_opaque;
>     ^
> /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.c:102:3: error: missing initializer for field ‘impl_opaque’ of ‘struct ’ [-Werror=missing-field-initializers]
>    .state = RTE_EVENT_TIMER_NOT_ARMED,
>    ^
> In file included from /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.h:14:0,
>                  from /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.c:5:
> /home/arybchik/build/dpdk-next-net.hpdl160g9b.x86_64-native-linuxapp-gcc/include/rte_eventdev.h:1059:12: note: ‘impl_opaque’ declared here
>     uint8_t impl_opaque;
>     ^
> /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.c: In function ‘perf_event_timer_producer_burst’:
> /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.c:161:3: error: missing initializer for field ‘priority’ of ‘struct ’ [-Werror=missing-field-initializers]
>    .ev.sched_type = t->opt->sched_type_list[0],
>    ^
> In file included from /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.h:14:0,
>                  from /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.c:5:
> /home/arybchik/build/dpdk-next-net.hpdl160g9b.x86_64-native-linuxapp-gcc/include/rte_eventdev.h:1049:12: note: ‘priority’ declared here
>     uint8_t priority;
>     ^
> /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.c:162:3: error: missing initializer for field ‘priority’ of ‘struct ’ [-Werror=missing-field-initializers]
>    .ev.priority = RTE_EVENT_DEV_PRIORITY_NORMAL,
>    ^
> In file included from /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.h:14:0,
>                  from /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.c:5:
> /home/arybchik/build/dpdk-next-net.hpdl160g9b.x86_64-native-linuxapp-gcc/include/rte_eventdev.h:1049:12: note: ‘priority’ declared here
>     uint8_t priority;
>     ^
> /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.c:163:3: error: missing initializer for field ‘impl_opaque’ of ‘struct ’ [-Werror=missing-field-initializers]
>    .ev.event_type =  RTE_EVENT_TYPE_TIMER,
>    ^
> In file included from /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.h:14:0,
>                  from /home/arybchik/src/dpdk-next-net/app/test-eventdev/test_perf_common.c:5:
> /home/arybchik/build/dpdk-next-net.hpdl160g9b.x86_64-native-linuxapp-gcc/include/rte_eventdev.h:1059:12: note: ‘impl_opaque’ declared here
>     uint8_t impl_opaque;
>     ^
> /home/arybchik/src/dpdk-next-net/app/test-eventdev/test

[dpdk-dev] [PATCH] app/eventdev: fix gcc 4.8 compilation errors

2018-04-16 Thread Pavan Nikhilesh
test_perf_common.c: In function ‘perf_event_timer_producer’:
test_perf_common.c:99:3: error: missing initializer for
 field ‘priority’ of ‘struct <anonymous>’
 [-Werror=missing-field-initializers]
   .ev.sched_type = t->opt->sched_type_list[0],

Fixes: d008f20bce23 ("app/eventdev: add event timer adapter as a producer")

Reported-by: Andrew Rybchenko 
Signed-off-by: Pavan Nikhilesh 
---
 app/test-eventdev/test_perf_common.c | 36 ++--
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/app/test-eventdev/test_perf_common.c b/app/test-eventdev/test_perf_common.c
index a74ab9a9e..f16791861 100644
--- a/app/test-eventdev/test_perf_common.c
+++ b/app/test-eventdev/test_perf_common.c
@@ -87,21 +87,21 @@ perf_event_timer_producer(void *arg)
struct rte_mempool *pool = t->pool;
struct perf_elt *m;
struct rte_event_timer_adapter **adptr = t->timer_adptr;
+   struct rte_event_timer tim;
uint64_t timeout_ticks = opt->expiry_nsec / opt->timer_tick_nsec;
 
+   memset(&tim, 0, sizeof(struct rte_event_timer));
timeout_ticks = opt->optm_timer_tick_nsec ?
(timeout_ticks * opt->timer_tick_nsec)
/ opt->optm_timer_tick_nsec : timeout_ticks;
timeout_ticks += timeout_ticks ? 0 : 1;
-   const struct rte_event_timer tim = {
-   .ev.op = RTE_EVENT_OP_NEW,
-   .ev.queue_id = p->queue_id,
-   .ev.sched_type = t->opt->sched_type_list[0],
-   .ev.priority = RTE_EVENT_DEV_PRIORITY_NORMAL,
-   .ev.event_type =  RTE_EVENT_TYPE_TIMER,
-   .state = RTE_EVENT_TIMER_NOT_ARMED,
-   .timeout_ticks = timeout_ticks,
-   };
+   tim.ev.event_type =  RTE_EVENT_TYPE_TIMER;
+   tim.ev.op = RTE_EVENT_OP_NEW;
+   tim.ev.sched_type = t->opt->sched_type_list[0];
+   tim.ev.queue_id = p->queue_id;
+   tim.ev.priority = RTE_EVENT_DEV_PRIORITY_NORMAL;
+   tim.state = RTE_EVENT_TIMER_NOT_ARMED;
+   tim.timeout_ticks = timeout_ticks;
 
if (opt->verbose_level > 1)
printf("%s(): lcore %d\n", __func__, rte_lcore_id());
@@ -149,21 +149,21 @@ perf_event_timer_producer_burst(void *arg)
struct rte_mempool *pool = t->pool;
struct perf_elt *m[BURST_SIZE + 1] = {NULL};
struct rte_event_timer_adapter **adptr = t->timer_adptr;
+   struct rte_event_timer tim;
uint64_t timeout_ticks = opt->expiry_nsec / opt->timer_tick_nsec;
 
+   memset(&tim, 0, sizeof(struct rte_event_timer));
timeout_ticks = opt->optm_timer_tick_nsec ?
(timeout_ticks * opt->timer_tick_nsec)
/ opt->optm_timer_tick_nsec : timeout_ticks;
timeout_ticks += timeout_ticks ? 0 : 1;
-   const struct rte_event_timer tim = {
-   .ev.op = RTE_EVENT_OP_NEW,
-   .ev.queue_id = p->queue_id,
-   .ev.sched_type = t->opt->sched_type_list[0],
-   .ev.priority = RTE_EVENT_DEV_PRIORITY_NORMAL,
-   .ev.event_type =  RTE_EVENT_TYPE_TIMER,
-   .state = RTE_EVENT_TIMER_NOT_ARMED,
-   .timeout_ticks = timeout_ticks,
-   };
+   tim.ev.event_type =  RTE_EVENT_TYPE_TIMER;
+   tim.ev.op = RTE_EVENT_OP_NEW;
+   tim.ev.sched_type = t->opt->sched_type_list[0];
+   tim.ev.queue_id = p->queue_id;
+   tim.ev.priority = RTE_EVENT_DEV_PRIORITY_NORMAL;
+   tim.state = RTE_EVENT_TIMER_NOT_ARMED;
+   tim.timeout_ticks = timeout_ticks;
 
if (opt->verbose_level > 1)
printf("%s(): lcore %d\n", __func__, rte_lcore_id());
-- 
2.17.0
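
For readers unfamiliar with this warning: judging from the log above, gcc 4.8
appears to treat each nested designator such as .ev.op as covering only part of
the inner struct and reports the remaining fields as uninitialized. The
following is a minimal, self-contained sketch of the pattern the patch switches
to (zero the object, then assign fields one by one). The demo_* types, the
function name and the numeric values are hypothetical stand-ins, not the real
rte_event_timer definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for struct rte_event and struct rte_event_timer. */
struct demo_event {
	uint8_t op;
	uint8_t sched_type;
	uint8_t queue_id;
	uint8_t priority;
	uint8_t impl_opaque;
};

struct demo_event_timer {
	struct demo_event ev;
	uint64_t timeout_ticks;
	int state;
};

/*
 * Zero the whole object first, then assign only the fields of interest.
 * No initializer list is involved, so -Wmissing-field-initializers has
 * nothing to report, and unassigned fields such as impl_opaque stay 0.
 */
struct demo_event_timer
demo_timer_init(uint8_t queue_id, uint8_t sched_type, uint64_t timeout_ticks)
{
	struct demo_event_timer tim;

	/* The patch bumps a zero timeout to 1 before reaching this point. */
	assert(timeout_ticks != 0);

	memset(&tim, 0, sizeof(tim));
	tim.ev.op = 1;			/* placeholder for RTE_EVENT_OP_NEW */
	tim.ev.sched_type = sched_type;
	tim.ev.queue_id = queue_id;
	tim.ev.priority = 128;		/* placeholder priority value */
	tim.timeout_ticks = timeout_ticks;
	return tim;
}
```

One trade-off of this approach is that the object can no longer be declared
const, which matches the patch dropping the const qualifier from tim.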



[dpdk-dev] [PATCH v7 0/9] switching devices representation

2018-04-16 Thread Declan Doherty
This patchset follows on from the port representor patchsets and the
community discussion that resulted. It outlines the model for
representing and controlling switching capable devices in a new
programmer's guide entry based upon the excellent summary by 
Adrien Mazarguil in 
(http://dpdk.org/ml/archives/dev/2018-March/092513.html).

The next patches introduce changes to librte_ether to:
1, support the definition of a switch domain and make it public to
applications through the rte_eth_dev_info structure.
2, Add generic ethdev create/destroy APIs to facilitate and generalise the
creation of ethdevs on different bus types.
3, Add ethdev attribute to dev_flags to specify that a port is a
representor port and make it public through the rte_eth_dev_info
structure.
4, Add devargs parsing for generic eth_devargs to facilitate parsing in
NET PMDs. This will be refactored to take account of the changes in 
(http://dpdk.org/ml/archives/dev/2018-March/092513.html)
5, Add new API to allocate switch domain ids to devices which support
this feature. 

This patchset also includes the enablement of vf port representor for ixgbe 
and i40e PF devices.
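
To make the devargs parsing in item 4 more concrete, here is a rough sketch of
how a representor list could be expanded into port indexes. It handles the
value forms shown in the documentation patch of this series (0, [0-3] and
[0,5-11]). It is not the parser added by the series; demo_parse_representor and
DEMO_MAX_PORTS are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MAX_PORTS 32	/* hypothetical upper bound */

/*
 * Expand "N", "[A-B]" or "[A,B-C,...]" into ports[].
 * Returns the number of entries written, or -1 on malformed input.
 */
int
demo_parse_representor(const char *s, uint16_t *ports, int max)
{
	int n = 0;
	int bracket;

	assert(s != NULL && ports != NULL);

	bracket = (*s == '[');
	if (bracket)
		s++;

	for (;;) {
		char *end;
		unsigned long lo, hi;

		if (*s < '0' || *s > '9')
			return -1;
		lo = strtoul(s, &end, 10);
		hi = lo;
		s = end;

		if (*s == '-') {	/* range "lo-hi" */
			s++;
			if (*s < '0' || *s > '9')
				return -1;
			hi = strtoul(s, &end, 10);
			s = end;
		}
		if (hi < lo || hi > UINT16_MAX)
			return -1;

		for (; lo <= hi; lo++) {
			if (n >= max)
				return -1;
			ports[n++] = (uint16_t)lo;
		}

		if (*s != ',')
			break;
		if (!bracket)		/* comma lists need brackets */
			return -1;
		s++;
	}

	if (bracket) {
		if (*s != ']')
			return -1;
		s++;
	}
	return *s == '\0' ? n : -1;
}
```

A caller would pass the text that follows "representor=" together with an
array of DEMO_MAX_PORTS entries, and then create one representor port per
returned index.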

V7: 
This patch addresses the following changes:
 - fixes in documentation patch
 - changes the default value of switch domain id to be INVALID to allow
   applications to easily identify devices which can/cannot support the
   concept. Updates the switch information available through the
   rte_eth_dev_info structure.
 - remove the rte_ethdev_representor.h header and leave representor
   specific initialisation to driver
 - add new APIs for allocating and freeing switch domain identifier to
   enable PMDs to have unique switch domain ids without the ethdev
   infrastructure placing any restriction on how these are managed by
   devices.
 - bug fix in ethdev args parsing code.
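
As an illustration of how an application might use the INVALID default and the
representor flag described above, here is a small sketch. The struct, macros
and values are simplified, hypothetical stand-ins rather than the definitions
from rte_ethdev.h; only the checking logic is the point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the ethdev definitions. */
#define DEMO_SWITCH_DOMAIN_ID_INVALID UINT16_MAX
#define DEMO_DEV_REPRESENTOR 0x0001u

struct demo_port_info {
	uint32_t dev_flags;	/* device flags */
	uint16_t domain_id;	/* switch domain, or INVALID if unsupported */
};

/* A port is a representor if the representor bit is set in dev_flags. */
bool
demo_is_representor(const struct demo_port_info *p)
{
	assert(p != NULL);
	return (p->dev_flags & DEMO_DEV_REPRESENTOR) != 0;
}

/*
 * Two ports belong to the same switch only if both carry the same valid
 * domain id; INVALID never matches, not even against itself.
 */
bool
demo_same_switch(const struct demo_port_info *a,
		 const struct demo_port_info *b)
{
	assert(a != NULL && b != NULL);
	if (a->domain_id == DEMO_SWITCH_DOMAIN_ID_INVALID)
		return false;
	return a->domain_id == b->domain_id;
}
```

Treating INVALID as never matching keeps ports from drivers without switch
domain support from being grouped together by accident.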

Declan Doherty (8):
  doc: add switch representation documentation
  ethdev: add switch identifier parameter to port
  ethdev: add generic create/destroy ethdev APIs
  ethdev: Add port representor device flag
  app/testpmd: add port name to device info
  ethdev: add switch domain allocator
  net/i40e: add support for representor ports
  net/ixgbe: add support for representor ports

Remy Horton (1):
  ethdev: add common devargs parser

 app/test-pmd/config.c   |  15 +
 doc/guides/prog_guide/index.rst |   1 +
 doc/guides/prog_guide/switch_representation.rst | 837 
 drivers/net/i40e/Makefile   |   3 +
 drivers/net/i40e/i40e_ethdev.c  |  82 ++-
 drivers/net/i40e/i40e_ethdev.h  |  16 +
 drivers/net/i40e/i40e_vf_representor.c  | 405 
 drivers/net/i40e/meson.build|   4 +-
 drivers/net/i40e/rte_pmd_i40e.c |  43 ++
 drivers/net/i40e/rte_pmd_i40e.h |  18 +
 drivers/net/ixgbe/Makefile  |   1 +
 drivers/net/ixgbe/ixgbe_ethdev.c|  73 ++-
 drivers/net/ixgbe/ixgbe_ethdev.h|  14 +
 drivers/net/ixgbe/ixgbe_pf.c|   7 +
 drivers/net/ixgbe/ixgbe_vf_representor.c| 217 ++
 drivers/net/ixgbe/meson.build   |   1 +
 lib/Makefile|   1 +
 lib/librte_ether/rte_ethdev.c   | 345 +-
 lib/librte_ether/rte_ethdev.h   |  26 +-
 lib/librte_ether/rte_ethdev_driver.h| 126 
 lib/librte_ether/rte_ethdev_pci.h   |  12 +
 lib/librte_ether/rte_ethdev_version.map |  12 +
 22 files changed, 2239 insertions(+), 20 deletions(-)
 create mode 100644 doc/guides/prog_guide/switch_representation.rst
 create mode 100644 drivers/net/i40e/i40e_vf_representor.c
 create mode 100644 drivers/net/ixgbe/ixgbe_vf_representor.c

-- 
2.14.3



[dpdk-dev] [PATCH v7 1/9] doc: add switch representation documentation

2018-04-16 Thread Declan Doherty
Add a document to describe the model for representing switching capable
devices in DPDK, using a general ethdev port model and through port
representors. This document also details the port model and the
rte_flow semantics required for flow programming, as well as listing
some example use cases.

Signed-off-by: Adrien Mazarguil 
Signed-off-by: Declan Doherty 
---
 doc/guides/prog_guide/index.rst |   1 +
 doc/guides/prog_guide/switch_representation.rst | 837 
 2 files changed, 838 insertions(+)
 create mode 100644 doc/guides/prog_guide/switch_representation.rst

diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index bbbe7895d..09224af2e 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -17,6 +17,7 @@ Programmer's Guide
 mbuf_lib
 poll_mode_drv
 rte_flow
+switch_representation
 traffic_metering_and_policing
 traffic_management
 bbdev
diff --git a/doc/guides/prog_guide/switch_representation.rst b/doc/guides/prog_guide/switch_representation.rst
new file mode 100644
index 0..8875d2846
--- /dev/null
+++ b/doc/guides/prog_guide/switch_representation.rst
@@ -0,0 +1,837 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+Copyright(c) 2018 6WIND S.A.
+
+.. _switch_representation:
+
+Switch Representation within DPDK Applications
+==============================================
+
+.. contents:: :local:
+
+Introduction
+------------
+
+Network adapters with multiple physical ports and/or SR-IOV capabilities
+usually support the offload of traffic steering rules between their virtual
+functions (VFs), physical functions (PFs) and ports.
+
+As with standard Ethernet switches, this involves a combination of
+automatic MAC learning and manual configuration. For most purposes it is
+managed by the host system and fully transparent to users and applications.
+
+On the other hand, applications typically found on hypervisors that process
+layer 2 (L2) traffic (such as OVS) need to steer traffic themselves
+according to their own criteria.
+
+Without a standard software interface to manage traffic steering rules
+between VFs, PFs and the various physical ports of a given device,
+applications cannot take advantage of these offloads; software processing is
+mandatory even for traffic which ends up re-injected into the device it
+originates from.
+
+This document describes how such steering rules can be configured through
+the DPDK flow API (**rte_flow**), with emphasis on the SR-IOV use case
+(PF/VF steering) using a single physical port for clarity; however, the same
+logic applies to any number of ports without necessarily involving SR-IOV.
+
+Port Representors
+-----------------
+
+In many cases, traffic steering rules cannot be determined in advance;
+applications usually have to process a bit of traffic in software before
+thinking about offloading specific flows to hardware.
+
+Applications therefore need the ability to receive and inject traffic to
+various device endpoints (other VFs, PFs or physical ports) before
+connecting them together. Device drivers must provide means to hook the
+"other end" of these endpoints and to refer to them when configuring flow
+rules.
+
+This role is left to so-called "port representors" (also known as "VF
+representors" in the specific context of VFs), which are to DPDK what the
+Ethernet switch device driver model (**switchdev**) [1]_ is to Linux, and
+which can be thought of as a software "patch panel" front-end for applications.
+
+- DPDK port representors are implemented as additional virtual Ethernet
+  device (**ethdev**) instances, spawned on an as needed basis through
+  configuration parameters passed to the driver of the underlying
+  device using devargs.
+
+::
+
+   -w pci:dbdf,representor=0
+   -w pci:dbdf,representor=[0-3]
+   -w pci:dbdf,representor=[0,5-11]
+
+- As virtual devices, they may be more limited than their physical
+  counterparts, for instance by exposing only a subset of device
+  configuration callbacks and/or by not necessarily having Rx/Tx capability.
+
+- Among other things, they can be used to assign MAC addresses to the
+  resource they represent.
+
+- Applications can tell port representors apart from other physical or virtual
+  ports by checking the dev_flags field within their device information
+  structure for the RTE_ETH_DEV_REPRESENTOR bit-field.
+
+.. code-block:: c
+
+  struct rte_eth_dev_info {
+   ..
+   uint32_t dev_flags; /**< Device flags */
+   ..
+  };
+
+- The device or group relationship of ports can be discovered using the
+  switch ``domain_id`` field within the device's switch information structure. By
+  default the switch ``domain_id`` of a port will be
+  ``RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID`` to indicate that the port doesn't
+  support the concept of a switch domain, but ports which do support the concept
+  will be allocated a unique switch ``domain_id``, ports within the same sw
