date:20140528

[dpdk-dev] [PATCHv2 5/5] acl: add doxygen configuration and start page

2014-05-28 Thread Konstantin Ananyev

Signed-off-by: Konstantin Ananyev 
---
 doc/doxy-api-index.md |3 ++-
 doc/doxy-api.conf |3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/doc/doxy-api-index.md b/doc/doxy-api-index.md
index 2825c08..5e4cea9 100644
--- a/doc/doxy-api-index.md
+++ b/doc/doxy-api-index.md
@@ -78,7 +78,8 @@ There are many libraries, so their headers may be grouped by topics:
   [SCTP]   (@ref rte_sctp.h),
   [TCP](@ref rte_tcp.h),
   [UDP](@ref rte_udp.h),
-  [LPM route]  (@ref rte_lpm.h)
+  [LPM route]  (@ref rte_lpm.h),
+  [ACL](@ref rte_acl.h)

 - **QoS**:
   [metering]   (@ref rte_meter.h),
diff --git a/doc/doxy-api.conf b/doc/doxy-api.conf
index 642f77a..b1fc16a 100644
--- a/doc/doxy-api.conf
+++ b/doc/doxy-api.conf
@@ -44,7 +44,8 @@ INPUT   = doc/doxy-api-index.md \
   lib/librte_power \
   lib/librte_ring \
   lib/librte_sched \
-  lib/librte_timer
+  lib/librte_timer \
+  lib/librte_acl
 FILE_PATTERNS   = rte_*.h \
   cmdline.h
 PREDEFINED  = __DOXYGEN__ \
-- 
1.7.7.6

[dpdk-dev] [PATCHv2 4/5] acl: New sample l3fwd-acl.

2014-05-28 Thread Konstantin Ananyev

Demonstrates the use of the ACL library in a DPDK application to
implement packet classification and L3 forwarding.
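The sketch below shows the forwarding decision described above. It assumes a hypothetical encoding for the per-packet classification result: 0 means no rule matched, a high flag bit marks a deny rule, and any other value carries the output port plus one. The names (SKETCH_DENY_FLAG, sketch_fwd_decision, sketch_fwd_burst) are illustrative only and do not come from the l3fwd-acl sources. Dropping unmatched packets is also an assumption of the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result encoding, for this sketch only. */
#define SKETCH_NO_MATCH  0u          /* no rule matched */
#define SKETCH_DENY_FLAG 0x80000000u /* set on deny (drop) rules */
#define SKETCH_DROP      (-1)        /* decision value meaning "drop" */

/* Map one classification result to an action: the output port, or
 * SKETCH_DROP for deny rules and (by assumption) unmatched packets. */
static int
sketch_fwd_decision(uint32_t result)
{
	if (result == SKETCH_NO_MATCH || (result & SKETCH_DENY_FLAG))
		return SKETCH_DROP;
	return (int)(result - 1); /* value stores port + 1 */
}

/* Apply the decision to a burst of results; out_port[i] receives the
 * port or SKETCH_DROP. Returns the number of packets to forward. */
static size_t
sketch_fwd_burst(const uint32_t *results, size_t n, int *out_port)
{
	size_t i, fwd = 0;

	for (i = 0; i < n; i++) {
		out_port[i] = sketch_fwd_decision(results[i]);
		if (out_port[i] != SKETCH_DROP)
			fwd++;
	}
	return fwd;
}
```

The decision is a pure function of the result word, so the whole burst is handled in a single pass after classification.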

Signed-off-by: Konstantin Ananyev 
---
 examples/Makefile   |1 +
 examples/l3fwd-acl/Makefile |   56 ++
 examples/l3fwd-acl/main.c   | 2048 +++
 examples/l3fwd-acl/main.h   |   45 +
 4 files changed, 2150 insertions(+), 0 deletions(-)
 create mode 100644 examples/l3fwd-acl/Makefile
 create mode 100644 examples/l3fwd-acl/main.c
 create mode 100644 examples/l3fwd-acl/main.h

diff --git a/examples/Makefile b/examples/Makefile
index d6b08c2..f3d1726 100644
--- a/examples/Makefile
+++ b/examples/Makefile
@@ -64,5 +64,6 @@ DIRS-y += vhost
 DIRS-$(CONFIG_RTE_LIBRTE_XEN_DOM0) += vhost_xen
 DIRS-y += vmdq
 DIRS-y += vmdq_dcb
+DIRS-$(CONFIG_RTE_LIBRTE_ACL) += l3fwd-acl

 include $(RTE_SDK)/mk/rte.extsubdir.mk
diff --git a/examples/l3fwd-acl/Makefile b/examples/l3fwd-acl/Makefile
new file mode 100644
index 000..7ba7247
--- /dev/null
+++ b/examples/l3fwd-acl/Makefile
@@ -0,0 +1,56 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ifeq ($(RTE_SDK),)
+$(error "Please define RTE_SDK environment variable")
+endif
+
+# Default target, can be overridden by command line or environment
+RTE_TARGET ?= x86_64-default-linuxapp-gcc
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# binary name
+APP = l3fwd-acl
+
+# all sources are stored in SRCS-y
+SRCS-y := main.c
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+# workaround for a gcc bug with noreturn attribute
+# http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
+ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
+CFLAGS_main.o += -Wno-return-type
+endif
+
+include $(RTE_SDK)/mk/rte.extapp.mk
diff --git a/examples/l3fwd-acl/main.c b/examples/l3fwd-acl/main.c
new file mode 100644
index 000..782824a
--- /dev/null
+++ b/examples/l3fwd-acl/main.c
@@ -0,0 +1,2048 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED

[dpdk-dev] [PATCHv2 3/5] acl: New test-acl application.

2014-05-28 Thread Konstantin Ananyev

Introduce test-acl:
Usage example and main test application for the ACL library.
Provides IPv4/IPv6 5-tuple classification.
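The sketch below shows what 5-tuple classification means for a single IPv4 rule. It assumes a hypothetical rule layout with four parts: a protocol with a bitmask, source and destination prefixes given as address plus prefix length, and inclusive source and destination port ranges. It checks one rule against one tuple by direct comparison. This is not how test-acl or librte_acl evaluate rules, and IPv6 (128-bit addresses) is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical IPv4 5-tuple, host byte order. */
struct sketch_tuple {
	uint8_t  proto;
	uint32_t src_ip;
	uint32_t dst_ip;
	uint16_t src_port;
	uint16_t dst_port;
};

/* Hypothetical rule: proto/mask, two prefixes, two inclusive port ranges. */
struct sketch_rule {
	uint8_t  proto;
	uint8_t  proto_mask;
	uint32_t src_ip;
	uint8_t  src_len;    /* prefix length, 0..32 */
	uint32_t dst_ip;
	uint8_t  dst_len;    /* prefix length, 0..32 */
	uint16_t sport_lo, sport_hi;
	uint16_t dport_lo, dport_hi;
};

/* Network mask for a prefix length; a shift by 32 is undefined in C,
 * so /0 is handled explicitly. */
static uint32_t
prefix_mask(uint8_t len)
{
	return len == 0 ? 0 : UINT32_MAX << (32 - len);
}

/* True when every one of the five fields matches the rule. */
static bool
sketch_rule_match(const struct sketch_rule *r, const struct sketch_tuple *t)
{
	return (t->proto & r->proto_mask) == (r->proto & r->proto_mask) &&
	    ((t->src_ip ^ r->src_ip) & prefix_mask(r->src_len)) == 0 &&
	    ((t->dst_ip ^ r->dst_ip) & prefix_mask(r->dst_len)) == 0 &&
	    t->src_port >= r->sport_lo && t->src_port <= r->sport_hi &&
	    t->dst_port >= r->dport_lo && t->dst_port <= r->dport_hi;
}
```

For example, a rule for TCP (proto 6) from 192.168.0.0/16 to any address on destination port 80 matches 192.168.1.5:12345 -> 10.0.0.1:80. It stops matching once the destination port or the source prefix changes.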

Signed-off-by: Konstantin Ananyev 
---
 app/Makefile  |1 +
 app/test-acl/Makefile |   45 +++
 app/test-acl/main.c   | 1029 +
 app/test-acl/main.h   |   50 +++
 4 files changed, 1125 insertions(+), 0 deletions(-)
 create mode 100644 app/test-acl/Makefile
 create mode 100644 app/test-acl/main.c
 create mode 100644 app/test-acl/main.h

diff --git a/app/Makefile b/app/Makefile
index 6267d7b..c398771 100644
--- a/app/Makefile
+++ b/app/Makefile
@@ -35,5 +35,6 @@ DIRS-$(CONFIG_RTE_APP_TEST) += test
 DIRS-$(CONFIG_RTE_TEST_PMD) += test-pmd
 DIRS-$(CONFIG_RTE_LIBRTE_CMDLINE) += cmdline_test
 DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += dump_cfg
+DIRS-$(CONFIG_RTE_LIBRTE_ACL) += test-acl

 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/app/test-acl/Makefile b/app/test-acl/Makefile
new file mode 100644
index 000..00fa3b6
--- /dev/null
+++ b/app/test-acl/Makefile
@@ -0,0 +1,45 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+APP = testacl
+
+CFLAGS += $(WERROR_FLAGS)
+
+# all sources are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_ACL) := main.c
+
+# this application needs libraries first
+DEPDIRS-$(CONFIG_RTE_LIBRTE_ACL) += lib
+
+
+include $(RTE_SDK)/mk/rte.app.mk
diff --git a/app/test-acl/main.c b/app/test-acl/main.c
new file mode 100644
index 000..78d9ae5
--- /dev/null
+++ b/app/test-acl/main.c
@@ -0,0 +1,1029 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+
+#ifndef RTE_LIBRTE_ACL_STANDALONE
+

[dpdk-dev] [PATCHv2 2/5] acl: update UT to reflect the latest changes in librte_acl.

2014-05-28 Thread Konstantin Ananyev

Signed-off-by: Konstantin Ananyev 
---
 app/test/test_acl.c |  128 ++-
 1 files changed, 85 insertions(+), 43 deletions(-)

diff --git a/app/test/test_acl.c b/app/test/test_acl.c
index 790cdf3..c171eac 100644
--- a/app/test/test_acl.c
+++ b/app/test/test_acl.c
@@ -96,47 +96,13 @@ bswap_test_data(struct ipv4_7tuple * data, int len, int to_be)
  * Test scalar and SSE ACL lookup.
  */
 static int
-test_classify(void)
+test_classify_run(struct rte_acl_ctx * acx)
 {
-   struct rte_acl_ctx * acx;
int ret, i;
uint32_t result, count;
-
uint32_t results[RTE_DIM(acl_test_data) * RTE_ACL_MAX_CATEGORIES];
-
const uint8_t * data[RTE_DIM(acl_test_data)];

-   const uint32_t layout[RTE_ACL_IPV4VLAN_NUM] = {
-   offsetof(struct ipv4_7tuple, proto),
-   offsetof(struct ipv4_7tuple, vlan),
-   offsetof(struct ipv4_7tuple, ip_src),
-   offsetof(struct ipv4_7tuple, ip_dst),
-   offsetof(struct ipv4_7tuple, port_src),
-   };
-
-   acx = rte_acl_create(_param);
-   if (acx == NULL) {
-   printf("Line %i: Error creating ACL context!\n", __LINE__);
-   return -1;
-   }
-
-   /* add rules to the context */
-   ret = rte_acl_ipv4vlan_add_rules(acx, acl_test_rules,
-   RTE_DIM(acl_test_rules));
-   if (ret != 0) {
-   printf("Line %i: Adding rules to ACL context failed!\n", __LINE__);
-   rte_acl_free(acx);
-   return -1;
-   }
-
-   /* try building the context */
-   ret = rte_acl_ipv4vlan_build(acx, layout, RTE_ACL_MAX_CATEGORIES);
-   if (ret != 0) {
-   printf("Line %i: Building ACL context failed!\n", __LINE__);
-   rte_acl_free(acx);
-   return -1;
-   }
-
/* swap all bytes in the data to network order */
bswap_test_data(acl_test_data, RTE_DIM(acl_test_data), 1);

@@ -213,21 +179,97 @@ test_classify(void)
}
}

-   /* free ACL context */
-   rte_acl_free(acx);
+   ret = 0;

+err:
/* swap data back to cpu order so that next time tests don't fail */
bswap_test_data(acl_test_data, RTE_DIM(acl_test_data), 0);
+   return (ret);
+}

-   return 0;
-err:
+static int
+test_classify_buid(struct rte_acl_ctx * acx)
+{
+   int ret;
+   const uint32_t layout[RTE_ACL_IPV4VLAN_NUM] = {
+   offsetof(struct ipv4_7tuple, proto),
+   offsetof(struct ipv4_7tuple, vlan),
+   offsetof(struct ipv4_7tuple, ip_src),
+   offsetof(struct ipv4_7tuple, ip_dst),
+   offsetof(struct ipv4_7tuple, port_src),
+   };

-   /* swap data back to cpu order so that next time tests don't fail */
-   bswap_test_data(acl_test_data, RTE_DIM(acl_test_data), 0);
+   /* add rules to the context */
+   ret = rte_acl_ipv4vlan_add_rules(acx, acl_test_rules,
+   RTE_DIM(acl_test_rules));
+   if (ret != 0) {
+   printf("Line %i: Adding rules to ACL context failed!\n",
+   __LINE__);
+   return (ret);
+   }

-   rte_acl_free(acx);
+   /* try building the context */
+   ret = rte_acl_ipv4vlan_build(acx, layout, RTE_ACL_MAX_CATEGORIES);
+   if (ret != 0) {
+   printf("Line %i: Building ACL context failed!\n", __LINE__);
+   return (ret);
+   }

-   return -1;
+   return (0);
+}
+
+#define	TEST_CLASSIFY_ITER	4
+
+/*
+ * Test scalar and SSE ACL lookup.
+ */
+static int
+test_classify(void)
+{
+   struct rte_acl_ctx * acx;
+   int i, ret;
+
+   acx = rte_acl_create(_param);
+   if (acx == NULL) {
+   printf("Line %i: Error creating ACL context!\n", __LINE__);
+   return -1;
+   }
+
+   ret = 0;
+   for (i = 0; i != TEST_CLASSIFY_ITER; i++) {
+
+   if ((i & 1) == 0)
+   rte_acl_reset(acx);
+   else
+   rte_acl_reset_rules(acx);
+
+   ret = test_classify_buid(acx);
+   if (ret != 0) {
+   printf("Line %i, iter: %d: "
+   "Adding rules to ACL context failed!\n",
+   __LINE__, i);
+   break;
+   }
+
+   ret = test_classify_run(acx);
+   if (ret != 0) {
+   printf("Line %i, iter: %d: %s failed!\n",
+   __LINE__, i, __func__);
+   break;
+   }
+
+   /* reset rules and make sure that classify still works ok. */
+   rte_acl_reset_rules(acx);
+   ret = test_classify_run(acx);
+   if (ret != 0) {
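The refactored test has a single exit path. Because of that, the shared test data is always swapped back to host byte order, even when a check fails. The sketch below shows that pattern on its own. It assumes a hypothetical record type and uses the POSIX htonl/htons conversions in place of the test's bswap_test_data helper.

```c
#include <arpa/inet.h> /* htonl, htons, ntohl, ntohs (POSIX) */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record, loosely modelled on the test's tuple data. */
struct sketch_rec {
	uint32_t ip_src, ip_dst;
	uint16_t port_src, port_dst;
};

/* Convert every record in place: to network order when to_be != 0,
 * back to host order otherwise. */
static void
sketch_bswap(struct sketch_rec *d, size_t n, int to_be)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (to_be) {
			d[i].ip_src = htonl(d[i].ip_src);
			d[i].ip_dst = htonl(d[i].ip_dst);
			d[i].port_src = htons(d[i].port_src);
			d[i].port_dst = htons(d[i].port_dst);
		} else {
			d[i].ip_src = ntohl(d[i].ip_src);
			d[i].ip_dst = ntohl(d[i].ip_dst);
			d[i].port_src = ntohs(d[i].port_src);
			d[i].port_dst = ntohs(d[i].port_dst);
		}
	}
}

/* Swap to network order, run a check, then always swap back through the
 * shared exit label, so the next run starts from the same host-order data. */
static int
sketch_run(struct sketch_rec *d, size_t n, uint32_t want_first_src_be)
{
	int ret = -1;

	sketch_bswap(d, n, 1);
	if (n == 0 || d[0].ip_src != want_first_src_be)
		goto err;
	ret = 0;
err:
	sketch_bswap(d, n, 0);
	return ret;
}
```

A run whose check fails still leaves the records in host order. This is why the refactored test_classify_run sets ret to 0 just before the err: label instead of returning early.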
+

[dpdk-dev] [PATCHv2 1/5] acl: Add ACL library (librte_acl) into DPDK.

2014-05-28 Thread Konstantin Ananyev

The ACL library is used to perform an N-tuple search over a set of rules with
multiple categories and find the best match for each category.
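The "best match per category" semantics can be pinned down with a naive reference model. This is a sketch under stated assumptions, not librte_acl's algorithm: the library compiles its rules into a lookup structure at build time rather than scanning them. In the sketch, each hypothetical rule has a priority, a category bitmask, a non-zero userdata value and one inclusive range per field. For every category, the result is the userdata of the highest-priority matching rule, or 0 if no rule matches. Ties go to the earlier rule, which is also an assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_CATEGORIES 4
#define SKETCH_NUM_FIELDS     3

/* Hypothetical rule: one inclusive [lo, hi] range per field. */
struct sketch_acl_rule {
	int32_t  priority;      /* higher value wins */
	uint32_t category_mask; /* bit c set: rule applies to category c */
	uint32_t userdata;      /* reported on match; must be non-zero */
	uint32_t lo[SKETCH_NUM_FIELDS];
	uint32_t hi[SKETCH_NUM_FIELDS];
};

static int
sketch_rule_hits(const struct sketch_acl_rule *r, const uint32_t *key)
{
	size_t f;

	for (f = 0; f < SKETCH_NUM_FIELDS; f++)
		if (key[f] < r->lo[f] || key[f] > r->hi[f])
			return 0;
	return 1;
}

/* Linear scan: for every category keep the userdata of the
 * highest-priority matching rule; 0 means "no match". */
static void
sketch_classify(const struct sketch_acl_rule *rules, size_t nb_rules,
	const uint32_t *key, uint32_t results[SKETCH_MAX_CATEGORIES])
{
	int32_t best[SKETCH_MAX_CATEGORIES];
	size_t i;
	unsigned c;

	for (c = 0; c < SKETCH_MAX_CATEGORIES; c++) {
		results[c] = 0;
		best[c] = INT32_MIN;
	}
	for (i = 0; i < nb_rules; i++) {
		if (!sketch_rule_hits(&rules[i], key))
			continue;
		for (c = 0; c < SKETCH_MAX_CATEGORIES; c++)
			if ((rules[i].category_mask & (1u << c)) &&
			    (results[c] == 0 || rules[i].priority > best[c])) {
				results[c] = rules[i].userdata;
				best[c] = rules[i].priority;
			}
	}
}
```

A scan like this costs O(rules) per lookup, which a precompiled classifier is meant to avoid. It is still useful as an oracle when testing a faster implementation.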

Signed-off-by: Konstantin Ananyev 
---
 config/common_linuxapp   |6 +
 lib/librte_acl/Makefile  |   60 +
 lib/librte_acl/acl.h |  182 +++
 lib/librte_acl/acl_bld.c | 2001 ++
 lib/librte_acl/acl_gen.c |  473 
 lib/librte_acl/acl_run.c |  927 
 lib/librte_acl/acl_vect.h|  129 +++
 lib/librte_acl/rte_acl.c |  413 +++
 lib/librte_acl/rte_acl.h |  453 
 lib/librte_acl/rte_acl_osdep.h   |   92 ++
 lib/librte_acl/rte_acl_osdep_alone.h |  277 +
 lib/librte_acl/tb_mem.c  |  102 ++
 lib/librte_acl/tb_mem.h  |   73 ++
 13 files changed, 5188 insertions(+), 0 deletions(-)
 create mode 100644 lib/librte_acl/Makefile
 create mode 100644 lib/librte_acl/acl.h
 create mode 100644 lib/librte_acl/acl_bld.c
 create mode 100644 lib/librte_acl/acl_gen.c
 create mode 100644 lib/librte_acl/acl_run.c
 create mode 100644 lib/librte_acl/acl_vect.h
 create mode 100644 lib/librte_acl/rte_acl.c
 create mode 100644 lib/librte_acl/rte_acl.h
 create mode 100644 lib/librte_acl/rte_acl_osdep.h
 create mode 100644 lib/librte_acl/rte_acl_osdep_alone.h
 create mode 100644 lib/librte_acl/tb_mem.c
 create mode 100644 lib/librte_acl/tb_mem.h

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 62619c6..fcfed6f 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -337,3 +337,9 @@ CONFIG_RTE_TEST_PMD_RECORD_BURST_STATS=n
 #
 CONFIG_RTE_NIC_BYPASS=n

+# Compile librte_acl
+#
+CONFIG_RTE_LIBRTE_ACL=y
+CONFIG_RTE_LIBRTE_ACL_DEBUG=n
+CONFIG_RTE_LIBRTE_ACL_STANDALONE=n
+
diff --git a/lib/librte_acl/Makefile b/lib/librte_acl/Makefile
new file mode 100644
index 000..4fe4593
--- /dev/null
+++ b/lib/librte_acl/Makefile
@@ -0,0 +1,60 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_acl.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+# all sources are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_ACL) += tb_mem.c
+
+SRCS-$(CONFIG_RTE_LIBRTE_ACL) += rte_acl.c
+SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_bld.c
+SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_gen.c
+SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run.c
+
+# install these header files
+SYMLINK-$(CONFIG_RTE_LIBRTE_ACL)-include := rte_acl_osdep.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_ACL)-include += rte_acl.h
+
+ifeq ($(CONFIG_RTE_LIBRTE_ACL_STANDALONE),y)
+# standalone build
+SYMLINK-$(CONFIG_RTE_LIBRTE_ACL)-include += rte_acl_osdep_alone.h
+else
+# this lib needs eal
+DEPDIRS-$(CONFIG_RTE_LIBRTE_ACL) += lib/librte_eal lib/librte_malloc
+endif
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_acl/acl.h b/lib/librte_acl/acl.h
new file mode 100644
index 000..e6d7985
--- /dev/null
+++ b/lib/librte_acl/acl.h
@@ -0,0 +1,182 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must

[dpdk-dev] [PATCHv2 0/5] ACL library

2014-05-28 Thread Konstantin Ananyev

The ACL library is used to perform an N-tuple search over a set of rules
with multiple categories and find the best match (highest priority)
for each category.
This code was previously released under a proprietary license,
but is now being released under a BSD license to allow its
integration with the rest of the Intel DPDK codebase.

Note that this patch series requires another patch,
"lpm: Introduce rte_lpm_lookupx4", to be applied first.

This patch series contains the following items:
1) librte_acl.
2) UT changes to reflect the latest changes in the rte_acl library.
3) test-acl: usage example and main test application for the ACL library.
   Provides IPv4/IPv6 5-tuple classification.
4) l3fwd-acl: demonstrates the use of the ACL library in a DPDK application
   to implement packet classification and L3 forwarding.
5) Doxygen configuration and start page.

v2 fixes:
* Fixed several checkpatch.pl issues
* Added doxygen related changes

 app/Makefile |1 +
 app/test-acl/Makefile|   45 +
 app/test-acl/main.c  | 1029 +
 app/test-acl/main.h  |   50 +
 app/test/test_acl.c  |  128 ++-
 config/common_linuxapp   |6 +
 doc/doxy-api-index.md|3 +-
 doc/doxy-api.conf|3 +-
 examples/Makefile|1 +
 examples/l3fwd-acl/Makefile  |   56 +
 examples/l3fwd-acl/main.c| 2048 ++
 examples/l3fwd-acl/main.h|   45 +
 lib/librte_acl/Makefile  |   60 +
 lib/librte_acl/acl.h |  182 +++
 lib/librte_acl/acl_bld.c | 2001 +
 lib/librte_acl/acl_gen.c |  473 
 lib/librte_acl/acl_run.c |  927 +++
 lib/librte_acl/acl_vect.h|  129 +++
 lib/librte_acl/rte_acl.c |  413 +++
 lib/librte_acl/rte_acl.h |  453 
 lib/librte_acl/rte_acl_osdep.h   |   92 ++
 lib/librte_acl/rte_acl_osdep_alone.h |  277 +
 lib/librte_acl/tb_mem.c  |  102 ++
 lib/librte_acl/tb_mem.h  |   73 ++
 24 files changed, 8552 insertions(+), 45 deletions(-)
 create mode 100644 app/test-acl/Makefile
 create mode 100644 app/test-acl/main.c
 create mode 100644 app/test-acl/main.h
 create mode 100644 examples/l3fwd-acl/Makefile
 create mode 100644 examples/l3fwd-acl/main.c
 create mode 100644 examples/l3fwd-acl/main.h
 create mode 100644 lib/librte_acl/Makefile
 create mode 100644 lib/librte_acl/acl.h
 create mode 100644 lib/librte_acl/acl_bld.c
 create mode 100644 lib/librte_acl/acl_gen.c
 create mode 100644 lib/librte_acl/acl_run.c
 create mode 100644 lib/librte_acl/acl_vect.h
 create mode 100644 lib/librte_acl/rte_acl.c
 create mode 100644 lib/librte_acl/rte_acl.h
 create mode 100644 lib/librte_acl/rte_acl_osdep.h
 create mode 100644 lib/librte_acl/rte_acl_osdep_alone.h
 create mode 100644 lib/librte_acl/tb_mem.c
 create mode 100644 lib/librte_acl/tb_mem.h

[dpdk-dev] [PATCH 13/13] examples: overhaul of ip_reassembly app

2014-05-28 Thread Anatoly Burakov

New stuff:
* Support for regular traffic as well as IPv4 and IPv6
* Simplified config
* Routing table printed out on start
* Uses LPM/LPM6 for lookup
* Unmatched traffic is sent to the originating port

Signed-off-by: Anatoly Burakov 
---
 examples/ip_reassembly/Makefile |1 -
 examples/ip_reassembly/main.c   | 1344 +--
 2 files changed, 435 insertions(+), 910 deletions(-)

diff --git a/examples/ip_reassembly/Makefile b/examples/ip_reassembly/Makefile
index 3115b95..9c9e0fa 100644
--- a/examples/ip_reassembly/Makefile
+++ b/examples/ip_reassembly/Makefile
@@ -52,7 +52,6 @@ CFLAGS += $(WERROR_FLAGS)
 # http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
 ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
 CFLAGS_main.o += -Wno-return-type
-CFLAGS_main.o += -DIPV4_FRAG_TBL_STAT
 endif

 include $(RTE_SDK)/mk/rte.extapp.mk
diff --git a/examples/ip_reassembly/main.c b/examples/ip_reassembly/main.c
index 6c40d76..da3a0db 100644
--- a/examples/ip_reassembly/main.c
+++ b/examples/ip_reassembly/main.c
@@ -1,13 +1,13 @@
 /*-
  *   BSD LICENSE
- * 
+ *
  *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
  *   All rights reserved.
- * 
+ *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
  *   are met:
- * 
+ *
  * * Redistributions of source code must retain the above copyright
  *   notice, this list of conditions and the following disclaimer.
  * * Redistributions in binary form must reproduce the above copyright
@@ -17,7 +17,7 @@
  * * Neither the name of Intel Corporation nor the names of its
  *   contributors may be used to endorse or promote products derived
  *   from this software without specific prior written permission.
- * 
+ *
  *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -42,6 +42,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -73,54 +74,29 @@
 #include 
 #include 
 #include 
-#include "main.h"
-
-#define APP_LOOKUP_EXACT_MATCH  0
-#define APP_LOOKUP_LPM  1
-#define DO_RFC_1812_CHECKS
-
-#ifndef APP_LOOKUP_METHOD
-#define APP_LOOKUP_METHOD APP_LOOKUP_LPM
-#endif
-
-#if (APP_LOOKUP_METHOD == APP_LOOKUP_EXACT_MATCH)
-#include 
-#elif (APP_LOOKUP_METHOD == APP_LOOKUP_LPM)
 #include 
 #include 
-#else
-#error "APP_LOOKUP_METHOD set to incorrect value"
-#endif

-#define MAX_PKT_BURST 32
-
-#include "rte_ip_frag.h"
+#include 

-#ifndef IPv6_BYTES
-#define IPv6_BYTES_FMT "%02x%02x:%02x%02x:%02x%02x:%02x%02x:"\
-   "%02x%02x:%02x%02x:%02x%02x:%02x%02x"
-#define IPv6_BYTES(addr) \
-   addr[0],  addr[1], addr[2],  addr[3], \
-   addr[4],  addr[5], addr[6],  addr[7], \
-   addr[8],  addr[9], addr[10], addr[11],\
-   addr[12], addr[13],addr[14], addr[15]
-#endif
+#include "main.h"

+#define MAX_PKT_BURST 32

-#define RTE_LOGTYPE_L3FWD RTE_LOGTYPE_USER1

-#define MAX_PORTS  RTE_MAX_ETHPORTS
+#define RTE_LOGTYPE_IP_RSMBL RTE_LOGTYPE_USER1

 #define MAX_JUMBO_PKT_LEN  9600

-#define IPV6_ADDR_LEN 16
-
-#define MEMPOOL_CACHE_SIZE 256
-
 #define	BUF_SIZE	2048
 #define MBUF_SIZE  \
(BUF_SIZE + sizeof(struct rte_mbuf) + RTE_PKTMBUF_HEADROOM)

+#define NB_MBUF 8192
+
+/* allow max jumbo frame 9.5 KB */
+#define JUMBO_FRAME_MAX_SIZE   0x2600
+
 #define	MAX_FLOW_NUM	UINT16_MAX
 #define	MIN_FLOW_NUM	1
 #define	DEF_FLOW_NUM	0x1000
@@ -130,10 +106,10 @@
 #define	MIN_FLOW_TTL	1
 #define	DEF_FLOW_TTL	MS_PER_S

-#define	DEF_MBUF_NUM	0x400
+#define MAX_FRAG_NUM RTE_LIBRTE_IP_FRAG_MAX_FRAG

 /* Should be power of two. */
-#define	IPV4_FRAG_TBL_BUCKET_ENTRIES	2
+#define	IP_FRAG_TBL_BUCKET_ENTRIES	16

 static uint32_t max_flow_num = DEF_FLOW_NUM;
 static uint32_t max_flow_ttl = DEF_FLOW_TTL;
@@ -174,12 +150,33 @@ static uint16_t nb_rxd = RTE_TEST_RX_DESC_DEFAULT;
 static uint16_t nb_txd = RTE_TEST_TX_DESC_DEFAULT;

 /* ethernet addresses of ports */
-static struct ether_addr ports_eth_addr[MAX_PORTS];
+static struct ether_addr ports_eth_addr[RTE_MAX_ETHPORTS];
+
+#ifndef IPv4_BYTES
+#define IPv4_BYTES_FMT "%" PRIu8 ".%" PRIu8 ".%" PRIu8 ".%" PRIu8
+#define IPv4_BYTES(addr) \
+   (uint8_t) (((addr) >> 24) & 0xFF),\
+   (uint8_t) (((addr) >> 16) & 0xFF),\
+   (uint8_t) (((addr) >> 8) & 0xFF),\
+   (uint8_t) ((addr) & 0xFF)
+#endif
+
+#ifndef IPv6_BYTES
+#define IPv6_BYTES_FMT "%02x%02x:%02x%02x:%02x%02x:%02x%02x:"\
+   "%02x%02x:%02x%02x:%02x%02x:%02x%02x"
+#define IPv6_BYTES(addr) \
+   addr[0],  addr[1], addr[2],  addr[3], \
+   addr[4],  addr[5], addr[6],  addr[7], \
+   addr[8],  addr[9], addr[10], addr[11],\

[dpdk-dev] [PATCH 12/13] ip_frag: add support for IPv6 reassembly

2014-05-28 Thread Anatoly Burakov

Mostly a copy-paste of the IPv4 code, with a few caveats.

The only supported packets are those in which the fragment extension header
immediately follows the IPv6 header.
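The sketch below expresses the restriction stated above. It assumes raw packet bytes that start at the IPv6 header, and relies on three facts from RFC 8200: the fixed header is 40 bytes, its Next Header field is at offset 6, and the Fragment extension header has number 44. A packet is accepted only when Next Header is 44, so any other extension header placed first causes it to be rejected. The helper names are hypothetical and not part of librte_ip_frag.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_IPV6_HDR_LEN      40 /* fixed IPv6 header size */
#define SKETCH_IPV6_NEXT_HDR_OFS 6  /* offset of the Next Header field */
#define SKETCH_IPPROTO_FRAGMENT  44 /* Fragment extension header number */
#define SKETCH_FRAG_HDR_LEN      8  /* Fragment header size */

/* Return the Fragment header if it immediately follows the fixed IPv6
 * header, or NULL otherwise (unfragmented, truncated, or another
 * extension header comes first: the unsupported case). */
static const uint8_t *
sketch_ipv6_frag_hdr(const uint8_t *pkt, size_t len)
{
	if (len < SKETCH_IPV6_HDR_LEN + SKETCH_FRAG_HDR_LEN)
		return NULL;
	if (pkt[SKETCH_IPV6_NEXT_HDR_OFS] != SKETCH_IPPROTO_FRAGMENT)
		return NULL;
	return pkt + SKETCH_IPV6_HDR_LEN;
}

/* Bytes 2-3 of the Fragment header hold a 13-bit offset in 8-octet
 * units, 2 reserved bits, then the M (more fragments) flag. */
static uint16_t
sketch_frag_offset_bytes(const uint8_t *fh)
{
	uint16_t w = (uint16_t)((fh[2] << 8) | fh[3]);

	return (uint16_t)(w & 0xFFF8); /* equals (w >> 3) * 8 */
}

static int
sketch_more_frags(const uint8_t *fh)
{
	return fh[3] & 1;
}
```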

Signed-off-by: Anatoly Burakov 
---
 lib/librte_ip_frag/Makefile  |   1 +
 lib/librte_ip_frag/ip_frag_common.h  |  25 +++-
 lib/librte_ip_frag/ip_frag_internal.c| 172 +---
 lib/librte_ip_frag/rte_ip_frag.h |  51 +++-
 lib/librte_ip_frag/rte_ipv4_reassembly.c |   4 +-
 lib/librte_ip_frag/rte_ipv6_reassembly.c | 218 +++
 6 files changed, 421 insertions(+), 50 deletions(-)
 create mode 100644 lib/librte_ip_frag/rte_ipv6_reassembly.c

diff --git a/lib/librte_ip_frag/Makefile b/lib/librte_ip_frag/Makefile
index 13a4f9f..29aa36f 100644
--- a/lib/librte_ip_frag/Makefile
+++ b/lib/librte_ip_frag/Makefile
@@ -41,6 +41,7 @@ CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
 SRCS-$(CONFIG_RTE_LIBRTE_IP_FRAG) += rte_ipv4_fragmentation.c
 SRCS-$(CONFIG_RTE_LIBRTE_IP_FRAG) += rte_ipv4_reassembly.c
 SRCS-$(CONFIG_RTE_LIBRTE_IP_FRAG) += rte_ipv6_fragmentation.c
+SRCS-$(CONFIG_RTE_LIBRTE_IP_FRAG) += rte_ipv6_reassembly.c
 SRCS-$(CONFIG_RTE_LIBRTE_IP_FRAG) += rte_ip_frag_common.c
 SRCS-$(CONFIG_RTE_LIBRTE_IP_FRAG) += ip_frag_internal.c

diff --git a/lib/librte_ip_frag/ip_frag_common.h b/lib/librte_ip_frag/ip_frag_common.h
index 3e588a0..ac5cd61 100644
--- a/lib/librte_ip_frag/ip_frag_common.h
+++ b/lib/librte_ip_frag/ip_frag_common.h
@@ -51,9 +51,17 @@ if (!(exp))  {   \
 #define RTE_IP_FRAG_ASSERT(exp)	do { } while(0)
 #endif /* IP_FRAG_DEBUG */

+#define IPV4_KEYLEN 1
+#define IPV6_KEYLEN 4
+
 /* helper macros */
 #define	IP_FRAG_MBUF2DR(dr, mb)	((dr)->row[(dr)->cnt++] = (mb))

+#define IPv6_KEY_BYTES(key) \
+   (key)[0], (key)[1], (key)[2], (key)[3]
+#define IPv6_KEY_BYTES_FMT \
+   "%08" PRIx64 "%08" PRIx64 "%08" PRIx64 "%08" PRIx64
+
 /* internal functions declarations */
 struct rte_mbuf * ip_frag_process(struct rte_ip_frag_pkt *fp,
struct rte_ip_frag_death_row *dr, struct rte_mbuf *mb,
@@ -69,6 +77,7 @@ struct rte_ip_frag_pkt * ip_frag_lookup(struct rte_ip_frag_tbl *tbl,

 /* these functions need to be declared here as ip_frag_process relies on them */
 struct rte_mbuf * ipv4_frag_reassemble(const struct rte_ip_frag_pkt *fp);
+struct rte_mbuf * ipv6_frag_reassemble(const struct rte_ip_frag_pkt *fp);



@@ -80,8 +89,10 @@ struct rte_mbuf * ipv4_frag_reassemble(const struct rte_ip_frag_pkt *fp);
 static inline int
 ip_frag_key_is_empty(const struct ip_frag_key * key)
 {
-   if (key->src_dst != 0)
-   return 0;
+   uint32_t i;
+   for (i = 0; i < key->key_len; i++)
+   if (key->src_dst[i] != 0)
+   return 0;
return 1;
 }

@@ -89,14 +100,20 @@ ip_frag_key_is_empty(const struct ip_frag_key * key)
 static inline void
 ip_frag_key_invalidate(struct ip_frag_key * key)
 {
-   key->src_dst = 0;
+   uint32_t i;
+   for (i = 0; i < key->key_len; i++)
+   key->src_dst[i] = 0;
 }

 /* compare two keys */
 static inline int
 ip_frag_key_cmp(const struct ip_frag_key * k1, const struct ip_frag_key * k2)
 {
-   return k1->src_dst ^ k2->src_dst;
+   uint32_t i, val;
+   val = k1->id ^ k2->id;
+   for (i = 0; i < k1->key_len; i++)
+   val |= k1->src_dst[i] ^ k2->src_dst[i];
+   return val;
 }

 /*
diff --git a/lib/librte_ip_frag/ip_frag_internal.c b/lib/librte_ip_frag/ip_frag_internal.c
index 2f5a4b8..5d35037 100644
--- a/lib/librte_ip_frag/ip_frag_internal.c
+++ b/lib/librte_ip_frag/ip_frag_internal.c
@@ -110,6 +110,35 @@ ipv4_frag_hash(const struct ip_frag_key *key, uint32_t *v1, uint32_t *v2)
*v2 = (v << 7) + (v >> 14);
 }

+static inline void
+ipv6_frag_hash(const struct ip_frag_key *key, uint32_t *v1, uint32_t *v2)
+{
+   uint32_t v;
+   const uint32_t *p;
+
+   p = (const uint32_t *) >src_dst;
+
+#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
+   v = rte_hash_crc_4byte(p[0], PRIME_VALUE);
+   v = rte_hash_crc_4byte(p[1], v);
+   v = rte_hash_crc_4byte(p[2], v);
+   v = rte_hash_crc_4byte(p[3], v);
+   v = rte_hash_crc_4byte(p[4], v);
+   v = rte_hash_crc_4byte(p[5], v);
+   v = rte_hash_crc_4byte(p[6], v);
+   v = rte_hash_crc_4byte(p[7], v);
+   v = rte_hash_crc_4byte(key->id, v);
+#else
+
+   v = rte_jhash_3words(p[0], p[1], p[2], PRIME_VALUE);
+   v = rte_jhash_3words(p[3], p[4], p[5], v);
+   v = rte_jhash_3words(p[6], p[7], key->id, v);
+#endif /* RTE_MACHINE_CPUFLAG_SSE4_2 */
+
+   *v1 =  v;
+   *v2 = (v << 7) + (v >> 14);
+}
+
 struct rte_mbuf *
 ip_frag_process(struct rte_ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr,
struct rte_mbuf *mb, uint16_t ofs, uint16_t len, uint16_t more_frags)
@@ -142,18 +171,32 @@ ip_frag_process(struct rte_ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr,
if (idx >= sizeof
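
`ipv6_frag_hash()` folds the eight 32-bit address words and the fragment `id` into one 32-bit value `v`. It then derives the second hash as `(v << 7) + (v >> 14)`, and the table uses the pair to pick two candidate buckets. The sketch below shows that derivation under two assumptions. First, it assumes a power-of-two bucket count selected with a mask. Second, it swaps in FNV-1a as a portable stand-in for `rte_hash_crc_4byte()`/`rte_jhash_3words()`, so its hash values differ from the library's. Both helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define FNV_OFFSET 2166136261u
#define FNV_PRIME  16777619u

/* Hypothetical stand-in hash: FNV-1a over the bytes of n words followed
 * by id. The library uses CRC32 (SSE4.2) or Jenkins hash instead. */
static uint32_t
words_hash(const uint32_t *w, size_t n, uint32_t id)
{
	uint32_t h = FNV_OFFSET;
	size_t i;
	int b;

	for (i = 0; i <= n; i++) {
		uint32_t word = (i < n) ? w[i] : id;

		for (b = 0; b < 4; b++) {
			h ^= (word >> (8 * b)) & 0xff;
			h *= FNV_PRIME;
		}
	}
	return h;
}

/* Hypothetical helper: derive two bucket indices from one hash, using
 * v1 = v and v2 = (v << 7) + (v >> 14) as ipv6_frag_hash() does.
 * nb_buckets must be a power of two for the mask to be valid. */
static void
two_buckets(uint32_t v, uint32_t nb_buckets, uint32_t *b1, uint32_t *b2)
{
	uint32_t mask = nb_buckets - 1;
	uint32_t v2 = (v << 7) + (v >> 14);

	*b1 = v & mask;
	*b2 = v2 & mask;
}
```

Because both indices come from the same `v`, a given key always maps to the same pair of buckets. That is what limits a lookup to searching at most two buckets.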

[dpdk-dev] [PATCH 11/13] example: overhaul of ip_fragmentation example app

2014-05-28 Thread Anatoly Burakov

New features:
* Support for regular traffic as well as IPv4 and IPv6
* Simplified config
* Routing table printed out on start
* Uses LPM/LPM6 for lookup
* Unmatched traffic is sent to the originating port

Signed-off-by: Anatoly Burakov 
---
 examples/ip_fragmentation/main.c | 547 ---
 1 file changed, 403 insertions(+), 144 deletions(-)

diff --git a/examples/ip_fragmentation/main.c b/examples/ip_fragmentation/main.c
index 7aff99b..2ce564c 100644
--- a/examples/ip_fragmentation/main.c
+++ b/examples/ip_fragmentation/main.c
@@ -69,23 +69,15 @@
 #include 
 #include 
 #include 
+#include 
 #include 
+#include 

-#include "rte_ip_frag.h"
-#include "main.h"
-
-/*
- * Default byte size for the IPv4 Maximum Transfer Unit (MTU).
- * This value includes the size of IPv4 header.
- */
-#define	IPV4_MTU_DEFAULT	ETHER_MTU
+#include 

-/*
- * Default payload in bytes for the IPv4 packet.
- */
-#define	IPV4_DEFAULT_PAYLOAD	(IPV4_MTU_DEFAULT - sizeof(struct ipv4_hdr))
+#include "main.h"

-#define RTE_LOGTYPE_L3FWD RTE_LOGTYPE_USER1
+#define RTE_LOGTYPE_IP_FRAG RTE_LOGTYPE_USER1

 #define MBUF_SIZE (2048 + sizeof(struct rte_mbuf) + RTE_PKTMBUF_HEADROOM)

@@ -95,9 +87,22 @@
 #define	ROUNDUP_DIV(a, b)	(((a) + (b) - 1) / (b))

 /*
- * Max number of fragments per packet expected.
+ * Default byte size for the IPv6 Maximum Transfer Unit (MTU).
+ * This value includes the size of IPv6 header.
+ */
+#define	IPV4_MTU_DEFAULT	ETHER_MTU
+#define	IPV6_MTU_DEFAULT	ETHER_MTU
+
+/*
+ * Default payload in bytes for the IPv6 packet.
+ */
+#define	IPV4_DEFAULT_PAYLOAD	(IPV4_MTU_DEFAULT - sizeof(struct ipv4_hdr))
+#define	IPV6_DEFAULT_PAYLOAD	(IPV6_MTU_DEFAULT - sizeof(struct ipv6_hdr))
+
+/*
+ * Max number of fragments per packet expected - defined by config file.
  */
-#define	MAX_PACKET_FRAG	ROUNDUP_DIV(JUMBO_FRAME_MAX_SIZE, IPV4_DEFAULT_PAYLOAD)
+#define	MAX_PACKET_FRAG	RTE_LIBRTE_IP_FRAG_MAX_FRAG

 #define NB_MBUF   8192

@@ -136,8 +141,27 @@ static uint16_t nb_txd = RTE_TEST_TX_DESC_DEFAULT;

 /* ethernet addresses of ports */
 static struct ether_addr ports_eth_addr[RTE_MAX_ETHPORTS];
-static struct ether_addr remote_eth_addr =
-   {{0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff}};
+
+#ifndef IPv4_BYTES
+#define IPv4_BYTES_FMT "%" PRIu8 ".%" PRIu8 ".%" PRIu8 ".%" PRIu8
+#define IPv4_BYTES(addr) \
+   (uint8_t) (((addr) >> 24) & 0xFF),\
+   (uint8_t) (((addr) >> 16) & 0xFF),\
+   (uint8_t) (((addr) >> 8) & 0xFF),\
+   (uint8_t) ((addr) & 0xFF)
+#endif
+
+#ifndef IPv6_BYTES
+#define IPv6_BYTES_FMT "%02x%02x:%02x%02x:%02x%02x:%02x%02x:"\
+   "%02x%02x:%02x%02x:%02x%02x:%02x%02x"
+#define IPv6_BYTES(addr) \
+   addr[0],  addr[1], addr[2],  addr[3], \
+   addr[4],  addr[5], addr[6],  addr[7], \
+   addr[8],  addr[9], addr[10], addr[11],\
+   addr[12], addr[13],addr[14], addr[15]
+#endif
+
+#define IPV6_ADDR_LEN 16

 /* mask of enabled ports */
 static int enabled_port_mask = 0;
@@ -151,14 +175,21 @@ struct mbuf_table {
struct rte_mbuf *m_table[MBUF_TABLE_SIZE];
 };

+struct rx_queue {
+   struct rte_mempool * direct_pool;
+   struct rte_mempool * indirect_pool;
+   struct rte_lpm * lpm;
+   struct rte_lpm6 * lpm6;
+   uint8_t portid;
+};
+
 #define MAX_RX_QUEUE_PER_LCORE 16
 #define MAX_TX_QUEUE_PER_PORT 16
 struct lcore_queue_conf {
uint16_t n_rx_queue;
-   uint8_t rx_queue_list[MAX_RX_QUEUE_PER_LCORE];
uint16_t tx_queue_id[RTE_MAX_ETHPORTS];
+   struct rx_queue rx_queue_list[MAX_RX_QUEUE_PER_LCORE];
struct mbuf_table tx_mbufs[RTE_MAX_ETHPORTS];
-
 } __rte_cache_aligned;
 struct lcore_queue_conf lcore_queue_conf[RTE_MAX_LCORE];

@@ -167,7 +198,7 @@ static const struct rte_eth_conf port_conf = {
.max_rx_pkt_len = JUMBO_FRAME_MAX_SIZE,
.split_hdr_size = 0,
.header_split   = 0, /**< Header Split disabled */
-   .hw_ip_checksum = 0, /**< IP checksum offload disabled */
+   .hw_ip_checksum = 1, /**< IP checksum offload enabled */
.hw_vlan_filter = 0, /**< VLAN filtering disabled */
.jumbo_frame    = 1, /**< Jumbo Frame Support enabled */
.hw_strip_crc   = 0, /**< CRC stripped by hardware */
@@ -195,27 +226,61 @@ static const struct rte_eth_txconf tx_conf = {
.tx_rs_thresh = 0, /* Use PMD default values */
 };

-struct rte_mempool *pool_direct = NULL, *pool_indirect = NULL;
-
-struct l3fwd_route {
+/*
+ * IPv4 forwarding table
+ */
+struct l3fwd_ipv4_route {
uint32_t ip;
uint8_t  depth;
uint8_t  if_out;
 };

-struct l3fwd_route l3fwd_route_array[] = {
-   {IPv4(100,10,0,0), 16, 2},
-   {IPv4(100,20,0,0), 16, 2},
-   {IPv4(100,30,0,0), 16, 0},
-   {IPv4(100,40,0,0), 16, 0},
+struct l3fwd_ipv4_route
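
The new `IPv4_BYTES`/`IPv6_BYTES` helpers pair a printf format string with a macro that expands an address into the matching argument list. This is how the app can print its routing table at start-up. The sketch below assumes a host-order IPv4 address (as built by `IPv4()`) and a 16-byte IPv6 array. The `format_*_route()` helpers and their output layout are hypothetical, not the example app's code.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format helpers copied from the hunk above. */
#define IPv4_BYTES_FMT "%" PRIu8 ".%" PRIu8 ".%" PRIu8 ".%" PRIu8
#define IPv4_BYTES(addr) \
	(uint8_t) (((addr) >> 24) & 0xFF), \
	(uint8_t) (((addr) >> 16) & 0xFF), \
	(uint8_t) (((addr) >> 8) & 0xFF),  \
	(uint8_t) ((addr) & 0xFF)

#define IPv6_BYTES_FMT "%02x%02x:%02x%02x:%02x%02x:%02x%02x:" \
	"%02x%02x:%02x%02x:%02x%02x:%02x%02x"
#define IPv6_BYTES(addr) \
	addr[0],  addr[1],  addr[2],  addr[3],  \
	addr[4],  addr[5],  addr[6],  addr[7],  \
	addr[8],  addr[9],  addr[10], addr[11], \
	addr[12], addr[13], addr[14], addr[15]

/* Hypothetical helper: writes "Route a.b.c.d/depth -> port N". */
static int
format_ipv4_route(char *buf, size_t len, uint32_t ip,
		uint8_t depth, uint8_t port)
{
	return snprintf(buf, len, "Route " IPv4_BYTES_FMT "/%u -> port %u",
			IPv4_BYTES(ip), (unsigned)depth, (unsigned)port);
}

/* Hypothetical helper: same layout for a 16-byte IPv6 address. */
static int
format_ipv6_route(char *buf, size_t len, const uint8_t addr[16],
		uint8_t depth, uint8_t port)
{
	return snprintf(buf, len, "Route " IPv6_BYTES_FMT "/%u -> port %u",
			IPv6_BYTES(addr), (unsigned)depth, (unsigned)port);
}
```

For example, the old table entry `{IPv4(100,10,0,0), 16, 2}` formats as `Route 100.10.0.0/16 -> port 2`.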

[dpdk-dev] [PATCH 10/13] examples: renamed ipv4_frag example app to ip_fragmentation

2014-05-28 Thread Anatoly Burakov


Signed-off-by: Anatoly Burakov 
---
 examples/{ipv4_frag => ip_fragmentation}/Makefile | 2 +-
 examples/{ipv4_frag => ip_fragmentation}/main.c   | 0
 examples/{ipv4_frag => ip_fragmentation}/main.h   | 0
 3 files changed, 1 insertion(+), 1 deletion(-)
 rename examples/{ipv4_frag => ip_fragmentation}/Makefile (99%)
 rename examples/{ipv4_frag => ip_fragmentation}/main.c (100%)
 rename examples/{ipv4_frag => ip_fragmentation}/main.h (100%)

diff --git a/examples/ipv4_frag/Makefile b/examples/ip_fragmentation/Makefile
similarity index 99%
rename from examples/ipv4_frag/Makefile
rename to examples/ip_fragmentation/Makefile
index 5fc4d9e..1482772 100644
--- a/examples/ipv4_frag/Makefile
+++ b/examples/ip_fragmentation/Makefile
@@ -44,7 +44,7 @@ $(error This application requires RTE_MBUF_SCATTER_GATHER to be enabled)
 endif

 # binary name
-APP = ipv4_frag
+APP = ip_fragmentation

 # all source are stored in SRCS-y
 SRCS-y := main.c
diff --git a/examples/ipv4_frag/main.c b/examples/ip_fragmentation/main.c
similarity index 100%
rename from examples/ipv4_frag/main.c
rename to examples/ip_fragmentation/main.c
diff --git a/examples/ipv4_frag/main.h b/examples/ip_fragmentation/main.h
similarity index 100%
rename from examples/ipv4_frag/main.h
rename to examples/ip_fragmentation/main.h
-- 
1.8.1.4

[dpdk-dev] [PATCH 09/13] ip_frag: added IPv6 fragmentation support

2014-05-28 Thread Anatoly Burakov

Mostly a copy-paste of the IPv4 code.

Signed-off-by: Anatoly Burakov 
---
 lib/librte_ip_frag/Makefile |   1 +
 lib/librte_ip_frag/rte_ip_frag.h|  27 
 lib/librte_ip_frag/rte_ipv6_fragmentation.c | 219 
 3 files changed, 247 insertions(+)
 create mode 100644 lib/librte_ip_frag/rte_ipv6_fragmentation.c

diff --git a/lib/librte_ip_frag/Makefile b/lib/librte_ip_frag/Makefile
index 022092d..13a4f9f 100644
--- a/lib/librte_ip_frag/Makefile
+++ b/lib/librte_ip_frag/Makefile
@@ -40,6 +40,7 @@ CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
 #source files
 SRCS-$(CONFIG_RTE_LIBRTE_IP_FRAG) += rte_ipv4_fragmentation.c
 SRCS-$(CONFIG_RTE_LIBRTE_IP_FRAG) += rte_ipv4_reassembly.c
+SRCS-$(CONFIG_RTE_LIBRTE_IP_FRAG) += rte_ipv6_fragmentation.c
 SRCS-$(CONFIG_RTE_LIBRTE_IP_FRAG) += rte_ip_frag_common.c
 SRCS-$(CONFIG_RTE_LIBRTE_IP_FRAG) += ip_frag_internal.c

diff --git a/lib/librte_ip_frag/rte_ip_frag.h b/lib/librte_ip_frag/rte_ip_frag.h
index ecae782..4a4b5c3 100644
--- a/lib/librte_ip_frag/rte_ip_frag.h
+++ b/lib/librte_ip_frag/rte_ip_frag.h
@@ -174,6 +174,33 @@ rte_ip_frag_table_destroy( struct rte_ip_frag_tbl *tbl)
 }

 /**
+ * This function implements the fragmentation of IPv6 packets.
+ *
+ * @param pkt_in
+ *   The input packet.
+ * @param pkts_out
+ *   Array storing the output fragments.
+ * @param mtu_size
+ *   Size in bytes of the Maximum Transfer Unit (MTU) for the outgoing IPv6
+ *   datagrams. This value includes the size of the IPv6 header.
+ * @param pool_direct
+ *   MBUF pool used for allocating direct buffers for the output fragments.
+ * @param pool_indirect
+ *   MBUF pool used for allocating indirect buffers for the output fragments.
+ * @return
+ *   Upon successful completion - number of output fragments placed
+ *   in the pkts_out array.
+ *   Otherwise - (-1) * .
+ */
+int32_t
+rte_ipv6_fragment_packet(struct rte_mbuf *pkt_in,
+   struct rte_mbuf **pkts_out,
+   uint16_t nb_pkts_out,
+   uint16_t mtu_size,
+   struct rte_mempool *pool_direct,
+   struct rte_mempool *pool_indirect);
+
+/**
  * IPv4 fragmentation.
  *
  * This function implements the fragmentation of IPv4 packets.
diff --git a/lib/librte_ip_frag/rte_ipv6_fragmentation.c b/lib/librte_ip_frag/rte_ipv6_fragmentation.c
new file mode 100644
index 000..e8f137c
--- /dev/null
+++ b/lib/librte_ip_frag/rte_ipv6_fragmentation.c
@@ -0,0 +1,219 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include "rte_ip_frag.h"
+#include "ip_frag_common.h"
+
+/**
+ * @file
+ * RTE IPv6 Fragmentation
+ *
+ * Implementation of IPv6 fragmentation.
+ *
+ */
+
+/* Fragment Extension Header */
+#define	IPV6_HDR_MF_SHIFT	0
+#define	IPV6_HDR_FO_SHIFT	3
+#define	IPV6_HDR_MF_MASK	(1 << IPV6_HDR_MF_SHIFT)
+#define	IPV6_HDR_FO_MASK	((1 << IPV6_HDR_FO_SHIFT) - 1)
+
+static inline void
+__fill_ipv6hdr_frag(struct ipv6_hdr *dst,
+   const struct ipv6_hdr *src, uint16_t len, uint16_t fofs,
+   uint32_t mf)
+{
+   struct ipv6_extension_fragment *fh;
+
+   rte_memcpy(dst, src, sizeof(*dst));
+
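
For reference when reading `rte_ipv6_fragment_packet()`: per RFC 8200, the Fragment header's 16-bit field holds a 13-bit offset in 8-octet units in its upper bits and the M (more fragments) flag in bit 0. This matches `IPV6_HDR_FO_SHIFT` = 3 and `IPV6_HDR_MF_SHIFT` = 0 above. Every fragment except the last must carry a multiple of 8 octets. The sketch below shows that arithmetic. It assumes no extension headers other than the 8-byte Fragment header, and an `mtu` that includes the 40-byte IPv6 header (as the API comment states) and is at least 56. The helper names are hypothetical, and the patch's own `__fill_ipv6hdr_frag()` is truncated in this excerpt.

```c
#include <stdint.h>

#define IPV6_HDR_LEN      40 /* fixed IPv6 header */
#define IPV6_FRAG_HDR_LEN 8  /* Fragment extension header */

/* Hypothetical helper: largest per-fragment payload for an MTU. Both
 * headers are subtracted, then the result is rounded down to a multiple
 * of 8 because the offset field counts 8-octet units. */
static uint16_t
ipv6_frag_payload_size(uint16_t mtu)
{
	return (uint16_t)((mtu - IPV6_HDR_LEN - IPV6_FRAG_HDR_LEN) & ~7u);
}

/* Hypothetical helper: host-order value of the "Offset | Res | M" field.
 * offset_bytes must be a multiple of 8. The 13-bit offset occupies the
 * top bits and M occupies bit 0. Convert to network order before use. */
static uint16_t
ipv6_frag_data(uint16_t offset_bytes, int more_frags)
{
	return (uint16_t)(((offset_bytes >> 3) << 3) | (more_frags ? 1 : 0));
}

/* Hypothetical helper: fragments needed for a given fragmentable payload. */
static uint32_t
ipv6_frag_count(uint32_t payload_len, uint16_t mtu)
{
	uint32_t per_frag = ipv6_frag_payload_size(mtu);

	return (payload_len + per_frag - 1) / per_frag;
}
```

With a 1500-byte MTU each fragment carries 1448 bytes, so the second fragment's field is `1448 | 1` before byte swapping.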

[dpdk-dev] [PATCH 08/13] ip_frag: renamed ipv4 frag function

2014-05-28 Thread Anatoly Burakov


Signed-off-by: Anatoly Burakov 
---
 examples/ipv4_frag/main.c   | 2 +-
 lib/librte_ip_frag/rte_ip_frag.h| 2 +-
 lib/librte_ip_frag/rte_ipv4_fragmentation.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/examples/ipv4_frag/main.c b/examples/ipv4_frag/main.c
index 05a26b1..7aff99b 100644
--- a/examples/ipv4_frag/main.c
+++ b/examples/ipv4_frag/main.c
@@ -272,7 +272,7 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t port_in)
qconf->tx_mbufs[port_out].m_table[len] = m;
len2 = 1;
} else {
-   len2 = rte_ipv4_fragmentation(m,
+   len2 = rte_ipv4_fragment_packet(m,
&qconf->tx_mbufs[port_out].m_table[len],
(uint16_t)(MBUF_TABLE_SIZE - len),
IPV4_MTU_DEFAULT,
diff --git a/lib/librte_ip_frag/rte_ip_frag.h b/lib/librte_ip_frag/rte_ip_frag.h
index 327e1f1..ecae782 100644
--- a/lib/librte_ip_frag/rte_ip_frag.h
+++ b/lib/librte_ip_frag/rte_ip_frag.h
@@ -194,7 +194,7 @@ rte_ip_frag_table_destroy( struct rte_ip_frag_tbl *tbl)
  *   in the pkts_out array.
  *   Otherwise - (-1) * .
  */
-int32_t rte_ipv4_fragmentation(struct rte_mbuf *pkt_in,
+int32_t rte_ipv4_fragment_packet(struct rte_mbuf *pkt_in,
struct rte_mbuf **pkts_out,
uint16_t nb_pkts_out, uint16_t mtu_size,
struct rte_mempool *pool_direct,
diff --git a/lib/librte_ip_frag/rte_ipv4_fragmentation.c b/lib/librte_ip_frag/rte_ipv4_fragmentation.c
index 6e5feb6..7ec20cf 100644
--- a/lib/librte_ip_frag/rte_ipv4_fragmentation.c
+++ b/lib/librte_ip_frag/rte_ipv4_fragmentation.c
@@ -96,7 +96,7 @@ static inline void __free_fragments(struct rte_mbuf *mb[], uint32_t num)
  *   Otherwise - (-1) * .
  */
 int32_t
-rte_ipv4_fragmentation(struct rte_mbuf *pkt_in,
+rte_ipv4_fragment_packet(struct rte_mbuf *pkt_in,
struct rte_mbuf **pkts_out,
uint16_t nb_pkts_out,
uint16_t mtu_size,
-- 
1.8.1.4

[dpdk-dev] [PATCH 07/13] ip_frag: refactored reassembly code and made it a proper library

2014-05-28 Thread Anatoly Burakov


Signed-off-by: Anatoly Burakov 
---
 config/common_bsdapp |   2 +
 config/common_linuxapp   |   2 +
 examples/ip_reassembly/main.c|  24 +-
 lib/librte_ip_frag/Makefile  |   6 +-
 lib/librte_ip_frag/ip_frag_common.h  | 134 +-
 lib/librte_ip_frag/ip_frag_internal.c| 337 
 lib/librte_ip_frag/ipv4_frag_tbl.h   | 400 -
 lib/librte_ip_frag/rte_ip_frag.h | 223 +++-
 lib/librte_ip_frag/rte_ip_frag_common.c  | 142 ++
 lib/librte_ip_frag/rte_ipv4_reassembly.c | 189 ++
 lib/librte_ip_frag/rte_ipv4_rsmbl.h  | 427 ---
 11 files changed, 1023 insertions(+), 863 deletions(-)
 create mode 100644 lib/librte_ip_frag/ip_frag_internal.c
 delete mode 100644 lib/librte_ip_frag/ipv4_frag_tbl.h
 create mode 100644 lib/librte_ip_frag/rte_ip_frag_common.c
 create mode 100644 lib/librte_ip_frag/rte_ipv4_reassembly.c
 delete mode 100644 lib/librte_ip_frag/rte_ipv4_rsmbl.h

diff --git a/config/common_bsdapp b/config/common_bsdapp
index d30802e..be56ca7 100644
--- a/config/common_bsdapp
+++ b/config/common_bsdapp
@@ -261,6 +261,8 @@ CONFIG_RTE_LIBRTE_NET=y
 # Compile librte_net
 #
 CONFIG_RTE_LIBRTE_IP_FRAG=y
+CONFIG_RTE_LIBRTE_IP_FRAG_DEBUG=n
+CONFIG_RTE_LIBRTE_IP_FRAG_MAX_FRAG=4

 #
 # Compile librte_meter
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 074d961..4d58496 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -288,6 +288,8 @@ CONFIG_RTE_LIBRTE_NET=y
 # Compile librte_net
 #
 CONFIG_RTE_LIBRTE_IP_FRAG=y
+CONFIG_RTE_LIBRTE_IP_FRAG_DEBUG=n
+CONFIG_RTE_LIBRTE_IP_FRAG_MAX_FRAG=4

 #
 # Compile librte_meter
diff --git a/examples/ip_reassembly/main.c b/examples/ip_reassembly/main.c
index 23ec4be..6c40d76 100644
--- a/examples/ip_reassembly/main.c
+++ b/examples/ip_reassembly/main.c
@@ -94,7 +94,7 @@

 #define MAX_PKT_BURST 32

-#include "rte_ipv4_rsmbl.h"
+#include "rte_ip_frag.h"

 #ifndef IPv6_BYTES
 #define IPv6_BYTES_FMT "%02x%02x:%02x%02x:%02x%02x:%02x%02x:"\
@@ -407,9 +407,9 @@ struct lcore_conf {
 #else
lookup_struct_t * ipv6_lookup_struct;
 #endif
-   struct ip_frag_tbl *frag_tbl[MAX_RX_QUEUE_PER_LCORE];
+   struct rte_ip_frag_tbl *frag_tbl[MAX_RX_QUEUE_PER_LCORE];
struct rte_mempool *pool[MAX_RX_QUEUE_PER_LCORE];
-   struct ip_frag_death_row death_row;
+   struct rte_ip_frag_death_row death_row;
struct mbuf_table *tx_mbufs[MAX_PORTS];
struct tx_lcore_stat tx_stat;
 } __rte_cache_aligned;
@@ -645,7 +645,6 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid, uint32_t queue,
struct ipv4_hdr *ipv4_hdr;
void *d_addr_bytes;
uint8_t dst_port;
-   uint16_t flag_offset, ip_flag, ip_ofs;

eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);

@@ -665,16 +664,12 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid, uint32_t queue,
++(ipv4_hdr->hdr_checksum);
 #endif

-   flag_offset = rte_be_to_cpu_16(ipv4_hdr->fragment_offset);
-   ip_ofs = (uint16_t)(flag_offset & IPV4_HDR_OFFSET_MASK);
-   ip_flag = (uint16_t)(flag_offset & IPV4_HDR_MF_FLAG);
-
 /* if it is a fragmented packet, then try to reassemble. */
-   if (ip_flag != 0 || ip_ofs  != 0) {
+   if (rte_ipv4_frag_pkt_is_fragmented(ipv4_hdr)) {

struct rte_mbuf *mo;
-   struct ip_frag_tbl *tbl;
-   struct ip_frag_death_row *dr;
+   struct rte_ip_frag_tbl *tbl;
+   struct rte_ip_frag_death_row *dr;

tbl = qconf->frag_tbl[queue];
dr = &qconf->death_row;
@@ -684,8 +679,8 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid, uint32_t queue,
m->pkt.vlan_macip.f.l3_len = sizeof(*ipv4_hdr);

/* process this fragment. */
-   if ((mo = rte_ipv4_reassemble_packet(tbl, dr, m, tms, ipv4_hdr,
-   ip_ofs, ip_flag)) == NULL) 
+   if ((mo = rte_ipv4_frag_reassemble_packet(tbl, dr, m, tms,
+   ipv4_hdr)) == NULL)
/* no packet to send out. */
return;

@@ -1469,7 +1464,8 @@ setup_queue_tbl(struct lcore_conf *qconf, uint32_t lcore, int socket,
 * Plus, each TX queue can hold up to  packets.
 */ 

-   nb_mbuf = 2 * RTE_MAX(max_flow_num, 2UL * MAX_PKT_BURST) * MAX_FRAG_NUM;
+   nb_mbuf = 2 * RTE_MAX(max_flow_num, 2UL * MAX_PKT_BURST) *
+   RTE_LIBRTE_IP_FRAG_MAX_FRAG;
nb_mbuf *= (port_conf.rxmode.max_rx_pkt_len + BUF_SIZE - 1) / BUF_SIZE;
nb_mbuf += RTE_TEST_RX_DESC_DEFAULT + RTE_TEST_TX_DESC_DEFAULT;

diff --git a/lib/librte_ip_frag/Makefile
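
Patch 07 replaces the open-coded test on the flags/offset field with `rte_ipv4_frag_pkt_is_fragmented()`. The removed lines show the logic: a packet is a fragment if MF is set (first or middle fragment) or the offset is non-zero (middle or last fragment). The sketch below restates that predicate as standalone C. It takes the already byte-swapped field, which the removed code obtained with `rte_be_to_cpu_16()`. The masks are redefined locally to mirror `IPV4_HDR_MF_FLAG` and `IPV4_HDR_OFFSET_MASK`, and the function name is hypothetical.

```c
#include <stdint.h>

/* Host-order layout of the IPv4 flags/fragment-offset field:
 * bit 14 = DF, bit 13 = MF, bits 12..0 = offset in 8-byte units. */
#define IPV4_HDR_MF_SHIFT    13
#define IPV4_HDR_MF_FLAG     (1 << IPV4_HDR_MF_SHIFT)
#define IPV4_HDR_OFFSET_MASK ((1 << IPV4_HDR_MF_SHIFT) - 1)

/* Hypothetical helper: non-zero if the packet is any kind of fragment.
 * The DF bit is ignored. */
static int
ipv4_is_fragmented(uint16_t flag_offset_host)
{
	return (flag_offset_host &
		(IPV4_HDR_MF_FLAG | IPV4_HDR_OFFSET_MASK)) != 0;
}
```

Only fragments are passed to the reassembly table. Whole packets, including those with only DF set, go straight to forwarding.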

[dpdk-dev] [PATCH 06/13] ip_frag: renaming structures in fragmentation table to be more generic

2014-05-28 Thread Anatoly Burakov

Technically, the fragmentation table can work for both IPv4 and IPv6
packets, so we're renaming everything to be generic enough to make sense
in an IPv6 context.

Signed-off-by: Anatoly Burakov 
---
 examples/ip_reassembly/main.c   |  16 ++---
 lib/librte_ip_frag/ip_frag_common.h |   2 +
 lib/librte_ip_frag/ipv4_frag_tbl.h  | 130 ++--
 lib/librte_ip_frag/rte_ipv4_rsmbl.h |  92 -
 4 files changed, 122 insertions(+), 118 deletions(-)

diff --git a/examples/ip_reassembly/main.c b/examples/ip_reassembly/main.c
index 42ade5c..23ec4be 100644
--- a/examples/ip_reassembly/main.c
+++ b/examples/ip_reassembly/main.c
@@ -407,9 +407,9 @@ struct lcore_conf {
 #else
lookup_struct_t * ipv6_lookup_struct;
 #endif
-   struct ipv4_frag_tbl *frag_tbl[MAX_RX_QUEUE_PER_LCORE];
+   struct ip_frag_tbl *frag_tbl[MAX_RX_QUEUE_PER_LCORE];
struct rte_mempool *pool[MAX_RX_QUEUE_PER_LCORE];
-   struct ipv4_frag_death_row death_row;
+   struct ip_frag_death_row death_row;
struct mbuf_table *tx_mbufs[MAX_PORTS];
struct tx_lcore_stat tx_stat;
 } __rte_cache_aligned;
@@ -673,8 +673,8 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid, uint32_t queue,
if (ip_flag != 0 || ip_ofs  != 0) {

struct rte_mbuf *mo;
-   struct ipv4_frag_tbl *tbl;
-   struct ipv4_frag_death_row *dr;
+   struct ip_frag_tbl *tbl;
+   struct ip_frag_death_row *dr;

tbl = qconf->frag_tbl[queue];
dr = &qconf->death_row;
@@ -684,7 +684,7 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid, uint32_t queue,
m->pkt.vlan_macip.f.l3_len = sizeof(*ipv4_hdr);

/* process this fragment. */
-   if ((mo = ipv4_frag_mbuf(tbl, dr, m, tms, ipv4_hdr,
+   if ((mo = rte_ipv4_reassemble_packet(tbl, dr, m, tms, ipv4_hdr,
ip_ofs, ip_flag)) == NULL) 
/* no packet to send out. */
return;
@@ -822,7 +822,7 @@ main_loop(__attribute__((unused)) void *dummy)
i, qconf, cur_tsc);
}

-   ipv4_frag_free_death_row(&qconf->death_row,
+   rte_ip_frag_free_death_row(&qconf->death_row,
PREFETCH_OFFSET);
}
}
@@ -1456,7 +1456,7 @@ setup_queue_tbl(struct lcore_conf *qconf, uint32_t lcore, int socket,
frag_cycles = (rte_get_tsc_hz() + MS_PER_S - 1) / MS_PER_S *
max_flow_ttl;

-   if ((qconf->frag_tbl[queue] = ipv4_frag_tbl_create(max_flow_num,
+   if ((qconf->frag_tbl[queue] = rte_ip_frag_table_create(max_flow_num,
IPV4_FRAG_TBL_BUCKET_ENTRIES, max_flow_num, frag_cycles,
socket)) == NULL)
rte_exit(EXIT_FAILURE, "ipv4_frag_tbl_create(%u) on "
@@ -1501,7 +1501,7 @@ queue_dump_stat(void)
"rxqueueid=%hhu frag tbl stat:\n",
lcore,  qconf->rx_queue_list[i].port_id,
qconf->rx_queue_list[i].queue_id);
-   ipv4_frag_tbl_dump_stat(stdout, qconf->frag_tbl[i]);
+   rte_ip_frag_table_statistics_dump(stdout, qconf->frag_tbl[i]);
fprintf(stdout, "TX bursts:\t%" PRIu64 "\n"
"TX packets _queued:\t%" PRIu64 "\n"
"TX packets dropped:\t%" PRIu64 "\n"
diff --git a/lib/librte_ip_frag/ip_frag_common.h b/lib/librte_ip_frag/ip_frag_common.h
index c9741c0..6d4706a 100644
--- a/lib/librte_ip_frag/ip_frag_common.h
+++ b/lib/librte_ip_frag/ip_frag_common.h
@@ -34,6 +34,8 @@
 #ifndef _IP_FRAG_COMMON_H_
 #define _IP_FRAG_COMMON_H_

+#include "rte_ip_frag.h"
+
 /* Debug on/off */
 #ifdef RTE_IP_FRAG_DEBUG

diff --git a/lib/librte_ip_frag/ipv4_frag_tbl.h b/lib/librte_ip_frag/ipv4_frag_tbl.h
index 5487230..fa3291d 100644
--- a/lib/librte_ip_frag/ipv4_frag_tbl.h
+++ b/lib/librte_ip_frag/ipv4_frag_tbl.h
@@ -43,7 +43,7 @@
  */

 /*
- * The ipv4_frag_tbl is a simple hash table:
+ * The ip_frag_tbl is a simple hash table:
  * The basic idea is to use two hash functions and 
  * associativity. This provides 2 *  possible locations in
  * the hash table for each key. Sort of simplified Cuckoo hashing,
@@ -64,9 +64,9 @@

 #define	PRIME_VALUE	0xeaad8405

-TAILQ_HEAD(ipv4_pkt_list, ipv4_frag_pkt);
+TAILQ_HEAD(ip_pkt_list, ip_frag_pkt);

-struct ipv4_frag_tbl_stat {
+struct ip_frag_tbl_stat {
uint64_t find_num;  /* total # of find/insert attempts. */
uint64_t add_num;   /* # of add ops. */
uint64_t del_num;   /* # of del ops. */
@@ -75,7 +75,7 @@ struct ipv4_frag_tbl_stat {

[dpdk-dev] [PATCH 05/13] ip_frag: removed unneeded check and macro

2014-05-28 Thread Anatoly Burakov


Signed-off-by: Anatoly Burakov 
---
 lib/librte_ip_frag/rte_ipv4_fragmentation.c | 9 -
 1 file changed, 9 deletions(-)

diff --git a/lib/librte_ip_frag/rte_ipv4_fragmentation.c b/lib/librte_ip_frag/rte_ipv4_fragmentation.c
index 46ed583..6e5feb6 100644
--- a/lib/librte_ip_frag/rte_ipv4_fragmentation.c
+++ b/lib/librte_ip_frag/rte_ipv4_fragmentation.c
@@ -45,11 +45,6 @@
 #include "rte_ip_frag.h"
 #include "ip_frag_common.h"

-/*
- * MAX number of fragments per packet allowed.
- */
-#define	IPV4_MAX_FRAGS_PER_PACKET	0x80
-
 /* Fragment Offset */
 #define	IPV4_HDR_DF_SHIFT	14
 #define	IPV4_HDR_MF_SHIFT	13
@@ -119,10 +114,6 @@ rte_ipv4_fragmentation(struct rte_mbuf *pkt_in,
/* Fragment size should be a multiply of 8. */
RTE_IP_FRAG_ASSERT((frag_size & IPV4_HDR_FO_MASK) == 0);

-   /* Fragment size should be a multiply of 8. */
-   RTE_IP_FRAG_ASSERT(IPV4_MAX_FRAGS_PER_PACKET * frag_size >=
-   (uint16_t)(pkt_in->pkt.pkt_len - sizeof(struct ipv4_hdr)));
-
in_hdr = (struct ipv4_hdr *) pkt_in->pkt.data;
flag_offset = rte_cpu_to_be_16(in_hdr->fragment_offset);

-- 
1.8.1.4

[dpdk-dev] [PATCH 04/13] ip_frag: new internal common header

2014-05-28 Thread Anatoly Burakov

Moved the debug assertion macros into a common header, as the reassembly
code will need them later as well.

Signed-off-by: Anatoly Burakov 
---
 lib/librte_ip_frag/ip_frag_common.h | 52 +
 lib/librte_ip_frag/rte_ipv4_fragmentation.c | 20 ++-
 2 files changed, 55 insertions(+), 17 deletions(-)
 create mode 100644 lib/librte_ip_frag/ip_frag_common.h

diff --git a/lib/librte_ip_frag/ip_frag_common.h b/lib/librte_ip_frag/ip_frag_common.h
new file mode 100644
index 000..c9741c0
--- /dev/null
+++ b/lib/librte_ip_frag/ip_frag_common.h
@@ -0,0 +1,52 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _IP_FRAG_COMMON_H_
+#define _IP_FRAG_COMMON_H_
+
+/* Debug on/off */
+#ifdef RTE_IP_FRAG_DEBUG
+
+#define	RTE_IP_FRAG_ASSERT(exp)	\
+if (!(exp)){   \
+   rte_panic("function %s, line%d\tassert \"" #exp "\" failed\n",  \
+   __func__, __LINE__);\
+}
+
+#else /*RTE_IP_FRAG_DEBUG*/
+
+#define RTE_IP_FRAG_ASSERT(exp)	do { } while (0)
+
+#endif /*RTE_IP_FRAG_DEBUG*/
+
+#endif
diff --git a/lib/librte_ip_frag/rte_ipv4_fragmentation.c b/lib/librte_ip_frag/rte_ipv4_fragmentation.c
index 5f67417..46ed583 100644
--- a/lib/librte_ip_frag/rte_ipv4_fragmentation.c
+++ b/lib/librte_ip_frag/rte_ipv4_fragmentation.c
@@ -43,27 +43,13 @@
 #include 

 #include "rte_ip_frag.h"
+#include "ip_frag_common.h"

 /*
  * MAX number of fragments per packet allowed.
  */
 #define	IPV4_MAX_FRAGS_PER_PACKET	0x80

-/* Debug on/off */
-#ifdef RTE_IPV4_FRAG_DEBUG
-
-#define	RTE_IPV4_FRAG_ASSERT(exp)	\
-if (!(exp)){   \
-   rte_panic("function %s, line%d\tassert \"" #exp "\" failed\n",  \
-   __func__, __LINE__);\
-}
-
-#else /*RTE_IPV4_FRAG_DEBUG*/
-
-#define RTE_IPV4_FRAG_ASSERT(exp)  do { } while (0)
-
-#endif /*RTE_IPV4_FRAG_DEBUG*/
-
 /* Fragment Offset */
 #define	IPV4_HDR_DF_SHIFT	14
 #define	IPV4_HDR_MF_SHIFT	13
@@ -131,10 +117,10 @@ rte_ipv4_fragmentation(struct rte_mbuf *pkt_in,
frag_size = (uint16_t)(mtu_size - sizeof(struct ipv4_hdr));

/* Fragment size should be a multiply of 8. */
-   RTE_IPV4_FRAG_ASSERT((frag_size & IPV4_HDR_FO_MASK) == 0);
+   RTE_IP_FRAG_ASSERT((frag_size & IPV4_HDR_FO_MASK) == 0);

/* Fragment size should be a multiply of 8. */
-   RTE_IPV4_FRAG_ASSERT(IPV4_MAX_FRAGS_PER_PACKET * frag_size >=
+   RTE_IP_FRAG_ASSERT(IPV4_MAX_FRAGS_PER_PACKET * frag_size >=
(uint16_t)(pkt_in->pkt.pkt_len - sizeof(struct ipv4_hdr)));

in_hdr = (struct ipv4_hdr *) pkt_in->pkt.data;
-- 
1.8.1.4
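
`RTE_IP_FRAG_ASSERT()` is compiled out unless `RTE_IP_FRAG_DEBUG` is defined, so its argument must be free of side effects. Note that the debug variant above expands to a bare `if` block. As a result, `RTE_IP_FRAG_ASSERT(x);` inside an unbraced `if`/`else` would leave the `else` without a matching `if` and fail to compile. The sketch below shows the same on/off pattern wrapped in `do { } while (0)`. It assumes a hypothetical `FRAG_DEBUG` switch and uses `abort()` in place of `rte_panic()`. The `frag_size_ok()` caller is also hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Debug build: check exp and abort with location info.
 * Release build: empty statement; exp is NOT evaluated.
 * The do { } while (0) wrapper makes either form a single statement. */
#ifdef FRAG_DEBUG
#define FRAG_ASSERT(exp) do {                                        \
	if (!(exp)) {                                                \
		fprintf(stderr, "function %s, line %d: assert \"" #exp \
			"\" failed\n", __func__, __LINE__);          \
		abort();                                             \
	}                                                            \
} while (0)
#else
#define FRAG_ASSERT(exp) do { } while (0)
#endif

/* Hypothetical caller: returns 1 if frag_size is a multiple of 8.
 * The unbraced if/else compiles only because FRAG_ASSERT expands to
 * exactly one statement. */
static int
frag_size_ok(unsigned frag_size)
{
	if ((frag_size & 7) == 0)
		FRAG_ASSERT(frag_size != 0);
	else
		return 0;
	return 1;
}
```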

[dpdk-dev] [PATCH 03/13] Fixing issues reported by checkpatch

2014-05-28 Thread Anatoly Burakov


Signed-off-by: Anatoly Burakov 
---
 lib/librte_ip_frag/rte_ipv4_fragmentation.c | 24 +++-
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/lib/librte_ip_frag/rte_ipv4_fragmentation.c b/lib/librte_ip_frag/rte_ipv4_fragmentation.c
index 2d33a7b..5f67417 100644
--- a/lib/librte_ip_frag/rte_ipv4_fragmentation.c
+++ b/lib/librte_ip_frag/rte_ipv4_fragmentation.c
@@ -60,7 +60,7 @@ if (!(exp))   {   \

 #else /*RTE_IPV4_FRAG_DEBUG*/

-#define RTE_IPV4_FRAG_ASSERT(exp)  do { } while(0)
+#define RTE_IPV4_FRAG_ASSERT(exp)  do { } while (0)

 #endif /*RTE_IPV4_FRAG_DEBUG*/

@@ -135,19 +135,19 @@ rte_ipv4_fragmentation(struct rte_mbuf *pkt_in,

/* Fragment size should be a multiply of 8. */
RTE_IPV4_FRAG_ASSERT(IPV4_MAX_FRAGS_PER_PACKET * frag_size >=
-   (uint16_t)(pkt_in->pkt.pkt_len - sizeof (struct ipv4_hdr)));
+   (uint16_t)(pkt_in->pkt.pkt_len - sizeof(struct ipv4_hdr)));

-   in_hdr = (struct ipv4_hdr*) pkt_in->pkt.data;
+   in_hdr = (struct ipv4_hdr *) pkt_in->pkt.data;
flag_offset = rte_cpu_to_be_16(in_hdr->fragment_offset);

/* If Don't Fragment flag is set */
if (unlikely ((flag_offset & IPV4_HDR_DF_MASK) != 0))
-   return (-ENOTSUP);
+   return -ENOTSUP;

/* Check that pkts_out is big enough to hold all fragments */
-   if (unlikely (frag_size * nb_pkts_out <
+   if (unlikely(frag_size * nb_pkts_out <
(uint16_t)(pkt_in->pkt.pkt_len - sizeof (struct ipv4_hdr
-   return (-EINVAL);
+   return -EINVAL;

in_seg = pkt_in;
in_seg_data_pos = sizeof(struct ipv4_hdr);
@@ -164,7 +164,7 @@ rte_ipv4_fragmentation(struct rte_mbuf *pkt_in,
out_pkt = rte_pktmbuf_alloc(pool_direct);
if (unlikely(out_pkt == NULL)) {
__free_fragments(pkts_out, out_pkt_pos);
-   return (-ENOMEM);
+   return -ENOMEM;
}

/* Reserve space for the IP header that will be built later */
@@ -182,7 +182,7 @@ rte_ipv4_fragmentation(struct rte_mbuf *pkt_in,
if (unlikely(out_seg == NULL)) {
rte_pktmbuf_free(out_pkt);
__free_fragments(pkts_out, out_pkt_pos);
-   return (-ENOMEM);
+   return -ENOMEM;
}
out_seg_prev->pkt.next = out_seg;
out_seg_prev = out_seg;
@@ -201,18 +201,16 @@ rte_ipv4_fragmentation(struct rte_mbuf *pkt_in,
in_seg_data_pos += len;

/* Current output packet (i.e. fragment) done ? */
-   if (unlikely(out_pkt->pkt.pkt_len >= mtu_size)) {
+   if (unlikely(out_pkt->pkt.pkt_len >= mtu_size))
more_out_segs = 0;
-   }

/* Current input segment done ? */
if (unlikely(in_seg_data_pos == in_seg->pkt.data_len)) {
in_seg = in_seg->pkt.next;
in_seg_data_pos = 0;

-   if (unlikely(in_seg == NULL)) {
+   if (unlikely(in_seg == NULL))
more_in_segs = 0;
-   }
}
}

@@ -235,5 +233,5 @@ rte_ipv4_fragmentation(struct rte_mbuf *pkt_in,
out_pkt_pos ++;
}

-   return (out_pkt_pos);
+   return out_pkt_pos;
 }
-- 
1.8.1.4
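
The checks at the top of the IPv4 fragmentation routine reduce to plain arithmetic. Each fragment carries `mtu_size - sizeof(struct ipv4_hdr)` payload bytes, that size must be a multiple of 8, and `pkts_out` must have room for every fragment. The sketch below assumes a 20-byte header without options, and `pkt_len` and `mtu` both larger than 20. Unlike the library, it returns `-EINVAL` for a size that is not a multiple of 8 instead of asserting. The function name is hypothetical.

```c
#include <errno.h>
#include <stdint.h>

#define IPV4_HDR_LEN 20 /* IPv4 header without options */
#define ROUNDUP_DIV(a, b) (((a) + (b) - 1) / (b))

/* Hypothetical helper: returns the number of fragments pkt_len needs at
 * the given MTU, or -EINVAL if the per-fragment size is not a multiple
 * of 8 or nb_out slots cannot hold all fragments. */
static int32_t
ipv4_frag_plan(uint32_t pkt_len, uint16_t mtu, uint16_t nb_out)
{
	uint32_t frag_size = mtu - IPV4_HDR_LEN;
	uint32_t payload = pkt_len - IPV4_HDR_LEN;

	if (frag_size % 8 != 0 || frag_size * nb_out < payload)
		return -EINVAL;
	return (int32_t)ROUNDUP_DIV(payload, frag_size);
}
```

With the default 1500-byte MTU each fragment carries 1480 bytes. A 9000-byte jumbo frame therefore needs 7 fragments and is rejected if `pkts_out` has only 6 slots.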

[dpdk-dev] [PATCH 02/13] Refactored IPv4 fragmentation into a proper library

2014-05-28 Thread Anatoly Burakov


Signed-off-by: Anatoly Burakov 
---
 examples/ipv4_frag/main.c   |  11 ++
 lib/librte_ip_frag/Makefile |   9 ++
 lib/librte_ip_frag/rte_ip_frag.h| 186 +-
 lib/librte_ip_frag/rte_ipv4_fragmentation.c | 239 
 mk/rte.app.mk   |   4 +
 5 files changed, 267 insertions(+), 182 deletions(-)
 create mode 100644 lib/librte_ip_frag/rte_ipv4_fragmentation.c

diff --git a/examples/ipv4_frag/main.c b/examples/ipv4_frag/main.c
index 3c2c960..05a26b1 100644
--- a/examples/ipv4_frag/main.c
+++ b/examples/ipv4_frag/main.c
@@ -74,6 +74,17 @@
 #include "rte_ip_frag.h"
 #include "main.h"

+/*
+ * Default byte size for the IPv4 Maximum Transfer Unit (MTU).
+ * This value includes the size of IPv4 header.
+ */
+#define	IPV4_MTU_DEFAULT	ETHER_MTU
+
+/*
+ * Default payload in bytes for the IPv4 packet.
+ */
+#define	IPV4_DEFAULT_PAYLOAD	(IPV4_MTU_DEFAULT - sizeof(struct ipv4_hdr))
+
 #define RTE_LOGTYPE_L3FWD RTE_LOGTYPE_USER1

 #define MBUF_SIZE (2048 + sizeof(struct rte_mbuf) + RTE_PKTMBUF_HEADROOM)
diff --git a/lib/librte_ip_frag/Makefile b/lib/librte_ip_frag/Makefile
index 3054c1f..13a83b1 100644
--- a/lib/librte_ip_frag/Makefile
+++ b/lib/librte_ip_frag/Makefile
@@ -31,6 +31,15 @@

 include $(RTE_SDK)/mk/rte.vars.mk

+# library name
+LIB = librte_ip_frag.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+#source files
+SRCS-$(CONFIG_RTE_LIBRTE_IP_FRAG) += rte_ipv4_fragmentation.c
+
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_IP_FRAG)-include += rte_ip_frag.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_IP_FRAG)-include += ipv4_frag_tbl.h
diff --git a/lib/librte_ip_frag/rte_ip_frag.h b/lib/librte_ip_frag/rte_ip_frag.h
index 84fa9c9..0cf3878 100644
--- a/lib/librte_ip_frag/rte_ip_frag.h
+++ b/lib/librte_ip_frag/rte_ip_frag.h
@@ -31,9 +31,8 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */

-#ifndef __INCLUDE_RTE_IPV4_FRAG_H__
-#define __INCLUDE_RTE_IPV4_FRAG_H__
-#include 
+#ifndef _RTE_IP_FRAG_H__
+#define _RTE_IP_FRAG_H__

 /**
  * @file
@@ -43,67 +42,6 @@
  *
  */

-/*
- * Default byte size for the IPv4 Maximum Transfer Unit (MTU).
- * This value includes the size of IPv4 header.
- */
-#define	IPV4_MTU_DEFAULT	ETHER_MTU
-
-/*
- * Default payload in bytes for the IPv4 packet.
- */
-#define	IPV4_DEFAULT_PAYLOAD	(IPV4_MTU_DEFAULT - sizeof(struct ipv4_hdr))
-
-/*
- * MAX number of fragments per packet allowed.
- */
-#define	IPV4_MAX_FRAGS_PER_PACKET	0x80
-
-
-/* Debug on/off */
-#ifdef RTE_IPV4_FRAG_DEBUG
-
-#define	RTE_IPV4_FRAG_ASSERT(exp)	\
-if (!(exp)){   \
-   rte_panic("function %s, line%d\tassert \"" #exp "\" failed\n",  \
-   __func__, __LINE__);\
-}
-
-#else /*RTE_IPV4_FRAG_DEBUG*/
-
-#define RTE_IPV4_FRAG_ASSERT(exp)  do { } while(0)
-
-#endif /*RTE_IPV4_FRAG_DEBUG*/
-
-/* Fragment Offset */
-#define	IPV4_HDR_DF_SHIFT	14
-#define	IPV4_HDR_MF_SHIFT	13
-#define	IPV4_HDR_FO_SHIFT	3
-
-#define	IPV4_HDR_DF_MASK	(1 << IPV4_HDR_DF_SHIFT)
-#define	IPV4_HDR_MF_MASK	(1 << IPV4_HDR_MF_SHIFT)
-
-#define	IPV4_HDR_FO_MASK	((1 << IPV4_HDR_FO_SHIFT) - 1)
-
-static inline void __fill_ipv4hdr_frag(struct ipv4_hdr *dst,
-   const struct ipv4_hdr *src, uint16_t len, uint16_t fofs,
-   uint16_t dofs, uint32_t mf)
-{
-   rte_memcpy(dst, src, sizeof(*dst));
-   fofs = (uint16_t)(fofs + (dofs >> IPV4_HDR_FO_SHIFT));
-   fofs = (uint16_t)(fofs | mf << IPV4_HDR_MF_SHIFT);
-   dst->fragment_offset = rte_cpu_to_be_16(fofs);
-   dst->total_length = rte_cpu_to_be_16(len);
-   dst->hdr_checksum = 0;
-}
-
-static inline void __free_fragments(struct rte_mbuf *mb[], uint32_t num)
-{
-   uint32_t i;
-   for (i = 0; i != num; i++)
-   rte_pktmbuf_free(mb[i]);
-}
-
 /**
  * IPv4 fragmentation.
  *
@@ -125,127 +63,11 @@ static inline void __free_fragments(struct rte_mbuf *mb[], uint32_t num)
  *   in the pkts_out array.
  *   Otherwise - (-1) * .
  */
-static inline int32_t rte_ipv4_fragmentation(struct rte_mbuf *pkt_in,
+int32_t rte_ipv4_fragmentation(struct rte_mbuf *pkt_in,
struct rte_mbuf **pkts_out,
uint16_t nb_pkts_out,
uint16_t mtu_size,
struct rte_mempool *pool_direct,
-   struct rte_mempool *pool_indirect)
-{
-   struct rte_mbuf *in_seg = NULL;
-   struct ipv4_hdr *in_hdr;
-   uint32_t out_pkt_pos, in_seg_data_pos;
-   uint32_t more_in_segs;
-   uint16_t fragment_offset, flag_offset, frag_size;
-
-   frag_size = (uint16_t)(mtu_size - sizeof(struct ipv4_hdr));
-
-
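The removed `__fill_ipv4hdr_frag()` helper stores each fragment's offset in 8-byte units (`dofs >> IPV4_HDR_FO_SHIFT`) and sets the More Fragments bit with `mf << IPV4_HDR_MF_SHIFT`. The standalone sketch below reproduces only that arithmetic, assuming a 20-byte IPv4 header without options and fragments whose payload is `mtu_size - 20` rounded down to a multiple of 8. `plan_fragments()` and `struct frag_plan` are hypothetical names, not librte_ip_frag API, and no mbufs are involved.

```c
#include <stdint.h>

/* Shift values as in the removed rte_ipv4_frag.h macros. */
#define IPV4_HDR_MF_SHIFT	13
#define IPV4_HDR_FO_SHIFT	3
#define IPV4_HDR_LEN		20	/* assumption: no IP options */

/* Hypothetical per-fragment result (host byte order). */
struct frag_plan {
	uint16_t payload_len;	/* payload bytes carried by this fragment */
	uint16_t total_length;	/* value for ipv4_hdr.total_length */
	uint16_t frag_field;	/* offset in 8-byte units | MF flag */
};

/*
 * Split payload_len bytes into fragments that fit mtu_size.
 * Returns the number of fragments, or -1 if the MTU is too small
 * or out[] cannot hold all fragments.
 */
static int
plan_fragments(uint32_t payload_len, uint16_t mtu_size,
	       struct frag_plan *out, int max_out)
{
	uint32_t frag_size, dofs = 0;
	int n = 0;

	if (mtu_size < IPV4_HDR_LEN + 8)
		return -1;
	/* Every fragment except the last must carry a multiple of 8 bytes. */
	frag_size = (uint32_t)(mtu_size - IPV4_HDR_LEN) & ~7u;

	while (dofs < payload_len) {
		uint32_t len = payload_len - dofs;
		uint32_t mf = 0;

		if (n == max_out)
			return -1;
		if (len > frag_size) {
			len = frag_size;
			mf = 1;		/* more fragments follow */
		}
		out[n].payload_len = (uint16_t)len;
		out[n].total_length = (uint16_t)(len + IPV4_HDR_LEN);
		out[n].frag_field = (uint16_t)((dofs >> IPV4_HDR_FO_SHIFT) |
					       (mf << IPV4_HDR_MF_SHIFT));
		dofs += len;
		n++;
	}
	return n;
}
```

For a 3000-byte payload and a 1500-byte MTU this yields three fragments of 1480, 1480 and 40 bytes with fragment fields 0x2000, 0x20b9 and 0x0172; the real header would store them through `rte_cpu_to_be_16()`.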

[dpdk-dev] [PATCH 01/13] ip_frag: Moving fragmentation/reassembly headers into a separate library

2014-05-28 Thread Anatoly Burakov


Signed-off-by: Anatoly Burakov 
---
 config/common_bsdapp   |  5 +++
 config/common_linuxapp |  5 +++
 examples/ip_reassembly/main.c  |  2 +-
 examples/ipv4_frag/main.c  |  2 +-
 lib/Makefile   |  1 +
 lib/librte_ip_frag/Makefile| 42 ++
 .../librte_ip_frag}/ipv4_frag_tbl.h|  0
 .../librte_ip_frag/rte_ip_frag.h   |  0
 .../librte_ip_frag/rte_ipv4_rsmbl.h|  0
 9 files changed, 55 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_ip_frag/Makefile
 rename {examples/ip_reassembly => lib/librte_ip_frag}/ipv4_frag_tbl.h (100%)
 rename examples/ipv4_frag/rte_ipv4_frag.h => lib/librte_ip_frag/rte_ip_frag.h (100%)
 rename examples/ip_reassembly/ipv4_rsmbl.h => lib/librte_ip_frag/rte_ipv4_rsmbl.h (100%)

diff --git a/config/common_bsdapp b/config/common_bsdapp
index 2cc7b80..d30802e 100644
--- a/config/common_bsdapp
+++ b/config/common_bsdapp
@@ -258,6 +258,11 @@ CONFIG_RTE_MAX_LCORE_FREQS=64
 CONFIG_RTE_LIBRTE_NET=y

 #
+# Compile librte_ip_frag
+#
+CONFIG_RTE_LIBRTE_IP_FRAG=y
+
+#
 # Compile librte_meter
 #
 CONFIG_RTE_LIBRTE_METER=y
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 62619c6..074d961 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -285,6 +285,11 @@ CONFIG_RTE_MAX_LCORE_FREQS=64
 CONFIG_RTE_LIBRTE_NET=y

 #
+# Compile librte_ip_frag
+#
+CONFIG_RTE_LIBRTE_IP_FRAG=y
+
+#
 # Compile librte_meter
 #
 CONFIG_RTE_LIBRTE_METER=y
diff --git a/examples/ip_reassembly/main.c b/examples/ip_reassembly/main.c
index bafa8d9..42ade5c 100644
--- a/examples/ip_reassembly/main.c
+++ b/examples/ip_reassembly/main.c
@@ -94,7 +94,7 @@

 #define MAX_PKT_BURST 32

-#include "ipv4_rsmbl.h"
+#include "rte_ipv4_rsmbl.h"

 #ifndef IPv6_BYTES
 #define IPv6_BYTES_FMT "%02x%02x:%02x%02x:%02x%02x:%02x%02x:"\
diff --git a/examples/ipv4_frag/main.c b/examples/ipv4_frag/main.c
index 329f2ce..3c2c960 100644
--- a/examples/ipv4_frag/main.c
+++ b/examples/ipv4_frag/main.c
@@ -71,7 +71,7 @@
 #include 
 #include 

-#include "rte_ipv4_frag.h"
+#include "rte_ip_frag.h"
 #include "main.h"

 #define RTE_LOGTYPE_L3FWD RTE_LOGTYPE_USER1
diff --git a/lib/Makefile b/lib/Makefile
index b92b392..99f60d0 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -55,6 +55,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_METER) += librte_meter
 DIRS-$(CONFIG_RTE_LIBRTE_SCHED) += librte_sched
 DIRS-$(CONFIG_RTE_LIBRTE_ACL) += librte_acl
 DIRS-$(CONFIG_RTE_LIBRTE_KVARGS) += librte_kvargs
+DIRS-$(CONFIG_RTE_LIBRTE_IP_FRAG) += librte_ip_frag

 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_ip_frag/Makefile b/lib/librte_ip_frag/Makefile
new file mode 100644
index 000..3054c1f
--- /dev/null
+++ b/lib/librte_ip_frag/Makefile
@@ -0,0 +1,42 @@
+#   BSD LICENSE
+# 
+#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+#   All rights reserved.
+# 
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+# 
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+# 
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# install this header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_IP_FRAG)-include += rte_ip_frag.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_IP_FRAG)-include += ipv4_frag_tbl.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_IP_FRAG)-include += rte_ipv4_rsmbl.h
+
+# this library depends on rte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_IP_FRAG) += lib/librte_mempool

[dpdk-dev] [PATCH 00/13] * SUBJECT HERE *

2014-05-28 Thread Anatoly Burakov

*** BLURB HERE ***

Anatoly Burakov (13):
  ip_frag: Moving fragmentation/reassembly headers into a separate
library
  Refactored IPv4 fragmentation into a proper library
  Fixing issues reported by checkpatch
  ip_frag: new internal common header
  ip_frag: removed unneeded check and macro
  ip_frag: renaming structures in fragmentation table to be more generic
  ip_frag: refactored reassembly code and made it a proper library
  ip_frag: renamed ipv4 frag function
  ip_frag: added IPv6 fragmentation support
  examples: renamed ipv4_frag example app to ip_fragmentation
  example: overhaul of ip_fragmentation example app
  ip_frag: add support for IPv6 reassembly
  examples: overhaul of ip_reassembly app

 config/common_bsdapp   |7 +
 config/common_linuxapp |7 +
 examples/{ipv4_frag => ip_fragmentation}/Makefile  |2 +-
 examples/{ipv4_frag => ip_fragmentation}/main.c|  536 ++--
 examples/{ipv4_frag => ip_fragmentation}/main.h|0
 examples/ip_reassembly/Makefile|1 -
 examples/ip_reassembly/ipv4_frag_tbl.h |  400 --
 examples/ip_reassembly/ipv4_rsmbl.h|  425 --
 examples/ip_reassembly/main.c  | 1348 +++-
 lib/Makefile   |1 +
 lib/librte_ip_frag/Makefile|   55 +
 lib/librte_ip_frag/ip_frag_common.h|  193 +++
 lib/librte_ip_frag/ip_frag_internal.c  |  421 ++
 lib/librte_ip_frag/rte_ip_frag.h   |  344 +
 lib/librte_ip_frag/rte_ip_frag_common.c|  142 +++
 .../librte_ip_frag/rte_ipv4_fragmentation.c|   91 +-
 lib/librte_ip_frag/rte_ipv4_reassembly.c   |  191 +++
 lib/librte_ip_frag/rte_ipv6_fragmentation.c|  219 
 lib/librte_ip_frag/rte_ipv6_reassembly.c   |  218 
 mk/rte.app.mk  |4 +
 20 files changed, 2668 insertions(+), 1937 deletions(-)
 rename examples/{ipv4_frag => ip_fragmentation}/Makefile (99%)
 rename examples/{ipv4_frag => ip_fragmentation}/main.c (57%)
 rename examples/{ipv4_frag => ip_fragmentation}/main.h (100%)
 delete mode 100644 examples/ip_reassembly/ipv4_frag_tbl.h
 delete mode 100644 examples/ip_reassembly/ipv4_rsmbl.h
 create mode 100644 lib/librte_ip_frag/Makefile
 create mode 100644 lib/librte_ip_frag/ip_frag_common.h
 create mode 100644 lib/librte_ip_frag/ip_frag_internal.c
 create mode 100644 lib/librte_ip_frag/rte_ip_frag.h
 create mode 100644 lib/librte_ip_frag/rte_ip_frag_common.c
 rename examples/ipv4_frag/rte_ipv4_frag.h => lib/librte_ip_frag/rte_ipv4_fragmentation.c (80%)
 create mode 100644 lib/librte_ip_frag/rte_ipv4_reassembly.c
 create mode 100644 lib/librte_ip_frag/rte_ipv6_fragmentation.c
 create mode 100644 lib/librte_ip_frag/rte_ipv6_reassembly.c

-- 
1.8.1.4

[dpdk-dev] [PATCH 00/13] * SUBJECT HERE *

2014-05-28 Thread Burakov, Anatoly

Sorry, for some reason two cover letters were sent.

> Subject: [PATCH 00/13] *** SUBJECT HERE ***
> 
> *** BLURB HERE ***

Best regards,
Anatoly Burakov
DPDK SW Engineer

[dpdk-dev] [PATCH 1/4] Link Bonding Library

2014-05-28 Thread Shaw, Jeffrey B

Hi Declan,
I'm worried about one thing in "bond_ethdev_tx_broadcast()" related to freeing
the broadcast packets.

> +static uint16_t
> +bond_ethdev_tx_broadcast(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
> +{
> + struct bond_dev_private *internals;
> + struct bond_tx_queue *bd_tx_q;
> +
> + uint8_t num_of_slaves;
> + uint8_t slaves[RTE_MAX_ETHPORTS];
> +
> + uint16_t num_tx_total = 0;
> +
> + int i;
> +
> + bd_tx_q = (struct bond_tx_queue *)queue;
> + internals = bd_tx_q->dev_private;
> +
> + /* Copy slave list to protect against slave up/down changes during tx
> +  * bursting */
> + num_of_slaves = internals->active_slave_count;
> + memcpy(slaves, internals->active_slaves,
> + sizeof(internals->active_slaves[0]) * num_of_slaves);
> +
> + if (num_of_slaves < 1)
> + return 0;
> +
> +
> + for (i = 0; i < num_of_slaves; i++) {
> + num_tx_total += rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id,
> + bufs, nb_pkts);
> + }
> +
> + return num_tx_total;
> +}
> +

Transmitting the same buffers on all slaves will cause problems when the PMD
frees the mbufs of previously transmitted packets.
If you broadcast 1 packet on 4 slaves, each of the 4 slaves will (eventually)
free the same mbuf. Without updating the refcnt, you will end up incorrectly
freeing the same mbuf 3 extra times.

Your test application does not catch this case, since the maximum number of
packets tested is 320, which is less than the size of the Tx descriptor rings
(512). Try testing with more packets and you should see some latent
segmentation faults. You should also see this with testpmd if you broadcast
enough packets.
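The double free described above can be modelled without DPDK. In this sketch, `struct fake_mbuf` and `fake_pktmbuf_free()` are hypothetical stand-ins that only track a reference count, and each slave "completes" the broadcast packet by freeing it once. Taking `num_slaves - 1` extra references before transmitting (in DPDK, `rte_mbuf_refcnt_update()` is the usual way to do this, assuming mbuf reference counting is enabled in the build) leaves exactly one real free. A complete fix would also have to drop the extra references for mbufs that a slave did not accept.

```c
/* Hypothetical stand-in for struct rte_mbuf: only the reference count. */
struct fake_mbuf {
	int refcnt;
	int freed;	/* times the buffer actually returned to the pool */
	int bad_frees;	/* frees after the buffer was already returned */
};

/* Models a PMD freeing a transmitted mbuf: drop one reference and
 * return the buffer to the pool only when the last one goes away. */
static void
fake_pktmbuf_free(struct fake_mbuf *m)
{
	if (m->refcnt <= 0) {
		m->bad_frees++;	/* double free: buffer already in the pool */
		return;
	}
	if (--m->refcnt == 0)
		m->freed++;
}

/* Broadcast one mbuf on num_slaves ports; each port frees it once.
 * With bump_refcnt set, take num_slaves - 1 extra references first. */
static void
broadcast_and_complete(struct fake_mbuf *m, int num_slaves, int bump_refcnt)
{
	int i;

	if (bump_refcnt && num_slaves > 1)
		m->refcnt += num_slaves - 1;
	for (i = 0; i < num_slaves; i++)
		fake_pktmbuf_free(m);
}
```

With 4 slaves and no refcnt update, the model records one real free followed by 3 bad frees, matching the "3 extra times" above.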


Thanks,
Jeff

[dpdk-dev] [PATCH 4/4] Add Link Bonding Library to Doxygen

2014-05-28 Thread declan.dohe...@intel.com

From: Declan Doherty 

Signed-off-by: Declan Doherty 
---
 doc/doxy-api-index.md | 1 +
 doc/doxy-api.conf | 1 +
 2 files changed, 2 insertions(+)

diff --git a/doc/doxy-api-index.md b/doc/doxy-api-index.md
index 2825c08..2206c68 100644
--- a/doc/doxy-api-index.md
+++ b/doc/doxy-api-index.md
@@ -36,6 +36,7 @@ API {#index}
 There are many libraries, so their headers may be grouped by topics:

 - **device**:
+  [bond]   (@ref rte_bond.h),
   [ethdev] (@ref rte_ethdev.h),
   [devargs](@ref rte_devargs.h),
   [KNI](@ref rte_kni.h),
diff --git a/doc/doxy-api.conf b/doc/doxy-api.conf
index 642f77a..a9c5b30 100644
--- a/doc/doxy-api.conf
+++ b/doc/doxy-api.conf
@@ -30,6 +30,7 @@

 PROJECT_NAME= DPDK
 INPUT   = doc/doxy-api-index.md \
+  lib/librte_bond \
   lib/librte_eal/common/include \
   lib/librte_ether \
   lib/librte_hash \
-- 
1.8.5.3

[dpdk-dev] [PATCH 3/4] Link bonding integration into testpmd

2014-05-28 Thread declan.dohe...@intel.com

From: Declan Doherty 

Adding link bonding support to testpmd:
- The ability to create new bonded devices.
- Add/remove bonding slave devices.
- Interrogate bonded device stats/configuration.
- Change bonding modes and select balance transmit policies.

Signed-off-by: Declan Doherty 
---
 app/test-pmd/cmdline.c| 550 ++
 app/test-pmd/parameters.c |   4 +-
 app/test-pmd/testpmd.c|  28 ++-
 app/test-pmd/testpmd.h|   2 +
 4 files changed, 582 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 0be28f6..7c7c9f3 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -84,6 +84,9 @@
 #include 
 #include 
 #include 
+#ifdef RTE_LIBRTE_BOND
+#include 
+#endif

 #include "testpmd.h"

@@ -393,6 +396,31 @@ static void cmd_help_long_parsed(void *parsed_result,
"   Show the bypass configuration for a bypass enabled NIC"
" using the lowest port on the NIC.\n\n"
 #endif
+#ifdef RTE_LIBRTE_BOND
+   "create bonded device (mode) (socket)\n"
+   "   Create a new bonded device with specific bonding mode and socket.\n\n"
+
+   "add bonding slave (slave_id) (port_id)\n"
+   "   Add a slave device to a bonded device.\n\n"
+
+   "remove bonding slave (slave_id) (port_id)\n"
+   "   Remove a slave device from a bonded device.\n\n"
+
+   "set bonding mode (value) (port_id)\n"
+   "   Set the bonding mode on a bonded device.\n\n"
+
+   "set bonding primary (slave_id) (port_id)\n"
+   "   Set the primary slave for a bonded device.\n\n"
+
+   "show bonding config (port_id)\n"
+   "   Show the bonding config for port_id.\n\n"
+
+   "set bonding mac_addr (port_id) (address)\n"
+   "   Set the MAC address of a bonded device.\n\n"
+
+   "set bonding xmit_balance_policy (port_id) (l2|l23|l34)\n"
+   "   Set the transmit balance policy for bonded device running in balance mode.\n\n"
+#endif

, list_pkt_forwarding_modes()
);
@@ -2849,6 +2877,518 @@ cmdline_parse_inst_t cmd_show_bypass_config = {
 };
 #endif

+#ifdef RTE_LIBRTE_BOND
+/* *** SET BONDING MODE *** */
+struct cmd_set_bonding_mode_result {
+   cmdline_fixed_string_t set;
+   cmdline_fixed_string_t bonding;
+   cmdline_fixed_string_t mode;
+   uint8_t value;
+   uint8_t port_id;
+};
+
+static void cmd_set_bonding_mode_parsed(void *parsed_result,
+   __attribute__((unused))  struct cmdline *cl,
+   __attribute__((unused)) void *data)
+{
+   struct cmd_set_bonding_mode_result *res = parsed_result;
+   portid_t port_id = res->port_id;
+
+   /* Set the bonding mode for the relevant port. */
+   if (0 != rte_eth_bond_mode_set(port_id, res->value)) {
+   printf("\t Failed to set bonding mode for port = %d.\n", port_id);
+   }
+}
+
+cmdline_parse_token_string_t cmd_setbonding_mode_set =
+TOKEN_STRING_INITIALIZER(struct cmd_set_bonding_mode_result,
+   set, "set");
+cmdline_parse_token_string_t cmd_setbonding_mode_bonding =
+TOKEN_STRING_INITIALIZER(struct cmd_set_bonding_mode_result,
+   bonding, "bonding");
+cmdline_parse_token_string_t cmd_setbonding_mode_mode =
+TOKEN_STRING_INITIALIZER(struct cmd_set_bonding_mode_result,
+   mode, "mode");
+cmdline_parse_token_num_t cmd_setbonding_mode_value =
+TOKEN_NUM_INITIALIZER(struct cmd_set_bonding_mode_result,
+   value, UINT8);
+cmdline_parse_token_num_t cmd_setbonding_mode_port =
+TOKEN_NUM_INITIALIZER(struct cmd_set_bonding_mode_result,
+   port_id, UINT8);
+
+cmdline_parse_inst_t cmd_set_bonding_mode = { .f = cmd_set_bonding_mode_parsed,
+   .help_str = "set bonding mode (mode_value) (port_id): "
+   "Set the bonding mode for port_id", .data = NULL, .tokens = {
+   (void *)&cmd_setbonding_mode_set,
+   (void *)&cmd_setbonding_mode_bonding,
+   (void *)&cmd_setbonding_mode_mode,
+   (void *)&cmd_setbonding_mode_value,
+   (void *)&cmd_setbonding_mode_port,
+   NULL, }, };
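In librte_cmdline, each `TOKEN_*_INITIALIZER` records which field of `struct cmd_set_bonding_mode_result` a token fills, and the parser walks the `.tokens` list in order before calling `.f`. The sketch below imitates that flow for "set bonding mode (value) (port_id)" with plain C string handling. It is not librte_cmdline's implementation: the struct is a simplified mirror with small fixed arrays instead of `cmdline_fixed_string_t`, and `parse_u8()` / `parse_set_bonding_mode()` are hypothetical helpers.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified mirror of struct cmd_set_bonding_mode_result. */
struct set_bonding_mode_result {
	char set[16];
	char bonding[16];
	char mode[16];
	uint8_t value;
	uint8_t port_id;
};

/* Like a UINT8 numeric token: decimal only, range 0..255. */
static int
parse_u8(const char *tok, uint8_t *out)
{
	char *end;
	unsigned long v;

	if (tok == NULL || *tok == '\0')
		return -1;
	v = strtoul(tok, &end, 10);
	if (*end != '\0' || v > UINT8_MAX)
		return -1;
	*out = (uint8_t)v;
	return 0;
}

/* Match "set bonding mode <value> <port_id>" token by token, storing each
 * token in its field. Returns 0 on a match, -1 otherwise. */
static int
parse_set_bonding_mode(const char *line, struct set_bonding_mode_result *res)
{
	static const char *const fixed[3] = { "set", "bonding", "mode" };
	char *fields[3];
	char buf[128];
	char *tok;
	int i;

	fields[0] = res->set;
	fields[1] = res->bonding;
	fields[2] = res->mode;

	if (strlen(line) >= sizeof(buf))
		return -1;
	strcpy(buf, line);

	/* Fixed-string tokens must match exactly (strtok is not reentrant). */
	for (i = 0; i < 3; i++) {
		tok = strtok(i == 0 ? buf : NULL, " \t");
		if (tok == NULL || strcmp(tok, fixed[i]) != 0)
			return -1;
		strcpy(fields[i], tok);	/* fits: tok equals fixed[i] */
	}
	/* Numeric tokens, in the same order as in .tokens. */
	if (parse_u8(strtok(NULL, " \t"), &res->value) != 0 ||
	    parse_u8(strtok(NULL, " \t"), &res->port_id) != 0)
		return -1;
	/* Trailing tokens mean the line does not match this command. */
	return strtok(NULL, " \t") == NULL ? 0 : -1;
}
```

After a successful match, the callback would read `res->value` and `res->port_id` exactly as `cmd_set_bonding_mode_parsed()` does.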
+
+/* *** SET BALANCE XMIT POLICY *** */
+struct cmd_set_bonding_balance_xmit_policy_result {
+   cmdline_fixed_string_t set;
+   cmdline_fixed_string_t bonding;
+   cmdline_fixed_string_t balance_xmit_policy;
+   uint8_t port_id;
+   cmdline_fixed_string_t policy;
+};
+
+static void cmd_set_bonding_balance_xmit_policy_parsed(void

[dpdk-dev] [PATCH 2/4] Link bonding unit tests

2014-05-28 Thread declan.dohe...@intel.com

From: Declan Doherty 

Link bonding unit tests, including code to generate packet bursts for
testing the rx and tx functionality of a bonded device, and a
virtual/stubbed-out ethdev for use as a slave ethdev in testing.


Signed-off-by: Declan Doherty 
---
 app/test/Makefile |3 +
 app/test/commands.c   |3 +
 app/test/packet_burst_generator.c |  276 +++
 app/test/packet_burst_generator.h |   85 +
 app/test/test.h   |1 +
 app/test/test_link_bonding.c  | 4007 +
 app/test/virtual_pmd.c|  580 ++
 app/test/virtual_pmd.h|   74 +
 8 files changed, 5029 insertions(+)
 create mode 100644 app/test/packet_burst_generator.c
 create mode 100644 app/test/packet_burst_generator.h
 create mode 100644 app/test/test_link_bonding.c
 create mode 100644 app/test/virtual_pmd.c
 create mode 100644 app/test/virtual_pmd.h

diff --git a/app/test/Makefile b/app/test/Makefile
index b49785e..ac55a11 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -56,6 +56,7 @@ SRCS-$(CONFIG_RTE_APP_TEST) += test_ring_perf.c
 SRCS-$(CONFIG_RTE_APP_TEST) += test_rwlock.c
 SRCS-$(CONFIG_RTE_APP_TEST) += test_timer.c
 SRCS-$(CONFIG_RTE_APP_TEST) += test_mempool.c
+SRCS-$(CONFIG_RTE_APP_TEST) += test_link_bonding.c
 SRCS-$(CONFIG_RTE_APP_TEST) += test_mempool_perf.c
 SRCS-$(CONFIG_RTE_APP_TEST) += test_mbuf.c
 SRCS-$(CONFIG_RTE_APP_TEST) += test_logs.c
@@ -94,6 +95,8 @@ SRCS-$(CONFIG_RTE_APP_TEST) += test_common.c
 SRCS-$(CONFIG_RTE_APP_TEST) += test_timer_perf.c
 SRCS-$(CONFIG_RTE_APP_TEST) += test_ivshmem.c
 SRCS-$(CONFIG_RTE_APP_TEST) += test_devargs.c
+SRCS-$(CONFIG_RTE_APP_TEST) += virtual_pmd.c
+SRCS-$(CONFIG_RTE_APP_TEST) += packet_burst_generator.c

 ifeq ($(CONFIG_RTE_APP_TEST),y)
 SRCS-$(CONFIG_RTE_LIBRTE_ACL) += test_acl.c
diff --git a/app/test/commands.c b/app/test/commands.c
index efa8566..4d0ec3b 100644
--- a/app/test/commands.c
+++ b/app/test/commands.c
@@ -157,6 +157,8 @@ static void cmd_autotest_parsed(void *parsed_result,
ret = test_timer();
if (!strcmp(res->autotest, "timer_perf_autotest"))
ret = test_timer_perf();
+   if (!strcmp(res->autotest, "link_bonding_autotest"))
+   ret = test_link_bonding();
if (!strcmp(res->autotest, "mempool_autotest"))
ret = test_mempool();
if (!strcmp(res->autotest, "mempool_perf_autotest"))
@@ -221,6 +223,7 @@ cmdline_parse_token_string_t cmd_autotest_autotest =
"alarm_autotest#interrupt_autotest#"
"version_autotest#eal_fs_autotest#"
"cmdline_autotest#func_reentrancy_autotest#"
+   "link_bonding_autotest#"
"mempool_perf_autotest#hash_perf_autotest#"
"memcpy_perf_autotest#ring_perf_autotest#"
"red_autotest#meter_autotest#sched_autotest#"
diff --git a/app/test/packet_burst_generator.c b/app/test/packet_burst_generator.c
new file mode 100644
index 000..8838068
--- /dev/null
+++ b/app/test/packet_burst_generator.c
@@ -0,0 +1,276 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+
+#include "packet_burst_generator.h"
+
+#define UDP_SRC_PORT

[dpdk-dev] [PATCH 1/4] Link Bonding Library

2014-05-28 Thread declan.dohe...@intel.com

From: Declan Doherty 

Link Bonding Library (lib/librte_bond) initial release with support for:
 Mode 0 - Round Robin
 Mode 1 - Active Backup
 Mode 2 - Balance -> Supports 3 transmit policies (layer 2, layer 2+3, layer 3+4)
 Mode 3 - Broadcast
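Balance mode picks an output slave per packet from a hash of header fields, so that all packets of one flow leave on the same slave, while round robin simply rotates through the slaves. The sketch below illustrates both ideas under stated assumptions: the XOR-fold hash, `struct flow_fields` and the function names are hypothetical and are not taken from rte_bond.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened view of the header fields the policies use. */
struct flow_fields {
	uint8_t src_mac[6], dst_mac[6];
	uint32_t src_ip, dst_ip;	/* IPv4, host order */
	uint16_t src_port, dst_port;	/* TCP/UDP, host order */
};

enum xmit_policy { POLICY_L2, POLICY_L23, POLICY_L34 };

/* XOR of source and destination MACs: symmetric in src/dst. */
static uint32_t
mac_hash(const struct flow_fields *f)
{
	uint32_t h = 0;
	size_t i;

	for (i = 0; i < 6; i++)
		h ^= (uint32_t)(f->src_mac[i] ^ f->dst_mac[i]) << (8 * (i % 4));
	return h;
}

/* Mode 2 (balance): choose a slave index for one packet. */
static unsigned
balance_select_slave(const struct flow_fields *f, enum xmit_policy p,
		     unsigned num_slaves)
{
	uint32_t h;

	if (num_slaves == 0)
		return 0;
	switch (p) {
	case POLICY_L2:
		h = mac_hash(f);
		break;
	case POLICY_L23:
		h = mac_hash(f) ^ f->src_ip ^ f->dst_ip;
		break;
	default:	/* POLICY_L34 */
		h = f->src_ip ^ f->dst_ip ^
		    ((uint32_t)f->src_port << 16 | f->dst_port);
		break;
	}
	/* Fold high bits down so the modulo sees all of the hash. */
	h ^= h >> 16;
	h ^= h >> 8;
	return h % num_slaves;
}

/* Mode 0 (round robin): next slave for every packet. */
static unsigned
round_robin_select_slave(unsigned *next, unsigned num_slaves)
{
	unsigned s;

	if (num_slaves == 0)
		return 0;
	s = *next % num_slaves;
	*next = s + 1;
	return s;
}
```

Because the L2 and L2+3 hashes in this sketch are symmetric, both directions of a conversation map to the same slave; the L3+4 variant mixes in ports, spreading different TCP/UDP flows between the same hosts across slaves.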

Signed-off-by: Declan Doherty 
---
 config/common_bsdapp   |5 +
 config/common_linuxapp |5 +
 lib/Makefile   |1 +
 lib/librte_bond/Makefile   |   28 +
 lib/librte_bond/rte_bond.c | 1679 
 lib/librte_bond/rte_bond.h |  228 ++
 mk/rte.app.mk  |5 +
 7 files changed, 1951 insertions(+)
 create mode 100644 lib/librte_bond/Makefile
 create mode 100644 lib/librte_bond/rte_bond.c
 create mode 100644 lib/librte_bond/rte_bond.h

diff --git a/config/common_bsdapp b/config/common_bsdapp
index 2cc7b80..53ed8b9 100644
--- a/config/common_bsdapp
+++ b/config/common_bsdapp
@@ -187,6 +187,11 @@ CONFIG_RTE_PMD_RING_MAX_TX_RINGS=16
 CONFIG_RTE_LIBRTE_PMD_PCAP=y

 #
+# Compile link bonding library
+#
+CONFIG_RTE_LIBRTE_BOND=y
+
+#
 # Do prefetch of packet data within PMD driver receive function
 #
 CONFIG_RTE_PMD_PACKET_PREFETCH=y
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 62619c6..35b525a 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -211,6 +211,11 @@ CONFIG_RTE_PMD_RING_MAX_TX_RINGS=16
 CONFIG_RTE_LIBRTE_PMD_PCAP=n


+#
+# Compile link bonding library
+#
+CONFIG_RTE_LIBRTE_BOND=y
+
 CONFIG_RTE_LIBRTE_PMD_XENVIRT=n

 #
diff --git a/lib/Makefile b/lib/Makefile
index b92b392..9995ba8 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -47,6 +47,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += librte_pmd_pcap
 DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += librte_pmd_virtio
 DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += librte_pmd_vmxnet3
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += librte_pmd_xenvirt
+DIRS-$(CONFIG_RTE_LIBRTE_BOND) += librte_bond
 DIRS-$(CONFIG_RTE_LIBRTE_HASH) += librte_hash
 DIRS-$(CONFIG_RTE_LIBRTE_LPM) += librte_lpm
 DIRS-$(CONFIG_RTE_LIBRTE_NET) += librte_net
diff --git a/lib/librte_bond/Makefile b/lib/librte_bond/Makefile
new file mode 100644
index 000..7514378
--- /dev/null
+++ b/lib/librte_bond/Makefile
@@ -0,0 +1,28 @@
+# 
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_bond.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_BOND) += rte_bond.c
+
+
+#
+# Export include files
+#
+SYMLINK-y-include += rte_bond.h
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_BOND) += lib/librte_mbuf lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_BOND) += lib/librte_malloc
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_bond/rte_bond.c b/lib/librte_bond/rte_bond.c
new file mode 100644
index 000..35dff25
--- /dev/null
+++ b/lib/librte_bond/rte_bond.c
@@ -0,0 +1,1679 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#include "rte_bond.h"
+
+static const char *driver_name = "Link Bonding PMD";
+
+/** Port Queue Mapping Structure */
+struct bond_rx_queue {
+   int queue_id;

[dpdk-dev] [PATCH 0/4] Link Bonding Library

2014-05-28 Thread declan.dohe...@intel.com

From: Declan Doherty 

Initial release of the Link Bonding Library (lib/librte_bond) with support for
the following bonding modes:
 0 - Round Robin
 1 - Active Backup
 2 - Balance l2 / l23 / l34
 3 - Broadcast

Patch split:
 1 - Library + makefile changes
 2 - Unit test suite, including code to generate packet bursts for
testing the rx and tx functionality of a bonded device, and a
virtual/stubbed-out ethdev for use as a slave ethdev in testing
 3 - Link bonding integration into testpmd, including:
 - The ability to create new bonded devices.
 - Add/remove bonding slave devices.
 - Interrogate bonded device stats/configuration.
 - Change bonding modes and select balance transmit policies.
 4 - Add Link Bonding Library to Doxygen


 app/test-pmd/cmdline.c|  550 +
 app/test-pmd/parameters.c |4 +-
 app/test-pmd/testpmd.c|   28 +-
 app/test-pmd/testpmd.h|2 +
 app/test/Makefile |3 +
 app/test/commands.c   |3 +
 app/test/packet_burst_generator.c |  276 +++
 app/test/packet_burst_generator.h |   85 +
 app/test/test.h   |1 +
 app/test/test_link_bonding.c  | 4007 +
 app/test/virtual_pmd.c|  580 ++
 app/test/virtual_pmd.h|   74 +
 config/common_bsdapp  |5 +
 config/common_linuxapp|5 +
 doc/doxy-api-index.md |1 +
 doc/doxy-api.conf |1 +
 lib/Makefile  |1 +
 lib/librte_bond/Makefile  |   28 +
 lib/librte_bond/rte_bond.c| 1679 
 lib/librte_bond/rte_bond.h|  228 +++
 mk/rte.app.mk |5 +
 21 files changed, 7564 insertions(+), 2 deletions(-)
 create mode 100644 app/test/packet_burst_generator.c
 create mode 100644 app/test/packet_burst_generator.h
 create mode 100644 app/test/test_link_bonding.c
 create mode 100644 app/test/virtual_pmd.c
 create mode 100644 app/test/virtual_pmd.h
 create mode 100644 lib/librte_bond/Makefile
 create mode 100644 lib/librte_bond/rte_bond.c
 create mode 100644 lib/librte_bond/rte_bond.h

-- 
1.8.5.3

[dpdk-dev] Intel I350 fails to work with DPDK

2014-05-28 Thread sabu kurian

Hi Bruce,

Thanks for the reply.

I already tried that. A burst size of 64 or 128 simply fails: the card
sends out a few packets (about 400 packets of 74 bytes) and then freezes.
For my application, I'm trying to generate the peak traffic possible for
the link speed and the NIC.



On Wed, May 28, 2014 at 4:16 PM, Richardson, Bruce <bruce.richardson at intel.com> wrote:

> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of sabu kurian
> > Sent: Wednesday, May 28, 2014 10:42 AM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] Intel I350 fails to work with DPDK
> >
> > I have asked a similar question before, but no one replied.
> >
> > I'm crafting my own packets in mbufs (all 74-byte packets) and sending
> > them using
> >
> > ret = rte_eth_tx_burst(port_ids[lcore_id], 0, m_pool, burst_size);
> >
> > When burst_size is 1, it does work, in the sense that the NIC keeps
> > sending packets, but at only a little over 50 percent of the link rate:
> > on a 1000 Mbps link, the observed transmit rate of the NIC is 580 Mbps
> > (using Intel DPDK). It should be possible to achieve at least 900 Mbps
> > transmit rate with Intel DPDK and an I350 on a 1 Gbps link.
> >
> > Could someone help me out on this ?
> >
> > Thanks and regards
>
> Sending out a single packet at a time is going to have a very high
> overhead, as each call to tx_burst involves making PCI transactions (MMIO
> writes to the hardware ring pointer). To reduce this penalty you should
> look to send out the packets in bursts, thereby saving PCI bandwidth and
> splitting the cost of each MMIO write over multiple packets.
>
> Regards,
> /Bruce
>
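To make the burst advice concrete: `rte_eth_tx_burst()` returns how many packets the PMD actually queued, which can be fewer than requested when the TX ring is full, and the caller still owns the packets that were not sent. The sketch below models that contract with a hypothetical `fake_tx_burst()` that performs one "doorbell" (standing in for the MMIO tail-pointer write) per non-empty call, plus a retry helper. None of these names are DPDK API, and the descriptor-reclaim model is an assumption.

```c
#include <stdint.h>

/* Hypothetical stand-in for an mbuf. */
struct pkt {
	int id;
};

/* Hypothetical TX queue: a ring with a fixed number of descriptors. */
struct fake_txq {
	uint16_t ring_size;
	uint16_t free_desc;
	uint16_t reclaim_per_call;	/* descriptors the NIC finished between calls */
	uint32_t sent;
	uint32_t doorbells;		/* one MMIO tail write per non-empty burst */
};

/* Models the rte_eth_tx_burst() contract: returns how many packets were
 * queued, possibly fewer than n; the caller keeps the rest. */
static uint16_t
fake_tx_burst(struct fake_txq *q, struct pkt **pkts, uint16_t n)
{
	unsigned avail = (unsigned)q->free_desc + q->reclaim_per_call;
	uint16_t k;

	(void)pkts;	/* a real PMD would fill descriptors from pkts[] */
	q->free_desc = (uint16_t)(avail > q->ring_size ? q->ring_size : avail);
	k = n < q->free_desc ? n : q->free_desc;
	if (k > 0) {
		q->free_desc = (uint16_t)(q->free_desc - k);
		q->sent += k;
		q->doorbells++;
	}
	return k;
}

/* Send n packets, retrying the unsent tail up to max_retries times.
 * Returns the number sent; the caller must free pkts[ret..n-1]. */
static uint16_t
send_burst_with_retry(struct fake_txq *q, struct pkt **pkts, uint16_t n,
		      unsigned max_retries)
{
	uint16_t done = fake_tx_burst(q, pkts, n);
	unsigned tries = 0;

	while (done < n && tries++ < max_retries)
		done = (uint16_t)(done +
				  fake_tx_burst(q, pkts + done, (uint16_t)(n - done)));
	return done;
}
```

In this model, sending 32 packets one at a time costs 32 doorbells versus one for a single 32-packet burst. With real mbufs, any packets still unsent after the retries should be freed with `rte_pktmbuf_free()` or kept for a later attempt, not silently reused.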

[dpdk-dev] [PATCH v3 0/3] Support zero copy RX/TX in user space vhost

2014-05-28 Thread Thomas Monjalon

2014-05-28 16:06, Ouyang Changchun:
> This patch v3 fixes some errors and warnings reported by checkpatch.pl;
> please ignore the previous 2 patches (v1 and v2) and only apply this v3
> patch for zero copy RX/TX in user space vhost.
> 
> This patch series supports user space vhost zero copy. It removes packet
> copying between host and guest in RX/TX, and it introduces an extra ring to
> store the detached mbufs. At the initialization stage, all mbufs are put into
> this ring; when a guest starts, vhost gets the available buffer addresses
> allocated by the guest for RX, translates them into host space addresses,
> then attaches them to mbufs and puts the attached mbufs into the mempool.
> 
> Queue starting and DMA refilling get mbufs from the mempool and use them to
> set the DMA addresses.
> 
> For TX, it gets the buffer addresses of available packets to be transmitted
> from the guest and translates them to host space addresses, then attaches them
> to mbufs and puts them into the TX queues. After TX finishes, it pulls mbufs
> out of the mempool, detaches them and puts them back into the extra ring.
> 
> This patch series also implements queue start and stop functionality in the
> IXGBE PMD, and enables hardware loopback for VMDQ mode in the IXGBE PMD.
> 
> Ouyang Changchun (3):
>   Add API to support queue start and stop functionality for RX/TX.
>   Implement queue start and stop functionality in IXGBE PMD; Enable
> hardware loopback for VMDQ mode in IXGBE PMD.
>   Support user space vhost zero copy, it removes packets copying between
> host and guest in RX/TX.

Acked-by: Thomas Monjalon 

Applied for version 1.7.0.

Thanks
-- 
Thomas

[dpdk-dev] [PATCH v3 3/3] examples/vhost: Support user space vhost zero copy

2014-05-28 Thread Ouyang Changchun

Please ignore previous patch v1 and v2, only need this patch v3 for us vhost 
zero copy.

This patch supports user space vhost zero copy. It removes packet copying
between host and guest in RX/TX.
It introduces an extra ring to store the detached mbufs. At the initialization
stage all mbufs are put into this ring; when a guest starts, vhost gets the
available buffer addresses allocated by the guest for RX, translates them into
host space addresses, then attaches them to mbufs and puts the attached mbufs
into the mempool.
Queue starting and DMA refilling take mbufs from the mempool and use them to
set the DMA addresses.

For TX, vhost gets the buffer addresses of the available packets to be
transmitted from the guest, translates them to host space addresses, then
attaches them to mbufs and puts them into the TX queues.
After TX finishes, it pulls the mbufs out of the mempool, detaches them and
puts them back into the extra ring.
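
The RX path above hinges on translating guest buffer addresses into host
addresses. A minimal sketch of that lookup, assuming the guest memory map is a
simple table of regions with one fixed host offset each, is shown below. It is
not the code in examples/vhost/virtio-net.c: struct guest_region, gpa_to_hva()
and the xlate_result codes are hypothetical, only loosely modelled on the
hpa_type enum this patch adds (whose cross-subregion case concerns host
physical contiguity, which this sketch does not model).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one guest memory region mapped into the host. */
struct guest_region {
	uint64_t gpa_start;   /* first guest physical address of the region */
	uint64_t size;        /* region length in bytes */
	uint64_t host_offset; /* host address = gpa + host_offset */
};

/* Result codes, loosely modelled on the hpa_type enum in the patch. */
enum xlate_result {
	XLATE_CONTINUOUS,   /* buffer lies entirely inside one region */
	XLATE_CROSS_REGION, /* buffer starts in a region but runs past its end */
	XLATE_INVALID       /* start address is not in any region */
};

/*
 * Translate the guest buffer [gpa, gpa + len) into a host address.
 * *host_addr is set whenever the start address is found; a caller must
 * not treat an XLATE_CROSS_REGION buffer as one contiguous host buffer.
 */
static enum xlate_result
gpa_to_hva(const struct guest_region *regions, size_t nr_regions,
	   uint64_t gpa, uint32_t len, uint64_t *host_addr)
{
	size_t i;

	for (i = 0; i < nr_regions; i++) {
		const struct guest_region *r = &regions[i];

		if (gpa < r->gpa_start || gpa >= r->gpa_start + r->size)
			continue;
		*host_addr = gpa + r->host_offset;
		if (gpa + len <= r->gpa_start + r->size)
			return XLATE_CONTINUOUS;
		return XLATE_CROSS_REGION;
	}
	return XLATE_INVALID;
}
```

A buffer reported as crossing a region boundary cannot be handed to the NIC
as a single DMA address, which is presumably why the patch distinguishes that
case at all.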

Signed-off-by: Ouyang Changchun 
Tested-by: Waterman Cao 
 This patch passed L2 Forward and L3 Forward testing based on commit:
 57f0ba5f8b8588dfa6ffcd001447ef6337afa6cd.
 Test environment:
 Fedora 19, Linux Kernel 3.9.0, GCC 4.8.2 x86_64, Intel Xeon processor E5-2600
 and E5-2600 v2 family
---
 examples/vhost/main.c   | 1476 +--
 examples/vhost/virtio-net.c |  186 +-
 examples/vhost/virtio-net.h |   23 +-
 3 files changed, 1623 insertions(+), 62 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index b86d57d..e91 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -48,6 +48,7 @@
 #include 
 #include 
 #include 
+#include 

 #include "main.h"
 #include "virtio-net.h"
@@ -70,6 +71,16 @@
 #define MBUF_SIZE (2048 + sizeof(struct rte_mbuf) + RTE_PKTMBUF_HEADROOM)

 /*
+ * No frame data buffers need to be allocated by the host for the zero copy
+ * implementation: the guest allocates the frame data buffers and vhost
+ * uses them directly.
+ */
+#define VIRTIO_DESCRIPTOR_LEN_ZCP 1518
+#define MBUF_SIZE_ZCP (VIRTIO_DESCRIPTOR_LEN_ZCP + sizeof(struct rte_mbuf) \
+   + RTE_PKTMBUF_HEADROOM)
+#define MBUF_CACHE_SIZE_ZCP 0
+
+/*
  * RX and TX Prefetch, Host, and Write-back threshold values should be
  * carefully set for optimal performance. Consult the network
  * controller's datasheet and supporting DPDK documentation for guidance
@@ -108,6 +119,25 @@
 #define RTE_TEST_RX_DESC_DEFAULT 1024 
 #define RTE_TEST_TX_DESC_DEFAULT 512

+/*
+ * These 2 macros need refining for legacy and DPDK-based front ends:
+ * max vring avail descriptors/entries from guest - MAX_PKT_BURST,
+ * then rounded to a power of 2.
+ */
+/*
+ * For legacy front end, 128 descriptors,
+ * half for virtio header, another half for mbuf.
+ */
+#define RTE_TEST_RX_DESC_DEFAULT_ZCP 32   /* legacy: 32, DPDK virt FE: 128. */
+#define RTE_TEST_TX_DESC_DEFAULT_ZCP 64   /* legacy: 64, DPDK virt FE: 64.  */
+
+/* Get first 4 bytes in mbuf headroom. */
+#define MBUF_HEADROOM_UINT32(mbuf) (*(uint32_t *)((uint8_t *)(mbuf) \
+   + sizeof(struct rte_mbuf)))
+
+/* true if x is a power of 2 */
+#define POWEROF2(x) ((((x)-1) & (x)) == 0)
+
 #define INVALID_PORT_ID 0xFF

 /* Max number of devices. Limited by vmdq. */
@@ -138,8 +168,42 @@ static uint32_t num_switching_cores = 0;
 static uint32_t num_queues = 0;
 uint32_t num_devices = 0;

+/*
+ * Enable zero copy: packet buffers are DMAed directly to/from the HW
+ * descriptors. Disabled by default.
+ */
+static uint32_t zero_copy;
+
+/* Number of descriptors to apply. */
+static uint32_t num_rx_descriptor = RTE_TEST_RX_DESC_DEFAULT_ZCP;
+static uint32_t num_tx_descriptor = RTE_TEST_TX_DESC_DEFAULT_ZCP;
+
+/* max ring descriptor, ixgbe, i40e, e1000 all are 4096. */
+#define MAX_RING_DESC 4096
+
+struct vpool {
+   struct rte_mempool *pool;
+   struct rte_ring *ring;
+   uint32_t buf_size;
+} vpool_array[MAX_QUEUES+MAX_QUEUES];
+
 /* Enable VM2VM communications. If this is disabled then the MAC address compare is skipped. */
-static uint32_t enable_vm2vm = 1;
+typedef enum {
+   VM2VM_DISABLED = 0,
+   VM2VM_SOFTWARE = 1,
+   VM2VM_HARDWARE = 2,
+   VM2VM_LAST
+} vm2vm_type;
+static vm2vm_type vm2vm_mode = VM2VM_SOFTWARE;
+
+/* The type of host physical address translated from guest physical address. */
+typedef enum {
+   PHYS_ADDR_CONTINUOUS = 0,
+   PHYS_ADDR_CROSS_SUBREG = 1,
+   PHYS_ADDR_INVALID = 2,
+   PHYS_ADDR_LAST
+} hpa_type;
+
 /* Enable stats. */
 static uint32_t enable_stats = 0;
 /* Enable retries on RX. */
@@ -159,7 +223,7 @@ static uint32_t dev_index = 0;
 extern uint64_t VHOST_FEATURES;

 /* Default configuration for rx and tx thresholds etc. */
-static const struct rte_eth_rxconf rx_conf_default = {
+static struct rte_eth_rxconf rx_conf_default = {
.rx_thresh = {
.pthresh = RX_PTHRESH,
.hthresh = RX_HTHRESH,
@@ -173,7 +237,7 @@ static const struct rte_eth_rxconf rx_conf_default = {
  * Controller and the

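Two of the helpers this patch adds are easy to check in isolation. The
intended power-of-two test is (((x) - 1) & (x)) == 0, which works because
subtracting one clears the lowest set bit and sets every bit below it; note
that it also reports 0 as a power of two. MBUF_HEADROOM_UINT32() reads the
first 4 bytes after struct rte_mbuf. The sketch below uses a hypothetical
struct fake_mbuf in place of struct rte_mbuf, and uses memcpy() instead of the
patch's direct uint32_t * cast, a conservative choice that sidesteps alignment
and aliasing questions.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* True if x is a power of two (note: also true for x == 0). */
#define POWEROF2(x) ((((x) - 1) & (x)) == 0)

/* Hypothetical stand-in for struct rte_mbuf; only its size matters here. */
struct fake_mbuf {
	void *buf_addr;
	uint64_t buf_physaddr;
	uint16_t data_len;
};

/* Allocate an mbuf header followed by `headroom` zeroed bytes. */
static struct fake_mbuf *
fake_mbuf_alloc(size_t headroom)
{
	return calloc(1, sizeof(struct fake_mbuf) + headroom);
}

/* Store a 32-bit tag in the first 4 bytes that follow the mbuf header. */
static void
headroom_set_u32(struct fake_mbuf *m, uint32_t v)
{
	memcpy((uint8_t *)m + sizeof(*m), &v, sizeof(v));
}

/* Read the tag back from the same location. */
static uint32_t
headroom_get_u32(const struct fake_mbuf *m)
{
	uint32_t v;

	memcpy(&v, (const uint8_t *)m + sizeof(*m), sizeof(v));
	return v;
}
```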
[dpdk-dev] [PATCH v3 2/3] ixgbe: Implement queue start and stop functionality in IXGBE PMD

2014-05-28 Thread Ouyang Changchun

Please ignore the previous v1 and v2 patches; only this v3 patch is needed for
the queue start and stop functionality.

This patch implements queue start and stop functionality in the IXGBE PMD;
it also enables hardware loopback for VMDQ mode in the IXGBE PMD.
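
The four new eth_dev_ops entries follow the usual ethdev pattern: a generic
layer validates the request and then calls the PMD's callback, if the PMD
provides one. The sketch below illustrates that dispatch with hypothetical,
heavily reduced types (struct mini_dev, mini_rx_queue_start() and so on); it
is not the rte_ethdev code added in patch 1/3, and the -EINVAL/-ENOTSUP
return values are assumptions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct mini_dev;

/* Hypothetical, reduced driver callback table. */
struct mini_dev_ops {
	int (*rx_queue_start)(struct mini_dev *dev, uint16_t qid);
	int (*rx_queue_stop)(struct mini_dev *dev, uint16_t qid);
};

#define MINI_MAX_QUEUES 8

struct mini_dev {
	const struct mini_dev_ops *ops;
	uint16_t nb_rx_queues;            /* must be <= MINI_MAX_QUEUES */
	uint8_t rx_started[MINI_MAX_QUEUES];
};

/* Generic layer: validate the queue id, then call into the driver. */
static int
mini_rx_queue_start(struct mini_dev *dev, uint16_t qid)
{
	if (qid >= dev->nb_rx_queues)
		return -EINVAL;
	if (dev->ops->rx_queue_start == NULL)
		return -ENOTSUP;
	return dev->ops->rx_queue_start(dev, qid);
}

static int
mini_rx_queue_stop(struct mini_dev *dev, uint16_t qid)
{
	if (qid >= dev->nb_rx_queues)
		return -EINVAL;
	if (dev->ops->rx_queue_stop == NULL)
		return -ENOTSUP;
	return dev->ops->rx_queue_stop(dev, qid);
}

/* Driver layer: a real PMD would program the queue-enable register here. */
static int
drv_rx_queue_start(struct mini_dev *dev, uint16_t qid)
{
	dev->rx_started[qid] = 1;
	return 0;
}

static const struct mini_dev_ops drv_ops = {
	.rx_queue_start = drv_rx_queue_start,
	.rx_queue_stop = NULL, /* left unimplemented to show -ENOTSUP */
};
```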

Signed-off-by: Ouyang Changchun 
Tested-by: Waterman Cao 
 This patch passed L2 Forward and L3 Forward testing based on commit:
 57f0ba5f8b8588dfa6ffcd001447ef6337afa6cd.
 Test environment:
 Fedora 19, Linux Kernel 3.9.0, GCC 4.8.2 x86_64, Intel Xeon processor E5-2600
 and E5-2600 v2 family
---
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |   4 +
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h |   8 ++
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 239 ++--
 lib/librte_pmd_ixgbe/ixgbe_rxtx.h   |   6 +
 4 files changed, 220 insertions(+), 37 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index c9b5fe4..3dcff78 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -260,6 +260,10 @@ static struct eth_dev_ops ixgbe_eth_dev_ops = {
.vlan_tpid_set= ixgbe_vlan_tpid_set,
.vlan_offload_set = ixgbe_vlan_offload_set,
.vlan_strip_queue_set = ixgbe_vlan_strip_queue_set,
+   .rx_queue_start   = ixgbe_dev_rx_queue_start,
+   .rx_queue_stop= ixgbe_dev_rx_queue_stop,
+   .tx_queue_start   = ixgbe_dev_tx_queue_start,
+   .tx_queue_stop= ixgbe_dev_tx_queue_stop,
.rx_queue_setup   = ixgbe_dev_rx_queue_setup,
.rx_queue_release = ixgbe_dev_rx_queue_release,
.rx_queue_count   = ixgbe_dev_rx_queue_count,
diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
index 9d7e93f..1471942 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
@@ -212,6 +212,14 @@ void ixgbe_dev_tx_init(struct rte_eth_dev *dev);

 void ixgbe_dev_rxtx_start(struct rte_eth_dev *dev);

+int ixgbe_dev_rx_queue_start(struct rte_eth_dev *dev, uint16_t rx_queue_id);
+
+int ixgbe_dev_rx_queue_stop(struct rte_eth_dev *dev, uint16_t rx_queue_id);
+
+int ixgbe_dev_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id);
+
+int ixgbe_dev_tx_queue_stop(struct rte_eth_dev *dev, uint16_t tx_queue_id);
+
 int ixgbevf_dev_rx_init(struct rte_eth_dev *dev);

 void ixgbevf_dev_tx_init(struct rte_eth_dev *dev);
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index 37d02aa..54ca010 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -1588,7 +1588,7 @@ ixgbe_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
  * descriptors should meet the following condition:
  *  (num_ring_desc * sizeof(rx/tx descriptor)) % 128 == 0
  */
-#define IXGBE_MIN_RING_DESC 64
+#define IXGBE_MIN_RING_DESC 32
 #define IXGBE_MAX_RING_DESC 4096

 /*
@@ -1836,6 +1836,7 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
txq->port_id = dev->data->port_id;
txq->txq_flags = tx_conf->txq_flags;
txq->ops = _txq_ops;
+   txq->start_tx_per_q = tx_conf->start_tx_per_q;

/*
 * Modification to set VFTDT for virtual function if vf is detected
@@ -2078,6 +2079,7 @@ ixgbe_dev_rx_queue_setup(struct rte_eth_dev *dev,
rxq->crc_len = (uint8_t) ((dev->data->dev_conf.rxmode.hw_strip_crc) ?
0 : ETHER_CRC_LEN);
rxq->drop_en = rx_conf->rx_drop_en;
+   rxq->start_rx_per_q = rx_conf->start_rx_per_q;

/*
 * Allocate RX ring hardware descriptors. A memzone large enough to
@@ -3025,6 +3027,13 @@ ixgbe_vmdq_rx_hw_configure(struct rte_eth_dev *dev)

}

+   /* PFDMA Tx General Switch Control Enables VMDQ loopback */
+   if (cfg->enable_loop_back) {
+   IXGBE_WRITE_REG(hw, IXGBE_PFDTXGSWC, IXGBE_PFDTXGSWC_VT_LBEN);
+   for (i = 0; i < RTE_IXGBE_VMTXSW_REGISTER_COUNT; i++)
+   IXGBE_WRITE_REG(hw, IXGBE_VMTXSW(i), UINT32_MAX);
+   }
+
IXGBE_WRITE_FLUSH(hw);
 }

@@ -3234,7 +3243,6 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
uint32_t rxcsum;
uint16_t buf_size;
uint16_t i;
-   int ret;

PMD_INIT_FUNC_TRACE();
hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
@@ -3289,11 +3297,6 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
for (i = 0; i < dev->data->nb_rx_queues; i++) {
rxq = dev->data->rx_queues[i];

-   /* Allocate buffers for descriptor rings */
-   ret = ixgbe_alloc_rx_queue_mbufs(rxq);
-   if (ret)
-   return ret;
-
/*
 * Reset crc_len in case it was changed after queue setup by a
 * call to configure.
@@ -3500,10 +3503,8 @@ ixgbe_dev_rxtx_start(struct rte_eth_dev *dev)
struct igb_rx_queue *rxq;

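The ring-size rule quoted in the ixgbe_rxtx.c comment,
(num_ring_desc * sizeof(rx/tx descriptor)) % 128 == 0, together with the
lowered minimum of 32, can be expressed as a small check. This sketch assumes
16-byte descriptors (the size of the ixgbe advanced RX/TX descriptors), so a
valid count is a multiple of 8 between 32 and 4096; ring_size_ok() is a
hypothetical name. Lowering the minimum to 32 is presumably what lets the
vhost example use 32 RX descriptors (RTE_TEST_RX_DESC_DEFAULT_ZCP).

```c
#include <stdbool.h>
#include <stdint.h>

#define DESC_SIZE     16   /* assumed size of one ixgbe advanced descriptor */
#define RING_ALIGN    128  /* ring length must be a multiple of 128 bytes */
#define MIN_RING_DESC 32   /* IXGBE_MIN_RING_DESC after this patch */
#define MAX_RING_DESC 4096 /* IXGBE_MAX_RING_DESC */

/* Return true if nb_desc is acceptable for one RX or TX ring. */
static bool
ring_size_ok(uint32_t nb_desc)
{
	if (nb_desc < MIN_RING_DESC || nb_desc > MAX_RING_DESC)
		return false;
	return (nb_desc * DESC_SIZE) % RING_ALIGN == 0;
}
```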
[dpdk-dev] [PATCH v3 0/3] Support zero copy RX/TX in user space vhost

2014-05-28 Thread Ouyang Changchun

This patch v3 fixes some errors and warnings reported by checkpatch.pl;
please ignore the previous 2 patches (v1 and v2) and apply only this v3 patch
for zero copy RX/TX in user space vhost.

This patch series supports user space vhost zero copy. It removes packet
copying between host and guest in RX/TX. It also introduces an extra ring to
store the detached mbufs. At the initialization stage all mbufs are put into
this ring; when a guest starts, vhost gets the available buffer addresses
allocated by the guest for RX, translates them into host space addresses,
then attaches them to mbufs and puts the attached mbufs into the mempool.

Queue starting and DMA refilling take mbufs from the mempool and use them to
set the DMA addresses.

For TX, vhost gets the buffer addresses of the available packets to be
transmitted from the guest, translates them to host space addresses, then
attaches them to mbufs and puts them into the TX queues.
After TX finishes, it pulls the mbufs out of the mempool, detaches them and
puts them back into the extra ring.

This patch series also implements queue start and stop functionality in the
IXGBE PMD, and enables hardware loopback for VMDQ mode in the IXGBE PMD.

Ouyang Changchun (3):
  Add API to support queue start and stop functionality for RX/TX.
  Implement queue start and stop functionality in IXGBE PMD; Enable
hardware loopback for VMDQ mode in IXGBE PMD.
  Support user space vhost zero copy, it removes packets copying between
host and guest in RX/TX.

 examples/vhost/main.c| 1476 --
 examples/vhost/virtio-net.c  |  186 +++-
 examples/vhost/virtio-net.h  |   23 +-
 lib/librte_eal/linuxapp/eal/eal_memory.c |2 +-
 lib/librte_ether/rte_ethdev.c|  104 +++
 lib/librte_ether/rte_ethdev.h|   80 ++
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c  |4 +
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h  |8 +
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c|  239 -
 lib/librte_pmd_ixgbe/ixgbe_rxtx.h|6 +
 10 files changed, 2028 insertions(+), 100 deletions(-)

-- 
1.9.0

[dpdk-dev] [PATCH] fix for eth_pcap_tx() can cause mbuf corruption

2014-05-28 Thread Konstantin Ananyev

If pcap_sendpacket() fails, then eth_pcap_tx shouldn't silently free that
mbuf and continue.

Signed-off-by: Konstantin Ananyev 
---
 lib/librte_pmd_pcap/rte_eth_pcap.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/librte_pmd_pcap/rte_eth_pcap.c b/lib/librte_pmd_pcap/rte_eth_pcap.c
index dc4670c..6f026ab 100644
--- a/lib/librte_pmd_pcap/rte_eth_pcap.c
+++ b/lib/librte_pmd_pcap/rte_eth_pcap.c
@@ -239,8 +239,9 @@ eth_pcap_tx(void *queue,
mbuf = bufs[i];
ret = pcap_sendpacket(tx_queue->pcap, (u_char*) mbuf->pkt.data,
mbuf->pkt.data_len);
-   if(likely(!ret))
-   num_tx++;
+   if (unlikely(ret != 0))
+   break;
+   num_tx++;
rte_pktmbuf_free(mbuf);
}

-- 
1.7.7.6
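
The fix restores the usual burst-transmit contract: the return value is the
number of packets the driver consumed, and the packets from that index onward
still belong to the caller. Before the fix, a packet whose send failed was
freed but not counted, so a caller that freed or retried the unsent tail would
touch an mbuf that had already been released, which is presumably the
corruption the subject line refers to. A self-contained sketch of the
corrected loop, with a hypothetical struct pkt and send callback standing in
for mbufs and pcap_sendpacket():

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical packet buffer standing in for an mbuf. */
struct pkt {
	int id;
};

/* Send hook standing in for pcap_sendpacket(): 0 on success. */
typedef int (*send_fn)(const struct pkt *p, void *ctx);

/*
 * Burst-transmit semantics after the fix: stop at the first failed send,
 * free only the packets that were actually sent, and return their count.
 * Packets bufs[ret] .. bufs[nb_pkts - 1] still belong to the caller.
 */
static uint16_t
tx_burst(struct pkt **bufs, uint16_t nb_pkts, send_fn send, void *ctx)
{
	uint16_t i;

	for (i = 0; i < nb_pkts; i++) {
		if (send(bufs[i], ctx) != 0)
			break;
		free(bufs[i]);
	}
	return i;
}

/* Demo sender: succeeds only for packets whose id is below *(int *)ctx. */
static int
demo_send(const struct pkt *p, void *ctx)
{
	return p->id < *(const int *)ctx ? 0 : -1;
}
```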

[dpdk-dev] [PATCH v2 13/16] Removed PCI ID table from igb_uio

2014-05-28 Thread Thomas Monjalon

2014-05-23 00:10, Antti Kantee:
> On 22/05/14 13:13, Thomas Monjalon wrote:
> > 2014-05-19 16:51, Anatoly Burakov:
> >> Note that since igb_uio no longer has a PCI ID list, it can now be
> >> bound to any device, not just those explicitly supported by DPDK. In
> >> other words, it now behaves similar to PCI stub, VFIO and other generic
> >> PCI drivers.
> > 
> > I wonder if we could replace igb_uio by uio_pci_generic?
> 
> I've been running plenty of the NetBSD kernel PCI drivers in Linux
> userspace on top of uio_pci_generic, including NICs supported by DPDK.
> The only real annoyance is that mainline uio_pci_generic doesn't support
> MSI.  A pseudo-annoyance is that uio_pci_generic turns interrupts off
> from the PCI config space each time after you read an interrupt, so they
> have to be reenabled after each one (and NetBSD kernel drivers tend to
> like using interrupts for everything).
> 
> The annoyance of vfio is iommus.  Yes, I want to make the tradeoff of
> possibly scribbling memory vs. not being able to do anything on the
> wrong system.
> 
> I'd like to see a generic Linux kernel PCI driver blob without
> annoyances, though not yet annoyed enough to do anything myself ;)

So maybe it's possible to improve uio_pci_generic in order to replace igb_uio.
If someone wants to work on it, it's possible to stage uio_pci_generic in 
dpdk.org in order to make it ready for kernel.org.

-- 
Thomas
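
For reference, the behaviour Antti describes (uio_pci_generic turning
interrupts off after each one) means the user space driver has to clear the
INTx Disable bit, bit 10 of the PCI command register and therefore bit 2 of
the byte at config-space offset 5, before waiting for the next interrupt, as
in the uio_pci_generic example in the kernel's UIO HOWTO. A minimal sketch
with hypothetical helper names; configfd would be an open descriptor for the
device's sysfs config file:

```c
#include <stdint.h>
#include <unistd.h>

/* Offset of the high byte of the PCI command register in config space. */
#define PCI_COMMAND_HIGH_OFFSET 5
/* Bit 10 of the command register (INTx Disable) is bit 2 of that byte. */
#define PCI_COMMAND_HIGH_INTX_DISABLE 0x04

/* Pure helper: return the command high byte with INTx Disable cleared. */
static uint8_t
clear_intx_disable(uint8_t cmd_high)
{
	return (uint8_t)(cmd_high & ~PCI_COMMAND_HIGH_INTX_DISABLE);
}

/*
 * Re-enable INTx delivery by rewriting the command high byte.
 * Returns 0 on success, -1 on any I/O error.
 */
static int
intx_unmask(int configfd)
{
	uint8_t hi;

	if (pread(configfd, &hi, 1, PCI_COMMAND_HIGH_OFFSET) != 1)
		return -1;
	hi = clear_intx_disable(hi);
	if (pwrite(configfd, &hi, 1, PCI_COMMAND_HIGH_OFFSET) != 1)
		return -1;
	return 0;
}
```

In an interrupt loop, a driver would call intx_unmask() and then block in
read() on /dev/uioX until the interrupt count changes.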

[dpdk-dev] [PATCH v3 20/20] setup script: adding support for VFIO to setup.sh

2014-05-28 Thread Anatoly Burakov

Add support for loading/unloading VFIO drivers, binding/unbinding devices
to/from VFIO, and setting up the correct userspace permissions.
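
The set_vfio_permissions function in this patch warns when `ulimit -l` is
below 65536; since `ulimit -l` reports KiB, that threshold is 64 MB. The same
check can be made from inside a program with getrlimit(RLIMIT_MEMLOCK, ...),
which reports bytes. A minimal sketch; memlock_too_low() and
current_memlock_too_low() are hypothetical helpers:

```c
#include <sys/resource.h>

/* Same threshold the script warns about: 64 MB, expressed in bytes. */
#define MEMLOCK_WARN_BYTES (64ULL * 1024 * 1024)

/*
 * Return 1 if the given RLIMIT_MEMLOCK soft limit is below 64 MB,
 * 0 otherwise (including "unlimited").
 */
static int
memlock_too_low(rlim_t soft_limit)
{
	if (soft_limit == RLIM_INFINITY)
		return 0;
	return (unsigned long long)soft_limit < MEMLOCK_WARN_BYTES;
}

/* Check the current process limit; returns -1 if getrlimit() fails. */
static int
current_memlock_too_low(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return -1;
	return memlock_too_low(rl.rlim_cur);
}
```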

Signed-off-by: Anatoly Burakov 
---
 tools/setup.sh | 156 +++--
 1 file changed, 141 insertions(+), 15 deletions(-)

diff --git a/tools/setup.sh b/tools/setup.sh
index e0671b8..3991da9 100755
--- a/tools/setup.sh
+++ b/tools/setup.sh
@@ -187,6 +187,54 @@ load_igb_uio_module()
 }

 #
+# Unloads VFIO modules.
+#
+remove_vfio_module()
+{
+   echo "Unloading any existing VFIO module"
+   /sbin/lsmod | grep -s vfio > /dev/null
+   if [ $? -eq 0 ] ; then
+   sudo /sbin/rmmod vfio-pci
+   sudo /sbin/rmmod vfio_iommu_type1
+   sudo /sbin/rmmod vfio
+   fi
+}
+
+#
+# Loads new vfio-pci (and vfio module if needed).
+#
+load_vfio_module()
+{
+   remove_vfio_module
+
+   VFIO_PATH="kernel/drivers/vfio/pci/vfio-pci.ko"
+
+   echo "Loading VFIO module"
+   /sbin/lsmod | grep -s vfio_pci > /dev/null
+   if [ $? -ne 0 ] ; then
+   if [ -f /lib/modules/$(uname -r)/$VFIO_PATH ] ; then
+   sudo /sbin/modprobe vfio-pci
+   fi
+   fi
+
+   # make sure regular users can read /dev/vfio
+   echo "chmod /dev/vfio"
+   sudo /usr/bin/chmod a+x /dev/vfio
+   if [ $? -ne 0 ] ; then
+   echo "FAIL"
+   quit
+   fi
+   echo "OK"
+
+   # check if /dev/vfio/vfio exists - that way we
+   # know we either loaded the module, or it was
+   # compiled into the kernel
+   if [ ! -e /dev/vfio/vfio ] ; then
+   echo "## ERROR: VFIO not found!"
+   fi
+}
+
+#
 # Unloads the rte_kni.ko module.
 #
 remove_kni_module()
@@ -223,6 +271,55 @@ load_kni_module()
 }

 #
+# Sets appropriate permissions on /dev/vfio/* files
+#
+set_vfio_permissions()
+{
+   # make sure regular users can read /dev/vfio
+   echo "chmod /dev/vfio"
+   sudo /usr/bin/chmod a+x /dev/vfio
+   if [ $? -ne 0 ] ; then
+   echo "FAIL"
+   quit
+   fi
+   echo "OK"
+
+   # make sure regular user can access everything inside /dev/vfio
+   echo "chmod /dev/vfio/*"
+   sudo /usr/bin/chmod 0666 /dev/vfio/*
+   if [ $? -ne 0 ] ; then
+   echo "FAIL"
+   quit
+   fi
+   echo "OK"
+
+   # since permissions are only to be set when running as
+   # regular user, we only check ulimit here
+   #
+   # warn if regular user is only allowed
+   # to memlock <64M of memory
+   MEMLOCK_AMNT=`ulimit -l`
+
+   if [ "$MEMLOCK_AMNT" != "unlimited" ] ; then
+   MEMLOCK_MB=`expr $MEMLOCK_AMNT / 1024`
+   echo ""
+   echo "Current user memlock limit: ${MEMLOCK_MB} MB"
+   echo ""
+   echo "This is the maximum amount of memory you will be"
+   echo "able to use with DPDK and VFIO if run as current user."
+   echo -n "To change this, please adjust limits.conf memlock "
+   echo "limit for current user."
+
+   if [ $MEMLOCK_AMNT -lt 65536 ] ; then
+   echo ""
+   echo "## WARNING: memlock limit is less than 64MB"
+   echo -n "## DPDK with VFIO may not be able to initialize "
+   echo "if run as current user."
+   fi
+   fi
+}
+
+#
 # Removes all reserved hugepages.
 #
 clear_huge_pages()
@@ -340,7 +437,24 @@ show_nics()
 #
 # Uses dpdk_nic_bind.py to move devices to work with igb_uio
 #
-bind_nics()
+bind_nics_to_vfio()
+{
+   if /sbin/lsmod  | grep -q vfio_pci ; then
+   ${RTE_SDK}/tools/dpdk_nic_bind.py --status
+   echo ""
+   echo -n "Enter PCI address of device to bind to VFIO driver: "
+   read PCI_PATH
+   sudo ${RTE_SDK}/tools/dpdk_nic_bind.py -b vfio-pci $PCI_PATH && echo "OK"
+   else
+   echo "# Please load the 'vfio-pci' kernel module before querying or "
+   echo "# adjusting NIC device bindings"
+   fi
+}
+
+#
+# Uses dpdk_nic_bind.py to move devices to work with igb_uio
+#
+bind_nics_to_igb_uio()
 {
if  /sbin/lsmod  | grep -q igb_uio ; then 
${RTE_SDK}/tools/dpdk_nic_bind.py --status
@@ -397,20 +511,29 @@ step2_func()
TEXT[1]="Insert IGB UIO module"
FUNC[1]="load_igb_uio_module"

-   TEXT[2]="Insert KNI module"
-   FUNC[2]="load_kni_module"
+   TEXT[2]="Insert VFIO module"
+   FUNC[2]="load_vfio_module"
+
+   TEXT[3]="Insert KNI module"
+   FUNC[3]="load_kni_module"

-   TEXT[3]="Setup hugepage mappings for non-NUMA systems"
-   FUNC[3]="set_non_numa_pages"
+   TEXT[4]="Setup hugepage mappings for non-NUMA systems"
+   FUNC[4]="set_non_numa_pages"

-   TEXT[4]="Setup hugepage mappings for NUMA systems"
-

[dpdk-dev] [PATCH v3 19/20] binding script: Renamed igb_uio_bind to dpdk_nic_bind

2014-05-28 Thread Anatoly Burakov

Renaming the igb_uio_bind script to dpdk_nic_bind to have a generic name
since we're now supporting two drivers.
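
With two drivers, the script has to recognise vfio-pci even though
/proc/modules lists that module as vfio_pci; the Python code does this with a
prefix match plus replace("_", "-"). A C sketch of the same normalisation is
below. Unlike the script, it deliberately compares the whole first field of
the line, so that a module whose name merely starts with a driver name (for
example a hypothetical igb_uio_extra) is not mistaken for it.
module_line_matches() is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if the first field of a /proc/modules line names `driver`.
 * Dashes and underscores compare equal, because the kernel lists the
 * vfio-pci module as "vfio_pci".
 */
static bool
module_line_matches(const char *line, const char *driver)
{
	size_t i;

	for (i = 0; driver[i] != '\0'; i++) {
		char a = line[i] == '-' ? '_' : line[i];
		char b = driver[i] == '-' ? '_' : driver[i];

		/* also stops safely if line is shorter than driver */
		if (a != b)
			return false;
	}
	/* Require the whole field to match, not just a prefix. */
	return line[i] == ' ' || line[i] == '\0' || line[i] == '\n';
}
```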

Signed-off-by: Anatoly Burakov 
---
 tools/{igb_uio_bind.py => dpdk_nic_bind.py} | 47 -
 tools/setup.sh  | 16 +-
 2 files changed, 40 insertions(+), 23 deletions(-)
 rename tools/{igb_uio_bind.py => dpdk_nic_bind.py} (92%)

diff --git a/tools/igb_uio_bind.py b/tools/dpdk_nic_bind.py
similarity index 92%
rename from tools/igb_uio_bind.py
rename to tools/dpdk_nic_bind.py
index 33adcf4..1e517e7 100755
--- a/tools/igb_uio_bind.py
+++ b/tools/dpdk_nic_bind.py
@@ -42,6 +42,8 @@ ETHERNET_CLASS = "0200"
 # global dict ethernet devices present. Dictionary indexed by PCI address.
 # Each device within this is itself a dictionary of device properties
 devices = {}
+# list of supported DPDK drivers
+dpdk_drivers = [ "igb_uio", "vfio-pci" ]

 def usage():
 '''Print usage information for the program'''
@@ -146,22 +148,33 @@ def find_module(mod):

 def check_modules():
     '''Checks that igb_uio is loaded'''
+    global dpdk_drivers

     fd = file("/proc/modules")
     loaded_mods = fd.readlines()
     fd.close()
-    mod = "igb_uio"
+
+    # list of supported modules
+    mods =  [{"Name" : driver, "Found" : False} for driver in dpdk_drivers]

     # first check if module is loaded
-    found = False
     for line in loaded_mods:
-        if line.startswith(mod):
-            found = True
-            break
-    if not found:
-        print "Error - module %s not loaded" %mod
+        for mod in mods:
+            if line.startswith(mod["Name"]):
+                mod["Found"] = True
+            # special case for vfio_pci (module is named vfio-pci,
+            # but its .ko is named vfio_pci)
+            elif line.replace("_", "-").startswith(mod["Name"]):
+                mod["Found"] = True
+
+    # check if we have at least one loaded module
+    if True not in [mod["Found"] for mod in mods]:
+        print "Error - no supported modules are loaded"
         sys.exit(1)

+    # change DPDK driver list to only contain drivers that are loaded
+    dpdk_drivers = [mod["Name"] for mod in mods if mod["Found"]]
+
 def has_driver(dev_id):
 '''return true if a device is assigned to a driver. False otherwise'''
 return "Driver_str" in devices[dev_id]
@@ -196,6 +209,7 @@ def get_nic_details():
 the pci addresses (domain:bus:slot.func). The values are themselves
 dictionaries - one for each NIC.'''
 global devices
+global dpdk_drivers

 # clear any old data
 devices = {} 
@@ -240,10 +254,11 @@ def get_nic_details():

         # add igb_uio to list of supporting modules if needed
         if "Module_str" in devices[d]:
-            if "igb_uio" not in devices[d]["Module_str"]:
-                devices[d]["Module_str"] = devices[d]["Module_str"] + ",igb_uio"
+            for driver in dpdk_drivers:
+                if driver not in devices[d]["Module_str"]:
+                    devices[d]["Module_str"] = devices[d]["Module_str"] + ",%s" % driver
         else:
-            devices[d]["Module_str"] = "igb_uio"
+            devices[d]["Module_str"] = ",".join(dpdk_drivers)

         # make sure the driver and module strings do not have any duplicates
         if has_driver(d):
@@ -320,7 +335,7 @@ def bind_one(dev_id, driver, force):
 dev["Driver_str"] = "" # clear driver string

 # if we are binding to one of DPDK drivers, add PCI id's to that driver
-if driver == "igb_uio":
+if driver in dpdk_drivers:
 filename = "/sys/bus/pci/drivers/%s/new_id" % driver
 try:
 f = open(filename, "w")
@@ -397,21 +412,23 @@ def show_status():
     '''Function called when the script is passed the "--status" option. Displays
     to the user what devices are bound to the igb_uio driver, the kernel driver
     or to no driver'''
+    global dpdk_drivers
     kernel_drv = []
-    uio_drv = []
+    dpdk_drv = []
     no_drv = []
+
     # split our list of devices into the three categories above
     for d in devices.keys():
         if not has_driver(d):
             no_drv.append(devices[d])
             continue
-        if devices[d]["Driver_str"] == "igb_uio":
-            uio_drv.append(devices[d])
+        if devices[d]["Driver_str"] in dpdk_drivers:
+            dpdk_drv.append(devices[d])
         else:
             kernel_drv.append(devices[d])

     # print each category separately, so we can clearly see what's used by DPDK
-    display_devices("Network devices using IGB_UIO driver", uio_drv, \
+    display_devices("Network devices using DPDK-compatible driver", dpdk_drv, \
                     "drv=%(Driver_str)s unused=%(Module_str)s")
     display_devices("Network devices using kernel driver", kernel_drv,
                     "if=%(Interface)s drv=%(Driver_str)s unused=%(Module_str)s %(Active)s")
diff --git a/tools/setup.sh b/tools/setup.sh
index 39be8fc..e0671b8 100755

[dpdk-dev] [PATCH v3 18/20] igb_uio: Removed PCI ID table from igb_uio

2014-05-28 Thread Anatoly Burakov

Remove the PCI ID list to make igb_uio more similar to a generic driver
like vfio-pci or uio_pci_generic. This is done to make it easier for
the binding script to support multiple drivers.

Note that since igb_uio no longer has a PCI ID list, it can now be
bound to any device, not just those explicitly supported by DPDK. In
other words, it now behaves similarly to PCI stub, VFIO and other generic
PCI drivers.

Therefore, to bind a new device to igb_uio, the user now has to first
write its vendor and device ID to the "new_id" file inside the igb_uio
driver directory, and only then write its PCI address to "bind". This is
reflected in the changes to the PCI binding script as well.

sysfs behaves oddly when a new device ID is added to new_id: a subsequent
write to "bind" results in an IOError when the file is closed. This error
is harmless, but it still raises the exception, so to work around it we
check whether the device was actually bound to the driver before raising
an error.
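
The two-step bind described above can be sketched in C as follows. This is an
illustration, not the binding script's code: sysfs_write() and
bind_with_new_id() are hypothetical helpers. new_id takes the vendor and
device IDs in hex, and bind takes the PCI address. The error on the bind write
is likely because writing new_id already makes the driver probe matching
unbound devices, so the device may be bound before "bind" is written; a real
tool should check whether the device ended up under the driver directory
rather than trusting the return code.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Write `text` to the file `dir`/`attr`. Returns 0 on success, -1 on
 * failure, including an error reported by fclose(), which is where a
 * sysfs attribute may report a failed write.
 */
static int
sysfs_write(const char *dir, const char *attr, const char *text)
{
	char path[512];
	size_t n = strlen(text);
	FILE *f;
	int ret = 0;
	int len = snprintf(path, sizeof(path), "%s/%s", dir, attr);

	if (len < 0 || (size_t)len >= sizeof(path))
		return -1;
	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	if (fwrite(text, 1, n, f) != n)
		ret = -1;
	if (fclose(f) != 0)
		ret = -1;
	return ret;
}

/*
 * Hypothetical helper: register vendor/device with the driver via new_id,
 * then ask it to bind the device at pci_addr (e.g. "0000:01:00.0").
 * drv_dir is normally /sys/bus/pci/drivers/igb_uio.
 */
static int
bind_with_new_id(const char *drv_dir, uint16_t vendor, uint16_t device,
		 const char *pci_addr)
{
	char ids[16];

	snprintf(ids, sizeof(ids), "%04x %04x", vendor, device);
	if (sysfs_write(drv_dir, "new_id", ids) != 0)
		return -1;
	/* May fail even if the device is now bound: verify separately. */
	return sysfs_write(drv_dir, "bind", pci_addr);
}
```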

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c |  21 +-
 tools/igb_uio_bind.py | 118 +++---
 2 files changed, 59 insertions(+), 80 deletions(-)

diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index 7d5e6b4..6362b1c 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -65,25 +65,6 @@ struct rte_uio_pci_dev {
 static char *intr_mode = NULL;
 static enum rte_intr_mode igbuio_intr_mode_preferred = RTE_INTR_MODE_MSIX;

-/* PCI device id table */
-static struct pci_device_id igbuio_pci_ids[] = {
-#define RTE_PCI_DEV_ID_DECL_EM(vend, dev) {PCI_DEVICE(vend, dev)},
-#define RTE_PCI_DEV_ID_DECL_IGB(vend, dev) {PCI_DEVICE(vend, dev)},
-#define RTE_PCI_DEV_ID_DECL_IGBVF(vend, dev) {PCI_DEVICE(vend, dev)},
-#define RTE_PCI_DEV_ID_DECL_IXGBE(vend, dev) {PCI_DEVICE(vend, dev)},
-#define RTE_PCI_DEV_ID_DECL_IXGBEVF(vend, dev) {PCI_DEVICE(vend, dev)},
-#ifdef RTE_LIBRTE_VIRTIO_PMD
-#define RTE_PCI_DEV_ID_DECL_VIRTIO(vend, dev) {PCI_DEVICE(vend, dev)},
-#endif
-#ifdef RTE_LIBRTE_VMXNET3_PMD
-#define RTE_PCI_DEV_ID_DECL_VMXNET3(vend, dev) {PCI_DEVICE(vend, dev)},
-#endif
-#include 
-{ 0, },
-};
-
-MODULE_DEVICE_TABLE(pci, igbuio_pci_ids);
-
 static inline struct rte_uio_pci_dev *
 igbuio_get_uio_pci_dev(struct uio_info *info)
 {
@@ -619,7 +600,7 @@ igbuio_config_intr_mode(char *intr_str)

 static struct pci_driver igbuio_pci_driver = {
.name = "igb_uio",
-   .id_table = igbuio_pci_ids,
+   .id_table = NULL,
.probe = igbuio_pci_probe,
.remove = igbuio_pci_remove,
 };
diff --git a/tools/igb_uio_bind.py b/tools/igb_uio_bind.py
index 824aa2b..33adcf4 100755
--- a/tools/igb_uio_bind.py
+++ b/tools/igb_uio_bind.py
@@ -42,8 +42,6 @@ ETHERNET_CLASS = "0200"
 # global dict ethernet devices present. Dictionary indexed by PCI address.
 # Each device within this is itself a dictionary of device properties
 devices = {}
-# list of vendor:device pairs (again stored as dictionary) supported by igb_uio
-module_dev_ids = []

 def usage():
     '''Print usage information for the program'''
@@ -147,9 +145,7 @@ def find_module(mod):
     return path

 def check_modules():
-    '''Checks that the needed modules (igb_uio) is loaded, and then
-    determine from the .ko file, what its supported device ids are'''
-    global module_dev_ids
+    '''Checks that igb_uio is loaded'''

     fd = file("/proc/modules")
     loaded_mods = fd.readlines()
@@ -165,41 +161,36 @@ def check_modules():
     if not found:
         print "Error - module %s not loaded" %mod
         sys.exit(1)
-
-    # now find the .ko and get list of supported vendor/dev-ids
-    modpath = find_module(mod)
-    if modpath is None:
-        print "Cannot find module file %s" % (mod + ".ko")
-        sys.exit(1)
-    depmod_output = check_output(["depmod", "-n", modpath]).splitlines()
-    for line in depmod_output:
-        if not line.startswith("alias"):
-            continue
-        if not line.endswith(mod):
-            continue
-        lineparts = line.split()
-        if not(lineparts[1].startswith("pci:")):
-            continue;
-        else:
-            lineparts[1] = lineparts[1][4:]
-        vendor = lineparts[1][:9]
-        device = lineparts[1][9:18]
-        if vendor.startswith("v") and device.startswith("d"):
-            module_dev_ids.append({"Vendor": int(vendor[1:],16),
-                                   "Device": int(device[1:],16)})
-
-def is_supported_device(dev_id):
-    '''return true if device is supported by igb_uio, false otherwise'''
-    for dev in module_dev_ids:
-        if (dev["Vendor"] == devices[dev_id]["Vendor"] and
-            dev["Device"] == devices[dev_id]["Device"]):
-            return True
-    return False

 def has_driver(dev_id):
     '''return true if a device is assigned to a driver. False otherwise'''
     return "Driver_str"

[dpdk-dev] [PATCH v3 17/20] test app: adding unit tests for VFIO EAL command-line parameter

2014-05-28 Thread Anatoly Burakov

Add unit tests for the VFIO interrupt type command-line parameter. The test
app cannot tell whether VFIO support is compiled in (the eal_vfio.h header is
internal to the Linuxapp EAL), so we test this flag regardless.
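
The test app runs each of these cases in a separate process via
launch_proc(), presumably because EAL initialisation can only be performed
once per process. The launch-and-check-exit-status pattern can be sketched as
follows; rather than re-executing a binary with a new argument list, this
sketch simply forks and calls a hypothetical child_main() that accepts the
three documented modes.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical child entry point: exit 0 only for the three VFIO modes. */
static int
child_main(const char *arg)
{
	static const char *const ok[] = { "--vfio-intr=legacy",
					  "--vfio-intr=msi", "--vfio-intr=msix" };
	size_t i;

	for (i = 0; i < sizeof(ok) / sizeof(ok[0]); i++)
		if (strcmp(arg, ok[i]) == 0)
			return 0;
	return 1;
}

/*
 * Run child_main(arg) in a forked process and return its exit status,
 * or -1 if fork()/waitpid() fails or the child did not exit normally.
 */
static int
launch_child(const char *arg)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(child_main(arg));
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```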

Signed-off-by: Anatoly Burakov 
---
 app/test/test_eal_flags.c | 36 
 1 file changed, 36 insertions(+)

diff --git a/app/test/test_eal_flags.c b/app/test/test_eal_flags.c
index 195a1f5..a0ee4e6 100644
--- a/app/test/test_eal_flags.c
+++ b/app/test/test_eal_flags.c
@@ -768,6 +768,22 @@ test_misc_flags(void)
const char *argv11[] = {prgname, "--file-prefix=virtaddr",
"-c", "1", "-n", "2", "--base-virtaddr=0x12345678"};

+   /* try running with --vfio-intr INTx flag */
+   const char *argv12[] = {prgname, "--file-prefix=intr",
+   "-c", "1", "-n", "2", "--vfio-intr=legacy"};
+
+   /* try running with --vfio-intr MSI flag */
+   const char *argv13[] = {prgname, "--file-prefix=intr",
+   "-c", "1", "-n", "2", "--vfio-intr=msi"};
+
+   /* try running with --vfio-intr MSI-X flag */
+   const char *argv14[] = {prgname, "--file-prefix=intr",
+   "-c", "1", "-n", "2", "--vfio-intr=msix"};
+
+   /* try running with --vfio-intr invalid flag */
+   const char *argv15[] = {prgname, "--file-prefix=intr",
+   "-c", "1", "-n", "2", "--vfio-intr=invalid"};
+

if (launch_proc(argv0) == 0) {
printf("Error - process ran ok with invalid flag\n");
@@ -820,6 +836,26 @@ test_misc_flags(void)
printf("Error - process did not run ok with --base-virtaddr parameter\n");
return -1;
}
+   if (launch_proc(argv12) != 0) {
+   printf("Error - process did not run ok with "
+   "--vfio-intr INTx parameter\n");
+   return -1;
+   }
+   if (launch_proc(argv13) != 0) {
+   printf("Error - process did not run ok with "
+   "--vfio-intr MSI parameter\n");
+   return -1;
+   }
+   if (launch_proc(argv14) != 0) {
+   printf("Error - process did not run ok with "
+   "--vfio-intr MSI-X parameter\n");
+   return -1;
+   }
+   if (launch_proc(argv15) == 0) {
+   printf("Error - process ran ok with "
+   "--vfio-intr invalid parameter\n");
+   return -1;
+   }
return 0;
 }
 #endif
-- 
1.8.1.4

[dpdk-dev] [PATCH v3 15/20] eal: added support for selecting VFIO interrupt type from EAL command-line

2014-05-28 Thread Anatoly Burakov

Unlike igb_uio, VFIO interrupt type is not set by kernel module
parameters but is set up via ioctl() calls at runtime. This warrants
a new EAL command-line parameter. It will have no effect if VFIO is
not compiled in, but will set the VFIO interrupt type to either "legacy",
"msi" or "msix" if VFIO support is compiled in. Note that VFIO initialization
will fail if the interrupt type selected is not supported by the system.

If the interrupt type parameter wasn't specified, VFIO will try all
interrupt types (starting with MSI-X).
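
The option handling described above amounts to a string-to-enum lookup plus
a fallback order for when no mode is given. A minimal sketch with
hypothetical names (parse_vfio_intr(), vfio_intr_try_order(), enum
intr_mode); the commit message only says VFIO starts with MSI-X, so the
MSI-then-legacy order after that is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Mirrors the three modes accepted by eal_parse_vfio_intr() in the patch. */
enum intr_mode {
	INTR_MODE_NONE = 0, /* not specified on the command line */
	INTR_MODE_LEGACY,
	INTR_MODE_MSI,
	INTR_MODE_MSIX
};

/* Parse a --vfio-intr argument; returns 0 and sets *out, or -1. */
static int
parse_vfio_intr(const char *arg, enum intr_mode *out)
{
	static const struct {
		const char *name;
		enum intr_mode mode;
	} map[] = {
		{ "legacy", INTR_MODE_LEGACY },
		{ "msi", INTR_MODE_MSI },
		{ "msix", INTR_MODE_MSIX },
	};
	size_t i;

	for (i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
		if (strcmp(arg, map[i].name) == 0) {
			*out = map[i].mode;
			return 0;
		}
	}
	return -1;
}

/*
 * Fill `order` with the modes to try and return how many there are:
 * only the requested one, or MSI-X, MSI, legacy when none was given
 * (the order after MSI-X is an assumption).
 */
static size_t
vfio_intr_try_order(enum intr_mode requested, enum intr_mode order[3])
{
	if (requested != INTR_MODE_NONE) {
		order[0] = requested;
		return 1;
	}
	order[0] = INTR_MODE_MSIX;
	order[1] = INTR_MODE_MSI;
	order[2] = INTR_MODE_LEGACY;
	return 3;
}
```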

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/eal.c | 33 +
 1 file changed, 33 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 18a3e04..e87a2e9 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -99,6 +99,7 @@
 #define OPT_BASE_VIRTADDR   "base-virtaddr"
 #define OPT_XEN_DOM0"xen-dom0"
 #define OPT_CREATE_UIO_DEV "create-uio-dev"
+#define OPT_VFIO_INTR"vfio-intr"

 #define RTE_EAL_BLACKLIST_SIZE 0x100

@@ -361,6 +362,7 @@ eal_usage(const char *prgname)
   "  --"OPT_VMWARE_TSC_MAP": use VMware TSC map instead of "
   "native RDTSC\n"
   "  --"OPT_BASE_VIRTADDR": specify base virtual address\n"
+  "  --"OPT_VFIO_INTR": specify desired interrupt mode for VFIO (legacy|msi|msix)\n"
   "  --"OPT_CREATE_UIO_DEV": create /dev/uioX (usually done by hotplug)\n"
   "\nEAL options for DEBUG use only:\n"
   "  --"OPT_NO_HUGE"  : use malloc instead of hugetlbfs\n"
@@ -579,6 +581,28 @@ eal_parse_base_virtaddr(const char *arg)
return 0;
 }

+static int
+eal_parse_vfio_intr(const char *mode)
+{
+   unsigned i;
+   static struct {
+   const char *name;
+   enum rte_intr_mode value;
+   } map[] = {
+   { "legacy", RTE_INTR_MODE_LEGACY },
+   { "msi", RTE_INTR_MODE_MSI },
+   { "msix", RTE_INTR_MODE_MSIX },
+   };
+
+   for (i = 0; i < RTE_DIM(map); i++) {
+   if (!strcmp(mode, map[i].name)) {
+   internal_config.vfio_intr_mode = map[i].value;
+   return 0;
+   }
+   }
+   return -1;
+}
+
 static inline size_t
 eal_get_hugepage_mem_size(void)
 {
@@ -633,6 +657,7 @@ eal_parse_args(int argc, char **argv)
{OPT_PCI_BLACKLIST, 1, 0, 0},
{OPT_VDEV, 1, 0, 0},
{OPT_SYSLOG, 1, NULL, 0},
+   {OPT_VFIO_INTR, 1, NULL, 0},
{OPT_BASE_VIRTADDR, 1, 0, 0},
{OPT_XEN_DOM0, 0, 0, 0},
{OPT_CREATE_UIO_DEV, 1, NULL, 0},
@@ -829,6 +854,14 @@ eal_parse_args(int argc, char **argv)
return -1;
}
}
+   else if (!strcmp(lgopts[option_index].name, OPT_VFIO_INTR)) {
+   if (eal_parse_vfio_intr(optarg) < 0) {
+   RTE_LOG(ERR, EAL, "invalid parameters for --"
+   OPT_VFIO_INTR "\n");
+   eal_usage(prgname);
+   return -1;
+   }
+   }
else if (!strcmp(lgopts[option_index].name, OPT_CREATE_UIO_DEV)) {
internal_config.create_uio_dev = 1;
}
-- 
1.8.1.4

[dpdk-dev] [PATCH v3 14/20] pci: enable VFIO device binding

2014-05-28 Thread Anatoly Burakov

Add support for binding VFIO devices if RTE_PCI_DRV_NEED_MAPPING is set
for this driver. Try VFIO first; if the device is not mapped, fall back to
igb_uio.
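
The mapping order this patch introduces can be sketched independently of the
EAL. The return convention below is read from pci_map_device() in the diff:
0 means the device was mapped, a negative value is a hard error that stops
probing, and a positive value from the VFIO path means "not handled", so the
igb_uio path is tried. The integer device handle and the demo backends are
hypothetical.

```c
/*
 * Return convention assumed from pci_map_device() in the patch:
 *   0  -> the device was mapped by this backend
 *   >0 -> this backend does not handle the device; try the next one
 *   <0 -> hard error; stop probing
 */
typedef int (*map_fn)(int dev);

static int
map_device(int dev, int vfio_enabled, map_fn vfio_map, map_fn uio_map)
{
	if (vfio_enabled) {
		int ret = vfio_map(dev);

		if (ret == 0)
			return 0;   /* mapped through VFIO */
		if (ret < 0)
			return ret; /* VFIO failed hard: do not fall back */
	}
	return uio_map(dev);    /* igb_uio path */
}

/* Hypothetical backends: "VFIO" claims even device numbers only. */
static int
demo_vfio_map(int dev)
{
	return dev % 2 == 0 ? 0 : 1;
}

static int
demo_vfio_fail(int dev)
{
	(void)dev;
	return -1;
}

static int demo_uio_calls;

static int
demo_uio_map(int dev)
{
	(void)dev;
	demo_uio_calls++;
	return 0;
}
```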

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/eal_pci.c | 42 ---
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index a0abec8..8a9cbf9 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -393,6 +393,27 @@ error:
return -1;
 }

+static int
+pci_map_device(struct rte_pci_device *dev)
+{
+   int ret, mapped = 0;
+
+   /* try mapping the NIC resources using VFIO if it exists */
+#ifdef VFIO_PRESENT
+   if (pci_vfio_is_enabled()) {
+   if ((ret = pci_vfio_map_resource(dev)) == 0)
+   mapped = 1;
+   else if (ret < 0)
+   return ret;
+   }
+#endif
+   /* map resources for devices that use igb_uio */
+   if (!mapped && (ret = pci_uio_map_resource(dev)) != 0)
+   return ret;
+
+   return 0;
+}
+
 /*
  * If vendor/device ID match, call the devinit() function of the
  * driver.
@@ -400,8 +421,8 @@ error:
 int
 rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *dev)
 {
+   int ret;
struct rte_pci_id *id_table;
-   int ret = 0;

for (id_table = dr->id_table ; id_table->vendor_id != 0; id_table++) {

@@ -436,8 +457,7 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
}

if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
-   /* map resources for devices that use igb_uio */
-   if ((ret = pci_uio_map_resource(dev)) != 0)
+   if ((ret = pci_map_device(dev)) != 0)
return ret;
} else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND &&
   rte_eal_process_type() == RTE_PROC_PRIMARY) {
@@ -473,5 +493,21 @@ rte_eal_pci_init(void)
RTE_LOG(ERR, EAL, "%s(): Cannot scan PCI bus\n", __func__);
return -1;
}
+#ifdef VFIO_PRESENT
+   pci_vfio_enable();
+
+   if (pci_vfio_is_enabled()) {
+
+   /* if we are the primary process, create a thread to communicate with
+* secondary processes. The thread will use a socket to wait for
+* requests from secondary processes to send open file descriptors,
+* because VFIO does not allow multiple open descriptors on a group or
+* VFIO container.
+*/
+   if (internal_config.process_type == RTE_PROC_PRIMARY &&
+   pci_vfio_mp_sync_setup() < 0)
+   return -1;
+   }
+#endif
return 0;
 }
-- 
1.8.1.4

[dpdk-dev] [PATCH v3 13/20] vfio: add multiprocess support.

2014-05-28 Thread Anatoly Burakov

Since VFIO cannot be used to map the same device twice, secondary
processes obtain the group and container fds from the primary process
over a local socket. Only group and container fds need to be sent, as
device fds can be obtained via ioctl() calls on the group fd.

For multiprocess, VFIO distinguishes between existing but unused groups
(e.g. groups that aren't bound to the VFIO driver) and non-existent
groups, in order to know whether the secondary process requests a valid
group or something that doesn't exist.

VFIO multiprocess sync communicates over a simple protocol. It defines
two requests: a request for a group fd and a request for a container fd.
Possible replies are SOCKET_OK (an OK signal), SOCKET_ERR (an error
signal) and SOCKET_NO_FD (a signal indicating that the requested VFIO
group is valid but no fd is present for it, meaning that the group is
simply not bound to the VFIO driver).

Here is the logic in a nutshell:

1. secondary process sends SOCKET_REQ_CONTAINER or SOCKET_REQ_GROUP
1a. in case of SOCKET_REQ_GROUP, client also then sends group number
2. primary process receives message
2a. in case of invalid group, SOCKET_ERR is sent back to secondary
2b. in case of unbound group, SOCKET_NO_FD is sent back to secondary
2c. in case of valid group, SOCKET_OK is sent and followed by fd
3. socket is closed

In case of any error, SOCKET_ERR is sent and the socket is closed.
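What makes this protocol possible is that, on Linux, an open file descriptor can be sent as SCM_RIGHTS ancillary data over an AF_UNIX socket; the receiver gets a new descriptor referring to the same open file. The sketch below shows only the reply side of the exchange (step 2c versus 2a/2b), assuming an already connected AF_UNIX stream socket. `demo_send_reply()`, `demo_recv_reply()` and the `DEMO_SOCKET_*` values are hypothetical and merely mirror SOCKET_OK, SOCKET_NO_FD and SOCKET_ERR; they are not the patch's definitions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* hypothetical reply codes mirroring SOCKET_OK, SOCKET_NO_FD, SOCKET_ERR */
enum { DEMO_SOCKET_OK = 0, DEMO_SOCKET_NO_FD = 1, DEMO_SOCKET_ERR = 2 };

/* control buffer for one fd, correctly aligned for struct cmsghdr */
union demo_cmsg_buf {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Send a reply code; only DEMO_SOCKET_OK carries an fd (as SCM_RIGHTS). */
static int
demo_send_reply(int sock, int code, int fd)
{
	union demo_cmsg_buf ctrl;
	struct msghdr msg;
	struct iovec iov = { .iov_base = &code, .iov_len = sizeof(code) };

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;

	if (code == DEMO_SOCKET_OK) {
		struct cmsghdr *cmsg;

		assert(fd >= 0);
		memset(&ctrl, 0, sizeof(ctrl));
		msg.msg_control = ctrl.buf;
		msg.msg_controllen = sizeof(ctrl.buf);
		cmsg = CMSG_FIRSTHDR(&msg);
		cmsg->cmsg_level = SOL_SOCKET;
		cmsg->cmsg_type = SCM_RIGHTS;
		cmsg->cmsg_len = CMSG_LEN(sizeof(int));
		memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
	}
	return sendmsg(sock, &msg, 0) == (ssize_t)sizeof(code) ? 0 : -1;
}

/* Receive a reply into *code; returns the received fd, or -1 if none. */
static int
demo_recv_reply(int sock, int *code)
{
	union demo_cmsg_buf ctrl;
	struct msghdr msg;
	struct iovec iov = { .iov_base = code, .iov_len = sizeof(*code) };
	struct cmsghdr *cmsg;
	int fd = -1;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	if (recvmsg(sock, &msg, 0) != (ssize_t)sizeof(*code)) {
		*code = DEMO_SOCKET_ERR;  /* short read or socket error */
		return -1;
	}
	/* NO_FD and ERR replies carry no ancillary data, so fd stays -1 */
	for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
	     cmsg = CMSG_NXTHDR(&msg, cmsg)) {
		if (cmsg->cmsg_level == SOL_SOCKET &&
		    cmsg->cmsg_type == SCM_RIGHTS)
			memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	}
	return fd;
}
```

Because the kernel installs a fresh descriptor in the receiving process, the secondary ends up sharing the primary's open container or group file rather than opening it a second time, which VFIO would refuse.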

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/Makefile   |   1 +
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c |  79 -
 lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c | 395 +
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h |  19 +
 4 files changed, 492 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c

diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index cb87f8a..572d173 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -59,6 +59,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_log.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_uio.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio_mp_sync.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_debug.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_timer.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index e1d6973..f0d4f55 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -303,7 +303,7 @@ pci_vfio_setup_interrupts(struct rte_pci_device *dev, int vfio_dev_fd)
 }

 /* open container fd or get an existing one */
-static int
+int
 pci_vfio_get_container_fd(void)
 {
int ret, vfio_container_fd;
@@ -333,13 +333,36 @@ pci_vfio_get_container_fd(void)
}

return vfio_container_fd;
+   } else {
+   /*
+* if we're in a secondary process, request container fd from the
+* primary process via our socket
+*/
+   int socket_fd;
+   if ((socket_fd = vfio_mp_sync_connect_to_primary()) < 0) {
+   RTE_LOG(ERR, EAL, "  cannot connect to primary process!\n");
+   return -1;
+   }
+   if (vfio_mp_sync_send_request(socket_fd, SOCKET_REQ_CONTAINER) < 0) {
+   RTE_LOG(ERR, EAL, "  cannot request container fd!\n");
+   close(socket_fd);
+   return -1;
+   }
+   vfio_container_fd = vfio_mp_sync_receive_fd(socket_fd);
+   if (vfio_container_fd < 0) {
+   RTE_LOG(ERR, EAL, "  cannot get container fd!\n");
+   close(socket_fd);
+   return -1;
+   }
+   close(socket_fd);
+   return vfio_container_fd;
}

return -1;
 }

 /* open group fd or get an existing one */
-static int
+int
 pci_vfio_get_group_fd(int iommu_group_no)
 {
int i;
@@ -375,6 +398,44 @@ pci_vfio_get_group_fd(int iommu_group_no)
vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].fd = vfio_group_fd;
return vfio_group_fd;
}
+   /* if we're in a secondary process, request group fd from the primary
+* process via our socket
+*/
+   else {
+   int socket_fd, ret;
+   if ((socket_fd = vfio_mp_sync_connect_to_primary()) < 0) {
+   RTE_LOG(ERR, EAL, "  cannot connect to primary process!\n");
+   return -1;
+   }
+   if (vfio_mp_sync_send_request(socket_fd, SOCKET_REQ_GROUP) < 0) {
+

[dpdk-dev] [PATCH v3 12/20] vfio: create mapping code for VFIO

2014-05-28 Thread Anatoly Burakov

Add code to support VFIO mapping (primary processes only). Most of
the work is done via ioctl() calls on either /dev/vfio/vfio (the
container) or /dev/vfio/$GROUP_NR (the IOMMU group).

In a nutshell, the code does the following:
1. creates a VFIO container (an entity that allows sharing IOMMU DMA
   mappings between devices)
2. checks if a given PCI device is a member of an IOMMU group (if it's
   not, this indicates that the device isn't bound to VFIO)
3. calls open() on the group file to obtain a group fd
4. checks if the group is viable (that is, if all the devices in the
   same IOMMU group are either bound to VFIO or not bound to anything)
5. adds the group to a container
6. sets up DMA mappings (only done once, mapping whole DPDK hugepage
   memory for DMA, with a 1:1 correspondence of IOVA to PA)
7. gets the actual PCI device fd from the group fd (can fail, which
   simply means that this particular device is not bound to VFIO)
8. maps BARs (the MSI-X BAR cannot be mmap()ed, so it is skipped)
9. sets up interrupt structures (but does not enable them!)
10. enables PCI bus mastering
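Steps 1 to 5 above correspond to a short sequence of ioctl() calls from the kernel's VFIO userspace API in <linux/vfio.h>. The sketch below is modelled on the example in the Linux kernel's VFIO documentation, not on the patch's own code. `demo_vfio_attach_group()` is a hypothetical helper; it assumes the IOMMU group number is already known, that the Type1 IOMMU backend is used, and that this is the first group added to a fresh container (VFIO_SET_IOMMU is only issued once per container). DMA mapping, device fd retrieval and BAR mapping (steps 6 to 10) are left out.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

/*
 * Open the VFIO container and an IOMMU group, check that the group is
 * viable and attach it to the container. Returns the group fd and stores
 * the container fd in *container_fd, or returns -1 on any failure.
 */
static int
demo_vfio_attach_group(int iommu_group_no, int *container_fd)
{
	struct vfio_group_status status = { .argsz = sizeof(status) };
	char path[64];
	int container, group;

	assert(container_fd != NULL);
	*container_fd = -1;

	/* 1. create a container and check API version / IOMMU support */
	container = open("/dev/vfio/vfio", O_RDWR);
	if (container < 0)
		return -1;
	if (ioctl(container, VFIO_GET_API_VERSION) != VFIO_API_VERSION ||
	    !ioctl(container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU))
		goto err_container;

	/* 3. the group node is only present when the group is known to VFIO */
	snprintf(path, sizeof(path), "/dev/vfio/%d", iommu_group_no);
	group = open(path, O_RDWR);
	if (group < 0)
		goto err_container;

	/* 4. viable: every device in the group is bound to VFIO or to nothing */
	if (ioctl(group, VFIO_GROUP_GET_STATUS, &status) < 0 ||
	    !(status.flags & VFIO_GROUP_FLAGS_VIABLE))
		goto err_group;

	/* 5. add the group to the container, then select the IOMMU model */
	if (ioctl(group, VFIO_GROUP_SET_CONTAINER, &container) < 0 ||
	    ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU) < 0)
		goto err_group;

	*container_fd = container;
	return group;

err_group:
	close(group);
err_container:
	close(container);
	return -1;
}
```

From here, DMA mappings would be added with VFIO_IOMMU_MAP_DMA on the container, and the device fd obtained with VFIO_GROUP_GET_DEVICE_FD on the group.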

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/Makefile   |   2 +
 lib/librte_eal/linuxapp/eal/eal.c  |   2 +
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 706 +
 .../linuxapp/eal/include/eal_internal_cfg.h|   3 +
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h |  31 +
 lib/librte_eal/linuxapp/eal/include/eal_vfio.h |   6 +
 6 files changed, 750 insertions(+)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c

diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 76d445f..cb87f8a 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -58,6 +58,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_thread.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_log.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_uio.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_debug.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_timer.c
@@ -87,6 +88,7 @@ CFLAGS_eal_log.o := -D_GNU_SOURCE
 CFLAGS_eal_common_log.o := -D_GNU_SOURCE
 CFLAGS_eal_hugepage_info.o := -D_GNU_SOURCE
 CFLAGS_eal_pci.o := -D_GNU_SOURCE
+CFLAGS_eal_pci_vfio.o := -D_GNU_SOURCE
 CFLAGS_eal_common_whitelist.o := -D_GNU_SOURCE

 # workaround for a gcc bug with noreturn attribute
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index de182e1..18a3e04 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -650,6 +650,8 @@ eal_parse_args(int argc, char **argv)
internal_config.force_sockets = 0;
internal_config.syslog_facility = LOG_DAEMON;
internal_config.xen_dom0_support = 0;
+   /* if set to NONE, interrupt mode is determined automatically */
+   internal_config.vfio_intr_mode = RTE_INTR_MODE_NONE;
 #ifdef RTE_LIBEAL_USE_HPET
internal_config.no_hpet = 0;
 #else
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
new file mode 100644
index 000..e1d6973
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -0,0 +1,706 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *

[dpdk-dev] [PATCH v3 11/20] eal: remove -Wno-return-type for non-existent eal_hpet.c

2014-05-28 Thread Anatoly Burakov

eal_hpet.c was renamed to eal_timer.c and, thanks to code changes, no
longer needs -Wno-return-type.

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/Makefile | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 527fa2a..76d445f 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -93,7 +93,6 @@ CFLAGS_eal_common_whitelist.o := -D_GNU_SOURCE
 # http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
 ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
 CFLAGS_eal_thread.o += -Wno-return-type
-CFLAGS_eal_hpet.o += -Wno-return-type
 endif

 INC := rte_per_lcore.h rte_lcore.h rte_interrupts.h rte_kni_common.h rte_dom0_common.h
-- 
1.8.1.4

[dpdk-dev] [PATCH v3 10/20] interrupts: Add support for VFIO interrupts

2014-05-28 Thread Anatoly Burakov

Add code to handle VFIO interrupts in the EAL interrupt handling code
(supports all interrupt types: legacy INTx, MSI and MSI-X).
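The enable helpers in this patch build a `struct vfio_irq_set` whose trailing `data` area carries an eventfd descriptor, and pass it to the VFIO_DEVICE_SET_IRQS ioctl on the device fd. A sketch of just the buffer construction for a single vector follows; `demo_build_irq_set()` is a hypothetical helper, and issuing the ioctl (which needs a real VFIO device fd) is left to the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/eventfd.h>
#include <unistd.h>
#include <linux/vfio.h>

/* header plus room for one eventfd, like IRQ_SET_BUF_LEN in the patch */
#define DEMO_IRQ_SET_BUF_LEN (sizeof(struct vfio_irq_set) + sizeof(int))

/*
 * Fill buf (at least DEMO_IRQ_SET_BUF_LEN bytes, 4-byte aligned) with a
 * request that signals eventfd `efd` when vector 0 of the given IRQ index
 * (e.g. VFIO_PCI_MSIX_IRQ_INDEX) fires. The caller would then run
 * ioctl(vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set).
 */
static struct vfio_irq_set *
demo_build_irq_set(void *buf, uint32_t index, int efd)
{
	struct vfio_irq_set *irq_set = (struct vfio_irq_set *)buf;

	assert(buf != NULL && efd >= 0);
	memset(buf, 0, DEMO_IRQ_SET_BUF_LEN);
	irq_set->argsz = DEMO_IRQ_SET_BUF_LEN;
	irq_set->count = 1;
	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
	irq_set->index = index;
	irq_set->start = 0;
	/* the eventfd goes into the flexible data[] area after the header */
	memcpy(irq_set->data, &efd, sizeof(efd));
	return irq_set;
}
```

Once the ioctl succeeds, each interrupt increments the eventfd counter, which is why the read buffer in this patch gains a 64-bit `vfio_intr_count` field.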

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/eal_interrupts.c   | 285 -
 .../linuxapp/eal/include/exec-env/rte_interrupts.h |   4 +
 2 files changed, 284 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index 58e1ddf..c430710 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -36,7 +36,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -44,6 +43,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -66,6 +66,7 @@
 #include 

 #include "eal_private.h"
+#include "eal_vfio.h"

 #define EAL_INTR_EPOLL_WAIT_FOREVER (-1)

@@ -87,6 +88,9 @@ union intr_pipefds{
  */
 union rte_intr_read_buffer {
int uio_intr_count;  /* for uio device */
+#ifdef VFIO_PRESENT
+   uint64_t vfio_intr_count;/* for vfio device */
+#endif
uint64_t timerfd_num;/* for timerfd */
char charbuf[16];/* for others */
 };
@@ -119,6 +123,244 @@ static struct rte_intr_source_list intr_sources;
 /* interrupt handling thread */
 static pthread_t intr_thread;

+/* VFIO interrupts */
+#ifdef VFIO_PRESENT
+
+#define IRQ_SET_BUF_LEN  (sizeof(struct vfio_irq_set) + sizeof(int))
+
+/* enable legacy (INTx) interrupts */
+static int
+vfio_enable_intx(struct rte_intr_handle *intr_handle) {
+   struct vfio_irq_set *irq_set;
+   char irq_set_buf[IRQ_SET_BUF_LEN];
+   int len, ret;
+   int *fd_ptr;
+
+   len = sizeof(irq_set_buf);
+
+   /* enable INTx */
+   irq_set = (struct vfio_irq_set *) irq_set_buf;
+   irq_set->argsz = len;
+   irq_set->count = 1;
+   irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
+   irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+   irq_set->start = 0;
+   fd_ptr = (int *) &irq_set->data;
+   *fd_ptr = intr_handle->fd;
+
+   ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+   if (ret) {
+   RTE_LOG(ERR, EAL, "Error enabling INTx interrupts for fd %d\n",
+   intr_handle->fd);
+   return -1;
+   }
+
+   /* unmask INTx after enabling */
+   memset(irq_set, 0, len);
+   len = sizeof(struct vfio_irq_set);
+   irq_set->argsz = len;
+   irq_set->count = 1;
+   irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK;
+   irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+   irq_set->start = 0;
+
+   ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+   if (ret) {
+   RTE_LOG(ERR, EAL, "Error unmasking INTx interrupts for fd %d\n",
+   intr_handle->fd);
+   return -1;
+   }
+   return 0;
+}
+
+/* disable legacy (INTx) interrupts */
+static int
+vfio_disable_intx(struct rte_intr_handle *intr_handle) {
+   struct vfio_irq_set *irq_set;
+   char irq_set_buf[IRQ_SET_BUF_LEN];
+   int len, ret;
+
+   len = sizeof(struct vfio_irq_set);
+
+   /* mask interrupts before disabling */
+   irq_set = (struct vfio_irq_set *) irq_set_buf;
+   irq_set->argsz = len;
+   irq_set->count = 1;
+   irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK;
+   irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+   irq_set->start = 0;
+
+   ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+   if (ret) {
+   RTE_LOG(ERR, EAL, "Error masking INTx interrupts for fd %d\n",
+   intr_handle->fd);
+   return -1;
+   }
+
+   /* disable INTx*/
+   memset(irq_set, 0, len);
+   irq_set->argsz = len;
+   irq_set->count = 0;
+   irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
+   irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+   irq_set->start = 0;
+
+   ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+   if (ret) {
+   RTE_LOG(ERR, EAL,
+   "Error disabling INTx interrupts for fd %d\n", intr_handle->fd);
+   return -1;
+   }
+   return 0;
+}
+
+/* enable MSI interrupts */
+static int
+vfio_enable_msi(struct rte_intr_handle *intr_handle) {
+   int len, ret;
+   char irq_set_buf[IRQ_SET_BUF_LEN];
+   struct vfio_irq_set *irq_set;
+   int *fd_ptr;
+
+   len = sizeof(irq_set_buf);
+
+   irq_set = (struct vfio_irq_set *) irq_set_buf;
+   irq_set->argsz = len;
+   irq_set->count = 1;
+   irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
+   irq_set->index = VFIO_PCI_MSI_IRQ_INDEX;
+   irq_set->start = 0;
+

[dpdk-dev] [PATCH v3 09/20] vfio: add VFIO header

2014-05-28 Thread Anatoly Burakov

Add a header that determines whether VFIO support should be compiled
in. If VFIO is enabled in the config (it is enabled by default), the
header also checks the kernel version: VFIO_PRESENT is defined only if
VFIO is enabled in the config and the kernel version is 3.6 or later.
This is the macro that should be used to determine whether VFIO support
is being compiled in.

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/include/eal_vfio.h | 49 ++
 1 file changed, 49 insertions(+)
 create mode 100644 lib/librte_eal/linuxapp/eal/include/eal_vfio.h

diff --git a/lib/librte_eal/linuxapp/eal/include/eal_vfio.h b/lib/librte_eal/linuxapp/eal/include/eal_vfio.h
new file mode 100644
index 000..354e9ca
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/include/eal_vfio.h
@@ -0,0 +1,49 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef EAL_VFIO_H_
+#define EAL_VFIO_H_
+
+/*
+ * determine if VFIO is present on the system
+ */
+#ifdef RTE_EAL_VFIO
+#include <linux/version.h>
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 6, 0)
+#include <linux/vfio.h>
+
+#define VFIO_PRESENT
+#endif /* kernel version */
+#endif /* RTE_EAL_VFIO */
+
+#endif /* EAL_VFIO_H_ */
-- 
1.8.1.4

[dpdk-dev] [PATCH v3 08/20] vfio: add support for VFIO in Linuxapp targets

2014-05-28 Thread Anatoly Burakov

Add the VFIO compilation option (CONFIG_RTE_EAL_VFIO, enabled by
default) to all Linuxapp configs.

Signed-off-by: Anatoly Burakov 
---
 config/defconfig_i686-default-linuxapp-gcc   | 1 +
 config/defconfig_i686-default-linuxapp-icc   | 1 +
 config/defconfig_x86_64-default-linuxapp-gcc | 1 +
 config/defconfig_x86_64-default-linuxapp-icc | 1 +
 4 files changed, 4 insertions(+)

diff --git a/config/defconfig_i686-default-linuxapp-gcc b/config/defconfig_i686-default-linuxapp-gcc
index ea90f12..5410f57 100644
--- a/config/defconfig_i686-default-linuxapp-gcc
+++ b/config/defconfig_i686-default-linuxapp-gcc
@@ -124,6 +124,7 @@ CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
 CONFIG_RTE_EAL_IGB_UIO=y
+CONFIG_RTE_EAL_VFIO=y

 #
 # Compile Environment Abstraction Layer for linux
diff --git a/config/defconfig_i686-default-linuxapp-icc b/config/defconfig_i686-default-linuxapp-icc
index ecfbf28..1cc 100644
--- a/config/defconfig_i686-default-linuxapp-icc
+++ b/config/defconfig_i686-default-linuxapp-icc
@@ -124,6 +124,7 @@ CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
 CONFIG_RTE_EAL_IGB_UIO=y
+CONFIG_RTE_EAL_VFIO=y

 #
 # Compile Environment Abstraction Layer for linux
diff --git a/config/defconfig_x86_64-default-linuxapp-gcc b/config/defconfig_x86_64-default-linuxapp-gcc
index fc69b80..5c682a5 100644
--- a/config/defconfig_x86_64-default-linuxapp-gcc
+++ b/config/defconfig_x86_64-default-linuxapp-gcc
@@ -124,6 +124,7 @@ CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
 CONFIG_RTE_EAL_IGB_UIO=y
+CONFIG_RTE_EAL_VFIO=y

 #
 # Compile Environment Abstraction Layer for linux
diff --git a/config/defconfig_x86_64-default-linuxapp-icc b/config/defconfig_x86_64-default-linuxapp-icc
index 4ab45b3..b9bb7f6 100644
--- a/config/defconfig_x86_64-default-linuxapp-icc
+++ b/config/defconfig_x86_64-default-linuxapp-icc
@@ -124,6 +124,7 @@ CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
 CONFIG_RTE_EAL_IGB_UIO=y
+CONFIG_RTE_EAL_VFIO=y

 #
 # Compile Environment Abstraction Layer for linux
-- 
1.8.1.4

[dpdk-dev] [PATCH v3 07/20] igb_uio: Moved interrupt type out of igb_uio

2014-05-28 Thread Anatoly Burakov

Move the interrupt type enum out of igb_uio and rename it to be more
generic. The somewhat unusual header naming and split are mostly meant
to make the upcoming virtio patches easier to port to the dpdk.org tree.

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/common/Makefile |  1 +
 lib/librte_eal/common/include/rte_pci.h|  1 +
 .../common/include/rte_pci_dev_feature_defs.h  | 46 +
 .../common/include/rte_pci_dev_features.h  | 44 
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c  | 48 +-
 5 files changed, 112 insertions(+), 28 deletions(-)
 create mode 100644 lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
 create mode 100644 lib/librte_eal/common/include/rte_pci_dev_features.h

diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
index 2f99bf4..7daf38c 100644
--- a/lib/librte_eal/common/Makefile
+++ b/lib/librte_eal/common/Makefile
@@ -39,6 +39,7 @@ INC += rte_rwlock.h rte_spinlock.h rte_tailq.h rte_interrupts.h rte_alarm.h
 INC += rte_string_fns.h rte_cpuflags.h rte_version.h rte_tailq_elem.h
 INC += rte_eal_memconfig.h rte_malloc_heap.h
 INC += rte_hexdump.h rte_devargs.h rte_vdev.h
+INC += rte_pci_dev_feature_defs.h rte_pci_dev_features.h

 ifeq ($(CONFIG_RTE_INSECURE_FUNCTION_WARNING),y)
 INC += rte_warnings.h
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 11b8c13..e653027 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -80,6 +80,7 @@ extern "C" {
 #include 
 #include 
 #include 
+
 #include 

 TAILQ_HEAD(pci_device_list, rte_pci_device); /**< PCI devices in D-linked Q. */
diff --git a/lib/librte_eal/common/include/rte_pci_dev_feature_defs.h b/lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
new file mode 100644
index 000..82f2c00
--- /dev/null
+++ b/lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
@@ -0,0 +1,46 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_PCI_DEV_DEFS_H_
+#define _RTE_PCI_DEV_DEFS_H_
+
+/* interrupt mode */
+enum rte_intr_mode {
+   RTE_INTR_MODE_NONE = 0,
+   RTE_INTR_MODE_LEGACY,
+   RTE_INTR_MODE_MSI,
+   RTE_INTR_MODE_MSIX,
+   RTE_INTR_MODE_MAX
+};
+
+#endif /* _RTE_PCI_DEV_DEFS_H_ */
diff --git a/lib/librte_eal/common/include/rte_pci_dev_features.h b/lib/librte_eal/common/include/rte_pci_dev_features.h
new file mode 100644
index 000..01200de
--- /dev/null
+++ b/lib/librte_eal/common/include/rte_pci_dev_features.h
@@ -0,0 +1,44 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of

[dpdk-dev] [PATCH v3 06/20] igb_uio: make igb_uio compilation optional

2014-05-28 Thread Anatoly Burakov

Currently, igb_uio is always compiled. Some Linux distributions may not
want to ship igb_uio with DPDK, so igb_uio compilation is made optional
via CONFIG_RTE_EAL_IGB_UIO.

Signed-off-by: Anatoly Burakov 
---
 config/defconfig_i686-default-linuxapp-gcc   | 1 +
 config/defconfig_i686-default-linuxapp-icc   | 1 +
 config/defconfig_x86_64-default-linuxapp-gcc | 1 +
 config/defconfig_x86_64-default-linuxapp-icc | 1 +
 lib/librte_eal/linuxapp/Makefile | 2 ++
 5 files changed, 6 insertions(+)

diff --git a/config/defconfig_i686-default-linuxapp-gcc b/config/defconfig_i686-default-linuxapp-gcc
index 14bd3d1..ea90f12 100644
--- a/config/defconfig_i686-default-linuxapp-gcc
+++ b/config/defconfig_i686-default-linuxapp-gcc
@@ -123,6 +123,7 @@ CONFIG_RTE_LOG_HISTORY=256
 CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
+CONFIG_RTE_EAL_IGB_UIO=y

 #
 # Compile Environment Abstraction Layer for linux
diff --git a/config/defconfig_i686-default-linuxapp-icc b/config/defconfig_i686-default-linuxapp-icc
index ec3386e..ecfbf28 100644
--- a/config/defconfig_i686-default-linuxapp-icc
+++ b/config/defconfig_i686-default-linuxapp-icc
@@ -123,6 +123,7 @@ CONFIG_RTE_LOG_HISTORY=256
 CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
+CONFIG_RTE_EAL_IGB_UIO=y

 #
 # Compile Environment Abstraction Layer for linux
diff --git a/config/defconfig_x86_64-default-linuxapp-gcc b/config/defconfig_x86_64-default-linuxapp-gcc
index f11ffbf..fc69b80 100644
--- a/config/defconfig_x86_64-default-linuxapp-gcc
+++ b/config/defconfig_x86_64-default-linuxapp-gcc
@@ -123,6 +123,7 @@ CONFIG_RTE_LOG_HISTORY=256
 CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
+CONFIG_RTE_EAL_IGB_UIO=y

 #
 # Compile Environment Abstraction Layer for linux
diff --git a/config/defconfig_x86_64-default-linuxapp-icc b/config/defconfig_x86_64-default-linuxapp-icc
index 4eaca4c..4ab45b3 100644
--- a/config/defconfig_x86_64-default-linuxapp-icc
+++ b/config/defconfig_x86_64-default-linuxapp-icc
@@ -123,6 +123,7 @@ CONFIG_RTE_LOG_HISTORY=256
 CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
+CONFIG_RTE_EAL_IGB_UIO=y

 #
 # Compile Environment Abstraction Layer for linux
diff --git a/lib/librte_eal/linuxapp/Makefile b/lib/librte_eal/linuxapp/Makefile
index b00e89f..acbf500 100644
--- a/lib/librte_eal/linuxapp/Makefile
+++ b/lib/librte_eal/linuxapp/Makefile
@@ -31,7 +31,9 @@

 include $(RTE_SDK)/mk/rte.vars.mk

+ifeq ($(CONFIG_RTE_EAL_IGB_UIO),y)
 DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += igb_uio
+endif
 DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal
 ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)
 DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += kni
-- 
1.8.1.4

[dpdk-dev] [PATCH v3 05/20] pci: Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING

2014-05-28 Thread Anatoly Burakov

Rename RTE_PCI_DRV_NEED_IGB_UIO to the more generic
RTE_PCI_DRV_NEED_MAPPING, since BAR mapping may now be done by either
igb_uio or VFIO.

Signed-off-by: Anatoly Burakov 
---
 app/test/test_pci.c | 4 ++--
 lib/librte_eal/bsdapp/eal/eal_pci.c | 2 +-
 lib/librte_eal/common/include/rte_pci.h | 4 ++--
 lib/librte_eal/linuxapp/eal/eal_pci.c   | 2 +-
 lib/librte_pmd_e1000/em_ethdev.c| 2 +-
 lib/librte_pmd_e1000/igb_ethdev.c   | 4 ++--
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 4 ++--
 lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c | 2 +-
 8 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/app/test/test_pci.c b/app/test/test_pci.c
index 6908d04..fad118e 100644
--- a/app/test/test_pci.c
+++ b/app/test/test_pci.c
@@ -63,7 +63,7 @@ static int my_driver_init(struct rte_pci_driver *dr,
  struct rte_pci_device *dev);

 /*
- * To test cases where RTE_PCI_DRV_NEED_IGB_UIO is set, and isn't set, two
+ * To test cases where RTE_PCI_DRV_NEED_MAPPING is set, and isn't set, two
  * drivers are created (one with IGB devices, the other with IXGBE devices).
  */

@@ -91,7 +91,7 @@ struct rte_pci_driver my_driver = {
.name = "test_driver",
.devinit = my_driver_init,
.id_table = my_driver_id,
-   .drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+   .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 };

 struct rte_pci_driver my_driver2 = {
diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 94ae461..eddbd2f 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -474,7 +474,7 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
return 0;
}

-   if (dr->drv_flags & RTE_PCI_DRV_NEED_IGB_UIO) {
+   if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
/* map resources for devices that use igb_uio */
if (pci_uio_map_resource(dev) < 0)
return -1;
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index c793773..11b8c13 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -190,8 +190,8 @@ struct rte_pci_driver {
uint32_t drv_flags; /**< Flags contolling handling of device. */
 };

-/** Device needs igb_uio kernel module */
-#define RTE_PCI_DRV_NEED_IGB_UIO 0x0001
+/** Device needs PCI BAR mapping (done with either IGB_UIO or VFIO) */
+#define RTE_PCI_DRV_NEED_MAPPING 0x0001
 /** Device driver must be registered several times until failure */
 #define RTE_PCI_DRV_MULTIPLE 0x0002
 /** Device needs to be unbound even if no module is provided */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 0b779ec..a0abec8 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -435,7 +435,7 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
return 1;
}

-   if (dr->drv_flags & RTE_PCI_DRV_NEED_IGB_UIO) {
+   if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
/* map resources for devices that use igb_uio */
if ((ret = pci_uio_map_resource(dev)) != 0)
return ret;
diff --git a/lib/librte_pmd_e1000/em_ethdev.c b/lib/librte_pmd_e1000/em_ethdev.c
index 755e474..f3575d5 100644
--- a/lib/librte_pmd_e1000/em_ethdev.c
+++ b/lib/librte_pmd_e1000/em_ethdev.c
@@ -279,7 +279,7 @@ static struct eth_driver rte_em_pmd = {
{
.name = "rte_em_pmd",
.id_table = pci_id_em_map,
-   .drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+   .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
},
.eth_dev_init = eth_em_dev_init,
.dev_private_size = sizeof(struct e1000_adapter),
diff --git a/lib/librte_pmd_e1000/igb_ethdev.c b/lib/librte_pmd_e1000/igb_ethdev.c
index c7b3926..b49db52 100644
--- a/lib/librte_pmd_e1000/igb_ethdev.c
+++ b/lib/librte_pmd_e1000/igb_ethdev.c
@@ -600,7 +600,7 @@ static struct eth_driver rte_igb_pmd = {
{
.name = "rte_igb_pmd",
.id_table = pci_id_igb_map,
-   .drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+   .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
},
.eth_dev_init = eth_igb_dev_init,
.dev_private_size = sizeof(struct e1000_adapter),
@@ -613,7 +613,7 @@ static struct eth_driver rte_igbvf_pmd = {
{
.name = "rte_igbvf_pmd",
.id_table = pci_id_igbvf_map,
-   .drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+   .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
},
.eth_dev_init = eth_igbvf_dev_init,
.dev_private_size = sizeof(struct e1000_adapter),
diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c

[dpdk-dev] [PATCH v3 04/20] pci: distinguish between legitimate failures and non-fatal errors

2014-05-28 Thread Anatoly Burakov

Currently, EAL does not distinguish between actual failures and expected
initialization errors. For example, a driver sometimes fails to initialize
because it was never supposed to be initialized in the first place, such
as when the device is not managed by that driver.

This patch makes EAL fail on actual initialization errors while still
skipping over expected initialization errors.
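The return-code convention described above (negative for a real failure, positive for "this driver does not handle the device", zero for success) can be sketched in plain C. This is a minimal illustration, not the EAL code: `probe_fn`, `probe_all_drivers()` and the three mock drivers are hypothetical stand-ins for `rte_eal_pci_probe_one_driver()` and `pci_probe_all_drivers()`.

```c
#include <stddef.h>

/* Hypothetical driver probe callback following this patch's convention:
 * <0 = initialization failed, >0 = device not handled by this driver,
 * 0 = device initialized. */
typedef int (*probe_fn)(int dev_id);

/* Try every driver in turn. Returns -1 on a real failure, 1 if no driver
 * claimed the device (non-fatal), 0 once a driver has initialized it. */
static int probe_all_drivers(const probe_fn *drivers, size_t n, int dev_id)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int rc = drivers[i](dev_id);

		if (rc < 0)
			return -1;	/* fatal: the caller should abort */
		if (rc > 0)
			continue;	/* not ours: try the next driver */
		return 0;		/* a driver initialized the device */
	}
	return 1;			/* nobody matched: not an error */
}

/* Hypothetical mock drivers used to exercise the convention. */
static int drv_not_mine(int dev_id) { (void)dev_id; return 1; }
static int drv_ok(int dev_id) { (void)dev_id; return 0; }
static int drv_broken(int dev_id) { (void)dev_id; return -1; }
```

With this split the caller only has to treat `ret < 0` as fatal, which matches how the reworked `rte_eal_pci_probe()` calls `rte_exit()`.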

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/common/eal_common_pci.c| 16 +---
 lib/librte_eal/linuxapp/eal/eal_pci.c |  7 ---
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c |  4 ++--
 3 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index 7c23e86..1fb8f2c 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -101,8 +101,8 @@ static struct rte_devargs *pci_devargs_lookup(struct rte_pci_device *dev)

 /*
  * If vendor/device ID match, call the devinit() function of all
- * registered driver for the given device. Return -1 if no driver is
- * found for this device.
+ * registered driver for the given device. Return -1 if initialization
+ * failed, return 1 if no driver is found for this device.
  * For drivers with the RTE_PCI_DRV_MULTIPLE flag enabled, register
  * the same device multiple times until failure to do so.
  * It is required for non-Intel NIC drivers provided by third-parties such
@@ -118,7 +118,7 @@ pci_probe_all_drivers(struct rte_pci_device *dev)
rc = rte_eal_pci_probe_one_driver(dr, dev);
if (rc < 0)
/* negative value is an error */
-   break;
+   return -1;
if (rc > 0)
/* positive value means driver not found */
continue;
@@ -130,7 +130,7 @@ pci_probe_all_drivers(struct rte_pci_device *dev)
;
return 0;
}
-   return -1;
+   return 1;
 }

 /*
@@ -144,6 +144,7 @@ rte_eal_pci_probe(void)
struct rte_pci_device *dev = NULL;
struct rte_devargs *devargs;
int probe_all = 0;
+   int ret = 0;

if (rte_eal_devargs_type_count(RTE_DEVTYPE_WHITELISTED_PCI) == 0)
probe_all = 1;
@@ -157,10 +158,11 @@ rte_eal_pci_probe(void)

/* probe all or only whitelisted devices */
if (probe_all)
-   pci_probe_all_drivers(dev);
+   ret = pci_probe_all_drivers(dev);
else if (devargs != NULL &&
-   devargs->type == RTE_DEVTYPE_WHITELISTED_PCI &&
-   pci_probe_all_drivers(dev) < 0)
+   devargs->type == RTE_DEVTYPE_WHITELISTED_PCI)
+   ret = pci_probe_all_drivers(dev);
+   if (ret < 0)
rte_exit(EXIT_FAILURE, "Requested device " PCI_PRI_FMT
 " cannot be used\n", dev->addr.domain, dev->addr.bus,
 dev->addr.devid, dev->addr.function);
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 628813b..0b779ec 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -401,6 +401,7 @@ int
 rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *dev)
 {
struct rte_pci_id *id_table;
+   int ret = 0;

for (id_table = dr->id_table ; id_table->vendor_id != 0; id_table++) {

@@ -431,13 +432,13 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
if (dev->devargs != NULL &&
dev->devargs->type == RTE_DEVTYPE_BLACKLISTED_PCI) {
RTE_LOG(DEBUG, EAL, "  Device is blacklisted, not initializing\n");
-   return 0;
+   return 1;
}

if (dr->drv_flags & RTE_PCI_DRV_NEED_IGB_UIO) {
/* map resources for devices that use igb_uio */
-   if (pci_uio_map_resource(dev) < 0)
-   return -1;
+   if ((ret = pci_uio_map_resource(dev)) != 0)
+   return ret;
} else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND &&
   rte_eal_process_type() == RTE_PROC_PRIMARY) {
/* unbind current driver */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index ae4e716..426769b 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -137,7 +137,7 @@ pci_uio_map_secondary(struct rte_pci_device *dev) {
}

RTE_LOG(ERR, EAL, "Cannot find resource for device\n");
-   return -1;
+   return 1;
 }

 static int
@@ -284,7 +284,7

[dpdk-dev] [PATCH v3 03/20] pci: fixing errors in a previous commit found by checkpatch

2014-05-28 Thread Anatoly Burakov


Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index 61f09cc..ae4e716 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -69,7 +69,7 @@ pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps) {
if (pci_parse_sysfs_value(filename, &offset) < 0) {
RTE_LOG(ERR, EAL,
"%s(): cannot parse offset of %s\n", __func__, dirname);
-   return (-1);
+   return -1;
}

/* get mapping size */
@@ -77,7 +77,7 @@ pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps) {
if (pci_parse_sysfs_value(filename, &size) < 0) {
RTE_LOG(ERR, EAL,
"%s(): cannot parse size of %s\n", __func__, dirname);
-   return (-1);
+   return -1;
}

/* get mapping physical address */
@@ -85,20 +85,20 @@ pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps) {
if (pci_parse_sysfs_value(filename, &maps[i].phaddr) < 0) {
RTE_LOG(ERR, EAL,
"%s(): cannot parse addr of %s\n", __func__, dirname);
-   return (-1);
+   return -1;
}

if ((offset > OFF_MAX) || (size > SIZE_MAX)) {
RTE_LOG(ERR, EAL,
"%s(): offset/size exceed system max value\n", __func__);
-   return (-1);
+   return -1;
}

maps[i].offset = offset;
maps[i].size = size;
}

-   return (i);
+   return i;
 }

 static int
@@ -128,12 +128,12 @@ pci_uio_map_secondary(struct rte_pci_device *dev) {
(size_t) uio_res->maps[i].size) != 
uio_res->maps[i].addr) {
RTE_LOG(ERR, EAL, "Cannot mmap device resource\n");
close(fd);
-   return (-1);
+   return -1;
}
/* fd is not needed in slave process, close it */
close(fd);
}
-   return (0);
+   return 0;
}

RTE_LOG(ERR, EAL, "Cannot find resource for device\n");
@@ -277,7 +277,7 @@ pci_uio_map_resource(struct rte_pci_device *dev) {

/* secondary processes - use already recorded details */
if (rte_eal_process_type() != RTE_PROC_PRIMARY)
-   return (pci_uio_map_secondary(dev));
+   return pci_uio_map_secondary(dev);

/* find uio resource */
uio_num = pci_get_uio_dev(dev, dirname, sizeof(dirname));
@@ -299,7 +299,7 @@ pci_uio_map_resource(struct rte_pci_device *dev) {
/* allocate the mapping details for secondary processes*/
if ((uio_res = rte_zmalloc("UIO_RES", sizeof(*uio_res), 0)) == NULL) {
RTE_LOG(ERR, EAL, "%s(): cannot store uio mmap details\n", __func__);
-   return (-1);
+   return -1;
}

rte_snprintf(uio_res->path, sizeof(uio_res->path), "%s", devname);
@@ -310,7 +310,7 @@ pci_uio_map_resource(struct rte_pci_device *dev) {
RTE_DIM(uio_res->maps));
if (nb_maps < 0) {
rte_free(uio_res);
-   return (nb_maps);
+   return nb_maps;
}

uio_res->nb_maps = nb_maps;
-- 
1.8.1.4

[dpdk-dev] [PATCH v3 02/20] pci: move uio mapping code to a separate file

2014-05-28 Thread Anatoly Burakov


Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/Makefile   |   1 +
 lib/librte_eal/linuxapp/eal/eal_pci.c  | 403 +
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c  | 403 +
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h |  66 
 4 files changed, 474 insertions(+), 399 deletions(-)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_uio.c
 create mode 100644 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h

diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index b00e3ec..527fa2a 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -57,6 +57,7 @@ endif
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_thread.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_log.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_uio.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_debug.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_timer.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index fd88bd0..628813b 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -32,8 +32,6 @@
  */

 #include 
-#include 
-#include 
 #include 
 #include 

@@ -47,6 +45,7 @@
 #include "rte_pci_dev_ids.h"
 #include "eal_filesystem.h"
 #include "eal_private.h"
+#include "eal_pci_init.h"

 /**
  * @file
@@ -57,30 +56,7 @@
  * IGB_UIO driver (or doesn't initialize, if the device wasn't bound to it).
  */

-struct pci_map {
-   void *addr;
-   uint64_t offset;
-   uint64_t size;
-   uint64_t phaddr;
-};
-
-/*
- * For multi-process we need to reproduce all PCI mappings in secondary
- * processes, so save them in a tailq.
- */
-struct mapped_pci_resource {
-   TAILQ_ENTRY(mapped_pci_resource) next;
-
-   struct rte_pci_addr pci_addr;
-   char path[PATH_MAX];
-   int nb_maps;
-   struct pci_map maps[PCI_MAX_RESOURCE];
-};
-
-TAILQ_HEAD(mapped_pci_res_list, mapped_pci_resource);
-static struct mapped_pci_res_list *pci_res_list;
-
-static int pci_parse_sysfs_value(const char *filename, uint64_t *val);
+struct mapped_pci_res_list *pci_res_list = NULL;

 /* unbind kernel driver for this device */
 static int
@@ -122,8 +98,8 @@ error:
 }

 /* map a particular resource from a file */
-static void *
-pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size)
+void *
+pci_map_resource(void * requested_addr, int fd, off_t offset, size_t size)
 {
void *mapaddr;

@@ -147,342 +123,6 @@ fail:
return NULL;
 }

-#define OFF_MAX  ((uint64_t)(off_t)-1)
-static int
-pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps)
-{
-   int i;
-   char dirname[PATH_MAX];
-   char filename[PATH_MAX];
-   uint64_t offset, size;
-
-   for (i = 0; i != nb_maps; i++) {
- 
-   /* check if map directory exists */
-   rte_snprintf(dirname, sizeof(dirname), 
-   "%s/maps/map%u", devname, i);
- 
-   if (access(dirname, F_OK) != 0)
-   break;
- 
-   /* get mapping offset */
-   rte_snprintf(filename, sizeof(filename),
-   "%s/offset", dirname);
-   if (pci_parse_sysfs_value(filename, &offset) < 0) {
-   RTE_LOG(ERR, EAL,
-   "%s(): cannot parse offset of %s\n",
-   __func__, dirname);
-   return (-1);
-   }
- 
-   /* get mapping size */
-   rte_snprintf(filename, sizeof(filename),
-   "%s/size", dirname);
-   if (pci_parse_sysfs_value(filename, &size) < 0) {
-   RTE_LOG(ERR, EAL,
-   "%s(): cannot parse size of %s\n",
-   __func__, dirname);
-   return (-1);
-   }
- 
-   /* get mapping physical address */
-   rte_snprintf(filename, sizeof(filename),
-   "%s/addr", dirname);
-   if (pci_parse_sysfs_value(filename, &maps[i].phaddr) < 0) {
-   RTE_LOG(ERR, EAL,
-   "%s(): cannot parse addr of %s\n",
-   __func__, dirname);
-   return (-1);
-   }
-
-   if ((offset > OFF_MAX) || (size > SIZE_MAX)) {
-   RTE_LOG(ERR, EAL,
-   "%s(): offset/size exceed system max value\n",
-   __func__); 
-   return (-1);
-   }
-
-   maps[i].offset = offset;
-   maps[i].size = size;
-}
-   return (i);
-}
-
-static int

[dpdk-dev] [PATCH v3 01/20] pci: move open() out of pci_map_resource, rename structs

2014-05-28 Thread Anatoly Burakov

Separate the mapping code from the calls to open(). This is preparatory work
for the VFIO patch, which will also need to map BARs but does not use the path
stored in mapped_pci_resource. Also, rename the structs to be more generic.
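The split described above, where the caller opens the file and the mapping helper only receives a file descriptor, can be sketched in plain POSIX C. This is an illustrative sketch, not the EAL code: `map_resource()` mirrors the new `pci_map_resource(requested_addr, fd, offset, size)` signature, while `map_path()` is a hypothetical caller. It relies on the POSIX rule that a mapping stays valid after its descriptor is closed.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map `size` bytes at `offset` from an already-open descriptor.
 * The caller owns `fd`: this function neither opens nor closes it. */
static void *map_resource(void *requested_addr, int fd, off_t offset,
			  size_t size)
{
	void *mapaddr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
			     MAP_SHARED, fd, offset);

	if (mapaddr == MAP_FAILED) {
		fprintf(stderr, "cannot mmap(%d, %p, 0x%zx, 0x%lx): %s\n",
			fd, requested_addr, size, (unsigned long)offset,
			strerror(errno));
		return NULL;
	}
	if (requested_addr != NULL && mapaddr != requested_addr) {
		/* Got a mapping, but not at the address we asked for. */
		munmap(mapaddr, size);
		return NULL;
	}
	return mapaddr;
}

/* Hypothetical caller: open the file, map it, then close the fd.
 * The mapping remains valid after close(). */
static void *map_path(const char *path, off_t offset, size_t size)
{
	void *addr;
	int fd = open(path, O_RDWR);

	if (fd < 0) {
		fprintf(stderr, "cannot open %s: %s\n", path, strerror(errno));
		return NULL;
	}
	addr = map_resource(NULL, fd, offset, size);
	close(fd);
	return addr;
}
```

Keeping open() out of the mapping helper lets another backend that already holds a descriptor reuse the same mapping code without a path.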

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/eal_pci.c | 125 --
 1 file changed, 58 insertions(+), 67 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index ac2c1fe..fd88bd0 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -31,39 +31,17 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */

-#include 
-#include 
-#include 
 #include 
-#include 
-#include 
-#include 
-#include 
 #include 
 #include 
-#include 
-#include 
 #include 
-#include 
-#include 
 #include 
-#include 

-#include 
 #include 
 #include 
-#include 
-#include 
-#include 
-#include 
 #include 
-#include 
 #include 
-#include 
-#include 
 #include 
-#include 
-#include 
 #include 

 #include "rte_pci_dev_ids.h"
@@ -74,15 +52,12 @@
  * @file
  * PCI probing under linux
  *
- * This code is used to simulate a PCI probe by parsing information in
- * sysfs. Moreover, when a registered driver matches a device, the
- * kernel driver currently using it is unloaded and replaced by
- * igb_uio module, which is a very minimal userland driver for Intel
- * network card, only providing access to PCI BAR to applications, and
- * enabling bus master.
+ * This code is used to simulate a PCI probe by parsing information in sysfs.
+ * When a registered device matches a driver, it is then initialized with
+ * IGB_UIO driver (or doesn't initialize, if the device wasn't bound to it).
  */

-struct uio_map {
+struct pci_map {
void *addr;
uint64_t offset;
uint64_t size;
@@ -93,18 +68,18 @@ struct uio_map {
  * For multi-process we need to reproduce all PCI mappings in secondary
  * processes, so save them in a tailq.
  */
-struct uio_resource {
-   TAILQ_ENTRY(uio_resource) next;
+struct mapped_pci_resource {
+   TAILQ_ENTRY(mapped_pci_resource) next;

struct rte_pci_addr pci_addr;
char path[PATH_MAX];
-   size_t nb_maps;
-   struct uio_map maps[PCI_MAX_RESOURCE];
+   int nb_maps;
+   struct pci_map maps[PCI_MAX_RESOURCE];
 };

-TAILQ_HEAD(uio_res_list, uio_resource);
+TAILQ_HEAD(mapped_pci_res_list, mapped_pci_resource);
+static struct mapped_pci_res_list *pci_res_list;

-static struct uio_res_list *uio_res_list = NULL;
 static int pci_parse_sysfs_value(const char *filename, uint64_t *val);

 /* unbind kernel driver for this device */
@@ -148,30 +123,17 @@ error:

 /* map a particular resource from a file */
 static void *
-pci_map_resource(void *requested_addr, const char *devname, off_t offset,
-size_t size)
+pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size)
 {
-   int fd;
void *mapaddr;

-   /*
-* open devname, to mmap it
-*/
-   fd = open(devname, O_RDWR);
-   if (fd < 0) {
-   RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
-   devname, strerror(errno));
-   goto fail;
-   }
-
/* Map the PCI memory resource of device */
mapaddr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
MAP_SHARED, fd, offset);
-   close(fd);
if (mapaddr == MAP_FAILED ||
(requested_addr != NULL && mapaddr != requested_addr)) {
-   RTE_LOG(ERR, EAL, "%s(): cannot mmap(%s(%d), %p, 0x%lx, 0x%lx):"
-   " %s (%p)\n", __func__, devname, fd, requested_addr,
+   RTE_LOG(ERR, EAL, "%s(): cannot mmap(%d, %p, 0x%lx, 0x%lx): %s (%p)\n",
+   __func__, fd, requested_addr,
(unsigned long)size, (unsigned long)offset,
strerror(errno), mapaddr);
goto fail;
@@ -186,10 +148,10 @@ fail:
 }

 #define OFF_MAX  ((uint64_t)(off_t)-1)
-static ssize_t
-pci_uio_get_mappings(const char *devname, struct uio_map maps[], size_t nb_maps)
+static int
+pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps)
 {
-   size_t i;
+   int i;
char dirname[PATH_MAX];
char filename[PATH_MAX];
uint64_t offset, size;
@@ -249,25 +211,37 @@ pci_uio_get_mappings(const char *devname, struct uio_map maps[], size_t nb_maps)
 static int
 pci_uio_map_secondary(struct rte_pci_device *dev)
 {
-size_t i;
-struct uio_resource *uio_res;
+   int fd, i;
+   struct mapped_pci_resource *uio_res;

-   TAILQ_FOREACH(uio_res, uio_res_list, next) {
+   TAILQ_FOREACH(uio_res, pci_res_list, next) {

/* skip this element if it doesn't match our PCI address */
if (memcmp(&uio_res->pci_addr, &dev->addr, sizeof(dev->addr)))
continue;

[dpdk-dev] [PATCH v2 3/3] testpmd: Add commands to test link up and down of PMD

2014-05-28 Thread Ouyang Changchun

Please ignore previous patch v1, and just apply this patch v2.

This patch adds testpmd commands ("set link-up port (port id)" and
"set link-down port (port id)") to test setting the link of a PMD port up and down.

Signed-off-by: Ouyang Changchun 
---
 app/test-pmd/cmdline.c | 81 ++
 app/test-pmd/testpmd.c | 14 +
 app/test-pmd/testpmd.h |  2 ++
 3 files changed, 97 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index b3824f9..29bf5b5 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -3780,6 +3780,85 @@ cmdline_parse_inst_t cmd_start_tx_first = {
},
 };

+/* *** SET LINK UP *** */
+struct cmd_set_link_up_result {
+   cmdline_fixed_string_t set;
+   cmdline_fixed_string_t link_up;
+   cmdline_fixed_string_t port;
+   uint8_t port_id;
+};
+
+cmdline_parse_token_string_t cmd_set_link_up_set =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_link_up_result, set, "set");
+cmdline_parse_token_string_t cmd_set_link_up_link_up =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_link_up_result, link_up,
+   "link-up");
+cmdline_parse_token_string_t cmd_set_link_up_port =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_link_up_result, port, "port");
+cmdline_parse_token_num_t cmd_set_link_up_port_id =
+   TOKEN_NUM_INITIALIZER(struct cmd_set_link_up_result, port_id, UINT8);
+
+static void cmd_set_link_up_parsed(__attribute__((unused)) void *parsed_result,
+__attribute__((unused)) struct cmdline *cl,
+__attribute__((unused)) void *data)
+{
+   struct cmd_set_link_up_result *res = parsed_result;
+   dev_set_link_up(res->port_id);
+}
+
+cmdline_parse_inst_t cmd_set_link_up = {
+   .f = cmd_set_link_up_parsed,
+   .data = NULL,
+   .help_str = "set link-up port (port id)",
+   .tokens = {
+   (void *)&cmd_set_link_up_set,
+   (void *)&cmd_set_link_up_link_up,
+   (void *)&cmd_set_link_up_port,
+   (void *)&cmd_set_link_up_port_id,
+   NULL,
+   },
+};
+
+/* *** SET LINK DOWN *** */
+struct cmd_set_link_down_result {
+   cmdline_fixed_string_t set;
+   cmdline_fixed_string_t link_down;
+   cmdline_fixed_string_t port;
+   uint8_t port_id;
+};
+
+cmdline_parse_token_string_t cmd_set_link_down_set =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_link_down_result, set, "set");
+cmdline_parse_token_string_t cmd_set_link_down_link_down =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_link_down_result, link_down,
+   "link-down");
+cmdline_parse_token_string_t cmd_set_link_down_port =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_link_down_result, port, "port");
+cmdline_parse_token_num_t cmd_set_link_down_port_id =
+   TOKEN_NUM_INITIALIZER(struct cmd_set_link_down_result, port_id, UINT8);
+
+static void cmd_set_link_down_parsed(
+   __attribute__((unused)) void *parsed_result,
+   __attribute__((unused)) struct cmdline *cl,
+   __attribute__((unused)) void *data)
+{
+   struct cmd_set_link_down_result *res = parsed_result;
+   dev_set_link_down(res->port_id);
+}
+
+cmdline_parse_inst_t cmd_set_link_down = {
+   .f = cmd_set_link_down_parsed,
+   .data = NULL,
+   .help_str = "set link-down port (port id)",
+   .tokens = {
+   (void *)&cmd_set_link_down_set,
+   (void *)&cmd_set_link_down_link_down,
+   (void *)&cmd_set_link_down_port,
+   (void *)&cmd_set_link_down_port_id,
+   NULL,
+   },
+};
+
 /* *** SHOW CFG *** */
 struct cmd_showcfg_result {
cmdline_fixed_string_t show;
@@ -5164,6 +5243,8 @@ cmdline_parse_ctx_t main_ctx[] = {
(cmdline_parse_inst_t *)&cmd_showcfg,
(cmdline_parse_inst_t *)&cmd_start,
(cmdline_parse_inst_t *)&cmd_start_tx_first,
+   (cmdline_parse_inst_t *)&cmd_set_link_up,
+   (cmdline_parse_inst_t *)&cmd_set_link_down,
(cmdline_parse_inst_t *)&cmd_reset,
(cmdline_parse_inst_t *)&cmd_set_numbers,
(cmdline_parse_inst_t *)&cmd_set_txpkts,
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index bc38305..8f20fda 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -1208,6 +1208,20 @@ stop_packet_forwarding(void)
test_done = 1;
 }

+void
+dev_set_link_up(portid_t pid)
+{
+   if (rte_eth_dev_set_link_up((uint8_t)pid) < 0)
+   printf("\nSet link up fail.\n");
+}
+
+void
+dev_set_link_down(portid_t pid)
+{
+   if (rte_eth_dev_set_link_down((uint8_t)pid) < 0)
+   printf("\nSet link down fail.\n");
+}
+
 static int
 all_ports_started(void)
 {
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 2bdb1a2..88a29e9 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -499,6 +499,8 @@ char *list_pkt_forwarding_modes(void);
 void set_pkt_forwarding_mode(const

[dpdk-dev] [PATCH v2 2/3] ixgbe: Implement the functionality of setting link up and down in IXGBE PMD

2014-05-28 Thread Ouyang Changchun

Please ignore the previous v1 patch, just apply this v2 patch.

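This patch implements setting the link up and down in the IXGBE PMD.
It is implemented by enabling or disabling the TX laser. Only 82599 MACs are
supported; other devices (and the 82599 bypass device when RTE_NIC_BYPASS is
enabled) return -ENOTSUP.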

Signed-off-by: Ouyang Changchun 
---
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 63 +
 1 file changed, 63 insertions(+)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index c9b5fe4..8f9c97a 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -97,6 +97,8 @@ static int eth_ixgbe_dev_init(struct eth_driver *eth_drv,
 static int  ixgbe_dev_configure(struct rte_eth_dev *dev);
 static int  ixgbe_dev_start(struct rte_eth_dev *dev);
 static void ixgbe_dev_stop(struct rte_eth_dev *dev);
+static int  ixgbe_dev_set_link_up(struct rte_eth_dev *dev);
+static int  ixgbe_dev_set_link_down(struct rte_eth_dev *dev);
 static void ixgbe_dev_close(struct rte_eth_dev *dev);
 static void ixgbe_dev_promiscuous_enable(struct rte_eth_dev *dev);
 static void ixgbe_dev_promiscuous_disable(struct rte_eth_dev *dev);
@@ -246,6 +248,8 @@ static struct eth_dev_ops ixgbe_eth_dev_ops = {
.dev_configure= ixgbe_dev_configure,
.dev_start= ixgbe_dev_start,
.dev_stop = ixgbe_dev_stop,
+   .dev_set_link_up= ixgbe_dev_set_link_up,
+   .dev_set_link_down  = ixgbe_dev_set_link_down,
.dev_close= ixgbe_dev_close,
.promiscuous_enable   = ixgbe_dev_promiscuous_enable,
.promiscuous_disable  = ixgbe_dev_promiscuous_disable,
@@ -1458,6 +1462,65 @@ ixgbe_dev_stop(struct rte_eth_dev *dev)
 }

 /*
+ * Set device link up: enable tx laser.
+ */
+static int
+ixgbe_dev_set_link_up(struct rte_eth_dev *dev)
+{
+   struct ixgbe_hw *hw =
+   IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   if (hw->mac.type == ixgbe_mac_82599EB) {
+#ifdef RTE_NIC_BYPASS
+   if (hw->device_id == IXGBE_DEV_ID_82599_BYPASS) {
+   /* Not suported in bypass mode */
+   PMD_INIT_LOG(ERR,
+   "\nSet link up is not supported "
+   "by device id 0x%x\n",
+   hw->device_id);
+   return -ENOTSUP;
+   }
+#endif
+   /* Turn on the laser */
+   ixgbe_enable_tx_laser(hw);
+   return 0;
+   }
+
+   PMD_INIT_LOG(ERR, "\nSet link up is not supported by device id 0x%x\n",
+   hw->device_id);
+   return -ENOTSUP;
+}
+
+/*
+ * Set device link down: disable tx laser.
+ */
+static int
+ixgbe_dev_set_link_down(struct rte_eth_dev *dev)
+{
+   struct ixgbe_hw *hw =
+   IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   if (hw->mac.type == ixgbe_mac_82599EB) {
+#ifdef RTE_NIC_BYPASS
+   if (hw->device_id == IXGBE_DEV_ID_82599_BYPASS) {
+   /* Not suported in bypass mode */
+   PMD_INIT_LOG(ERR,
+   "\nSet link down is not supported "
+   "by device id 0x%x\n",
+hw->device_id);
+   return -ENOTSUP;
+   }
+#endif
+   /* Turn off the laser */
+   ixgbe_disable_tx_laser(hw);
+   return 0;
+   }
+
+   PMD_INIT_LOG(ERR,
+   "\nSet link down is not supported by device id 0x%x\n",
+hw->device_id);
+   return -ENOTSUP;
+}
+
+/*
  * Reest and stop device.
  */
 static void
-- 
1.9.0

[dpdk-dev] [PATCH v2 1/3] ether: Add API to support set link up and link down

2014-05-28 Thread Ouyang Changchun

Please ignore previous v1 patch, just use this v2 patch.

This patch adds an API for setting the link up and down.
It can be used to repeatedly stop and restart RX/TX of a port without
re-allocating resources for the port or re-configuring it.
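The dispatch pattern used by `rte_eth_dev_set_link_up()` and `rte_eth_dev_set_link_down()` in this patch works in two steps. First it validates the port id. Then it calls the driver's callback if the driver provides one, and returns `-ENOTSUP` otherwise. The pattern can be sketched without DPDK as follows. This is a simplified illustration: `struct dev`, `struct dev_ops`, `port_set_link()` and the laser callbacks are hypothetical stand-ins, and the primary-process check is omitted.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct dev;
typedef int (*set_link_fn)(struct dev *d);

/* Hypothetical cut-down stand-in for struct eth_dev_ops. */
struct dev_ops {
	set_link_fn set_link_up;
	set_link_fn set_link_down;
};

/* Hypothetical cut-down stand-in for struct rte_eth_dev. */
struct dev {
	const struct dev_ops *ops;
	int laser_on;	/* models the TX laser state toggled by the driver */
};

#define NB_PORTS 2
static struct dev ports[NB_PORTS];

/* Validate the port, then dispatch to the driver callback if present. */
static int port_set_link(uint8_t port_id, int up)
{
	struct dev *d;
	set_link_fn fn;

	if (port_id >= NB_PORTS)
		return -EINVAL;
	d = &ports[port_id];
	if (d->ops == NULL)
		return -ENOTSUP;
	fn = up ? d->ops->set_link_up : d->ops->set_link_down;
	if (fn == NULL)
		return -ENOTSUP;	/* driver does not implement it */
	return fn(d);
}

/* Hypothetical driver callbacks: link up/down = TX laser on/off. */
static int laser_up(struct dev *d) { d->laser_on = 1; return 0; }
static int laser_down(struct dev *d) { d->laser_on = 0; return 0; }

static const struct dev_ops laser_ops = { laser_up, laser_down };
static const struct dev_ops no_link_ops = { NULL, NULL };
```

Because the callbacks only toggle device state, the port can be taken down and brought back up repeatedly without touching its queues or configuration.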

Signed-off-by: Ouyang Changchun 
---
 lib/librte_ether/rte_ethdev.c | 38 ++
 lib/librte_ether/rte_ethdev.h | 34 ++
 2 files changed, 72 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index a5727dd..97e3f9d 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -691,6 +691,44 @@ rte_eth_dev_stop(uint8_t port_id)
(*dev->dev_ops->dev_stop)(dev);
 }

+int
+rte_eth_dev_set_link_up(uint8_t port_id)
+{
+   struct rte_eth_dev *dev;
+
+   /* This function is only safe when called from the primary process
+* in a multi-process setup*/
+   PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return -EINVAL;
+   }
+   dev = &rte_eth_devices[port_id];
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_set_link_up, -ENOTSUP);
+   return (*dev->dev_ops->dev_set_link_up)(dev);
+}
+
+int
+rte_eth_dev_set_link_down(uint8_t port_id)
+{
+   struct rte_eth_dev *dev;
+
+   /* This function is only safe when called from the primary process
+* in a multi-process setup*/
+   PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return -EINVAL;
+   }
+   dev = &rte_eth_devices[port_id];
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_set_link_down, -ENOTSUP);
+   return (*dev->dev_ops->dev_set_link_down)(dev);
+}
+
 void
 rte_eth_dev_close(uint8_t port_id)
 {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index d5ea46b..84f2e9f 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -823,6 +823,12 @@ typedef int  (*eth_dev_start_t)(struct rte_eth_dev *dev);
 typedef void (*eth_dev_stop_t)(struct rte_eth_dev *dev);
 /**< @internal Function used to stop a configured Ethernet device. */

+typedef int  (*eth_dev_set_link_up_t)(struct rte_eth_dev *dev);
+/**< @internal Function used to link up a configured Ethernet device. */
+
+typedef int  (*eth_dev_set_link_down_t)(struct rte_eth_dev *dev);
+/**< @internal Function used to link down a configured Ethernet device. */
+
 typedef void (*eth_dev_close_t)(struct rte_eth_dev *dev);
 /**< @internal Function used to close a configured Ethernet device. */

@@ -1084,6 +1090,8 @@ struct eth_dev_ops {
eth_dev_configure_t dev_configure; /**< Configure device. */
eth_dev_start_t dev_start; /**< Start device. */
eth_dev_stop_t dev_stop;  /**< Stop device. */
+   eth_dev_set_link_up_t  dev_set_link_up;   /**< Device link up. */
+   eth_dev_set_link_down_t dev_set_link_down; /**< Device link down. */
eth_dev_close_t dev_close; /**< Close device. */
eth_promiscuous_enable_t   promiscuous_enable; /**< Promiscuous ON. */
eth_promiscuous_disable_t  promiscuous_disable;/**< Promiscuous OFF. */
@@ -1475,6 +1483,32 @@ extern int rte_eth_dev_start(uint8_t port_id);
  */
 extern void rte_eth_dev_stop(uint8_t port_id);

+
+/**
+ * Link up an Ethernet device.
+ *
+ * Set device link up will re-enable the device rx/tx
+ * functionality after it is previously set device linked down.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @return
+ *   - 0: Success, Ethernet device linked up.
+ *   - <0: Error code of the driver device link up function.
+ */
+extern int rte_eth_dev_set_link_up(uint8_t port_id);
+
+/**
+ * Link down an Ethernet device.
+ * The device rx/tx functionality will be disabled if success,
+ * and it can be re-enabled with a call to
+ * rte_eth_dev_set_link_up()
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ */
+extern int rte_eth_dev_set_link_down(uint8_t port_id);
+
 /**
  * Close an Ethernet device. The device cannot be restarted!
  *
-- 
1.9.0

[dpdk-dev] [PATCH v2 0/3] Support setting link up and link down

2014-05-28 Thread Ouyang Changchun

Please ignore the previous patch series with the subject "Support administrative
link up and link down". This v2 patch series replaces it.

This patch series contains the following 3 items:
1. Add an API to support setting link up and down. It can be used to repeatedly
stop and restart RX/TX of a port without re-allocating resources for the port
or re-configuring it.
2. Implement setting link up and down in the IXGBE PMD.
3. Add commands to testpmd to test setting the link of a PMD port up and down.

Ouyang Changchun (3):
  Add API to support set link up and link down.
  Implement the functionality of setting link up and link down in IXGBE
PMD.
  Add command line to test the functionality of setting link up and link
down in testpmd.

 app/test-pmd/cmdline.c  | 81 +
 app/test-pmd/testpmd.c  | 14 +++
 app/test-pmd/testpmd.h  |  2 +
 lib/librte_ether/rte_ethdev.c   | 38 +
 lib/librte_ether/rte_ethdev.h   | 34 
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 63 +
 6 files changed, 232 insertions(+)

-- 
1.9.0

[dpdk-dev] Intel I350 fails to work with DPDK

2014-05-28 Thread sabu kurian

I asked a similar question before, but no one replied.

I'm crafting my own packets in mbufs (all 74-byte packets) and sending them
using

ret = rte_eth_tx_burst(port_ids[lcore_id], 0, m_pool,burst_size);

When burst_size is 1, it does work, in the sense that the NIC keeps sending
packets, but only at a little over 50 percent of the link rate: on a
1000 Mbps link, the observed transmit rate of the NIC is 580 Mbps (using
Intel DPDK). It should be possible to achieve at least 900 Mbps transmit
rate with Intel DPDK and an I350 on a 1 Gbps link.

Could someone help me out on this ?

Thanks and regards

[dpdk-dev] DPDK Latency Issue

2014-05-28 Thread Jun Han

Hi all,

I realized I made a mistake in my previous post. Please note the changes
below.

"While I vary the MAX_BURST_SIZE (1, 8, 16, 32, 64, and 128) and fix
BURST_TX_DRAIN_US=100 usec, I see a low average latency when sending a
burst of packets greater than or equal to the MAX_BURST_SIZE.
For example, when MAX_BURST_SIZE is 32, if I send a burst of 32 packets or
larger, then I get around 10 usec of latency. When the burst size is less
than 32, I see higher average latency, which makes total sense."


On Mon, May 26, 2014 at 9:39 PM, Jun Han  wrote:

> Thanks a lot, Jeff, for your detailed explanation. I still have one open
> question left. I would be grateful if someone would share their insight on
> it.
>
> I have performed experiments to vary both the MAX_BURST_SIZE (originally
> set as 32) and BURST_TX_DRAIN_US (originally set as 100 usec) in l3fwd
> main.c.
>
> While I vary the MAX_BURST_SIZE (1, 8, 16, 32, 64, and 128) and fix
> BURST_TX_DRAIN_US=100 usec, I see a low average latency when sending a
> burst of packets less than or equal to the MAX_BURST_SIZE.
> For example, when MAX_BURST_SIZE is 32, if I send a burst of 32 packets or
> less, then I get around 10 usec of latency. When the burst goes over that,
> the average latency gets higher, which makes total sense.
>
> My main question is the following. When I start sending continuous packets
> at a rate of 14.88 Mpps for 64B packets, I consistently measure an
> average latency of 150 usec, no matter what the MAX_BURST_SIZE is. My guess
> is that the latency should be bounded by BURST_TX_DRAIN_US, which is fixed
> at 100 usec. Would you share your thoughts on this issue, please?
>
> Thanks,
> Jun
>
>
> On Thu, May 22, 2014 at 7:06 PM, Shaw, Jeffrey B wrote:
>
>> Hello,
>>
>> > I measured a roundtrip latency (using Spirent traffic generator) of
>> sending 64B packets over a 10GbE to DPDK, and DPDK does nothing but simply
>> forward back to the incoming port (l3fwd without any lookup code, i.e.,
>> dstport = port_id).
>> > However, to my surprise, the average latency was around 150 usec. (The
>> packet drop rate was only 0.001%, i.e., 283 packets/sec dropped) Another
>> test I did was to measure the latency due to sending only a single 64B
>> packet, and the latency I measured is ranging anywhere from 40 usec to 100
>> usec.
>>
>> 40-100usec seems very high.
>> The l3fwd application does some internal buffering before transmitting
>> the packets.  It buffers either 32 packets, or waits up to 100us
>> (hash-defined as BURST_TX_DRAIN_US), whichever comes first.
>> Try either removing this timeout, or sending a burst of 32 packets at a
>> time.  Or you could try with testpmd, which should have reasonably low
>> latency out of the box.
>>
>> There is also a section in the Release Notes (8.6 How can I tune my
>> network application to achieve lower latency?) which provides some pointers
>> for getting lower latency if you are willing to give up top-rate throughput.
>>
>> Thanks,
>> Jeff
>>
>
>

[dpdk-dev] [PATCH v2 13/16] Removed PCI ID table from igb_uio

2014-05-28 Thread Antti Kantee

On 28/05/14 13:45, Thomas Monjalon wrote:
> So maybe it's possible to improve uio_pci_generic in order to replace igb_uio.
> If someone wants to work on it, it's possible to stage uio_pci_generic in
> dpdk.org in order to make it ready for kernel.org.

Back when researching MSI + uio_pci_generic, I found this:
http://www.gossamer-threads.com/lists/linux/kernel/1738200

I'm not sure I completely follow the logic of the argument there, but it
seems that the maintainer's(?) mind is quite made up about uio_pci_generic
never supporting MSI.

[dpdk-dev] [PATCH v2 0/3] Support setting link up and link down

2014-05-28 Thread Ivan Boule

On 05/28/2014 09:14 AM, Ouyang Changchun wrote:
> Please ignore the previous patch series with subject: "Support administrative 
> link up and link down"
> This v2 patch series will replace the previous patch series.
>
> This patch series contains the following 3 items:
> 1. Add API to support setting link up and down, it can be used to repeatedly 
> stop and restart
> RX/TX of a port without re-allocating resources for the port and 
> re-configuring the port.
> 2. Implement the functionality of setting link up and down in IXGBE PMD.
> 3. Add command in testpmd to test the functionality of setting link up and 
> down of PMD.
>
> Ouyang Changchun (3):
>Add API to support set link up and link down.
>Implement the functionality of setting link up and link down in IXGBE
>  PMD.
>Add command line to test the functionality of setting link up and link
>  down in testpmd.
>

Acked-by: Ivan Boule 

-- 
Ivan Boule
6WIND Development Engineer

[dpdk-dev] [PATCH 04/29] mbuf: added offset of packet meta-data in the packet buffer just after mbuf

2014-05-28 Thread Ivan Boule

Hi Cristian,

Currently, the DPDK framework does not make any assumption about the actual
layout of an mbuf.
More precisely, the DPDK does not impose any constraint on the actual
location of additional metadata, if any, or on the actual location and
size of its associated payload data buffer.
This is coherent with the fact that mbuf pools are not created by the
DPDK itself, but by network applications that are free to choose
whatever packet mbuf layout fits their particular needs.

There is one exception to this basic DPDK rule: the mbuf cloning feature 
available through the RTE_MBUF_SCATTER_GATHER configuration option 
assumes that the payload data buffer of the mbuf immediately follows the 
rte_mbuf data structure (see the macros RTE_MBUF_FROM_BADDR, 
RTE_MBUF_TO_BADDR, RTE_MBUF_INDIRECT, and RTE_MBUF_DIRECT in the file 
lib/librte_mbuf/rte_mbuf.h).

The cloning feature prevents building packet mbufs with their metadata
located immediately after the rte_mbuf data structure, which is exactly
what your patch introduces.

At least, a comment that clearly warns the user of this incompatibility
might be worth adding into both the code and your patch log.

Regards,
Ivan

On 05/27/2014 07:09 PM, Cristian Dumitrescu wrote:
> Added zero-size field (offset in data structure) to specify the beginning of 
> packet meta-data in the packet buffer just after the mbuf.
>
> The size of the packet meta-data is application specific and the packet 
> meta-data is managed by the application.
>
> The packet meta-data should always be accessed through the provided macros.
>
> This is used by the Packet Framework libraries (port, table, pipeline).
>
> There is absolutely no performance impact due to this mbuf field, as it does 
> not take any space in the mbuf structure (zero-size field).
>
> Signed-off-by: Cristian Dumitrescu 
> ---
>   lib/librte_mbuf/rte_mbuf.h |   17 +
>   1 files changed, 17 insertions(+), 0 deletions(-)
>
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index 4a9ab41..bf09618 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -201,8 +201,25 @@ struct rte_mbuf {
>   struct rte_ctrlmbuf ctrl;
>   struct rte_pktmbuf pkt;
>   };
> + 
> + union {
> + uint8_t metadata[0];
> + uint16_t metadata16[0];
> + uint32_t metadata32[0];
> + uint64_t metadata64[0];
> + };
>   } __rte_cache_aligned;
>
> +#define RTE_MBUF_METADATA_UINT8(mbuf, offset)   (mbuf->metadata[offset])
> +#define RTE_MBUF_METADATA_UINT16(mbuf, offset)  (mbuf->metadata16[offset/sizeof(uint16_t)])
> +#define RTE_MBUF_METADATA_UINT32(mbuf, offset)  (mbuf->metadata32[offset/sizeof(uint32_t)])
> +#define RTE_MBUF_METADATA_UINT64(mbuf, offset)  (mbuf->metadata64[offset/sizeof(uint64_t)])
> +
> +#define RTE_MBUF_METADATA_UINT8_PTR(mbuf, offset)   (&mbuf->metadata[offset])
> +#define RTE_MBUF_METADATA_UINT16_PTR(mbuf, offset)  (&mbuf->metadata16[offset/sizeof(uint16_t)])
> +#define RTE_MBUF_METADATA_UINT32_PTR(mbuf, offset)  (&mbuf->metadata32[offset/sizeof(uint32_t)])
> +#define RTE_MBUF_METADATA_UINT64_PTR(mbuf, offset)  (&mbuf->metadata64[offset/sizeof(uint64_t)])
> +
>   /**
>* Given the buf_addr returns the pointer to corresponding mbuf.
>*/
>

-- 
Ivan Boule
6WIND Development Engineer

[dpdk-dev] [PATCH 0/4] Link Bonding Library

2014-05-28 Thread Neil Horman

On Wed, May 28, 2014 at 04:32:00PM +0100, declan.doherty at intel.com wrote:
> From: Declan Doherty 
> 
> Initial release of Link Bonding Library (lib/librte_bond) with support for 
> bonding modes :
>  0 - Round Robin
>  1 - Active Backup
>  2 - Balance l2 / l23 / l34 
>  3 - Broadcast
> 
Why make this a separate library?  That requires exposing yet another API to
applications.  Instead, why not write a PMD that can enslave other PMDs and
treat them all as a single interface?  That way this all works with the existing
API.

Neil

> patches split:
>  1 - library + makefile changes
>  2 - Unit test suite, including code to generate packet bursts for
> testing rx and tx functionality of bonded device and a
> virtual/stubbed out ethdev for use as slave ethdev in testing
>  3 - Link bonding integration into testpmd, including:
>  - The ability to create new bonded devices.
>  - Add/remove bonding slave devices.
>  - Interrogate bonded device stats/configuration.
>  - Change bonding modes and select balance transmit policies.
>  4 - Add Link Bonding Library to Doxygen
> 
> 
>  app/test-pmd/cmdline.c|  550 +
>  app/test-pmd/parameters.c |4 +-
>  app/test-pmd/testpmd.c|   28 +-
>  app/test-pmd/testpmd.h|2 +
>  app/test/Makefile |3 +
>  app/test/commands.c   |3 +
>  app/test/packet_burst_generator.c |  276 +++
>  app/test/packet_burst_generator.h |   85 +
>  app/test/test.h   |1 +
>  app/test/test_link_bonding.c  | 4007 +
>  app/test/virtual_pmd.c|  580 ++
>  app/test/virtual_pmd.h|   74 +
>  config/common_bsdapp  |5 +
>  config/common_linuxapp|5 +
>  doc/doxy-api-index.md |1 +
>  doc/doxy-api.conf |1 +
>  lib/Makefile  |1 +
>  lib/librte_bond/Makefile  |   28 +
>  lib/librte_bond/rte_bond.c| 1679 
>  lib/librte_bond/rte_bond.h|  228 +++
>  mk/rte.app.mk |5 +
>  21 files changed, 7564 insertions(+), 2 deletions(-)
>  create mode 100644 app/test/packet_burst_generator.c
>  create mode 100644 app/test/packet_burst_generator.h
>  create mode 100644 app/test/test_link_bonding.c
>  create mode 100644 app/test/virtual_pmd.c
>  create mode 100644 app/test/virtual_pmd.h
>  create mode 100644 lib/librte_bond/Makefile
>  create mode 100644 lib/librte_bond/rte_bond.c
>  create mode 100644 lib/librte_bond/rte_bond.h
> 
> -- 
> 1.8.5.3
> 
>

[dpdk-dev] [PATCH] mk: fix link with gcc

2014-05-28 Thread Olivier MATZ

Hi Thomas,

On 05/27/2014 02:55 PM, Thomas Monjalon wrote:
> Some linker options were not prefixed by -Wl, when using gcc:
>   -z muldefs
>   -melf_i386 (32-bit config)
>
> Using the linkerprefix macro fixes it.
>
> Signed-off-by: Thomas Monjalon 

The patch looks correct, but from the commit log it's difficult
to understand what the problem is today. Is there a compilation
issue? Or is it just cleanup?

Regards,
Olivier

[dpdk-dev] [PATCH v2 10/16] Added support for selecting VFIO interrupt type from EAL command-line

2014-05-28 Thread Thomas Monjalon

2014-05-28 10:35, Burakov, Anatoly:
> Hi Thomas,
> 
> > > + }
> > > 
> > >   else if (!strcmp(lgopts[option_index].name,
> > 
> > OPT_CREATE_UIO_DEV))
> > 
> > another code style issue reported by checkpatch.pl ;)
> > 
> > But it should be fixed by removing this code as Stephen suggests.
> 
> I'm not sure this code should be removed. igb_uio allows picking the interrupt
> mode, so why not VFIO? I've modified my code to try all interrupt modes if
> nothing was explicitly specified, but why should that preclude the user
> from selecting a specific interrupt type if he so desires?
> 
> As for the style error - the whole chunk of code uses the same style there,
> so either we fix all of that (in a separate patch?), or leave it as it is.

OK to leave it as is.

But please, let's try to keep a clean code style.
As for existing code style issues, separate cleanup patches would be
welcome.

Thanks
-- 
Thomas

[dpdk-dev] [PATCH] cpu_layout.py: adjust output format to align

2014-05-28 Thread Thomas Monjalon

Hi,

2014-05-28 11:02, Shannon Zhao:
> I have checked my patch. It doesn't apply correctly when "core id" is
> longer than 2 characters.
> 
> Following is my revised patch. It uses the maximum length of the "core
> id" and "processor" values to adjust the alignment length.

Thank you for reworking your patch.

Please, could you send your patch as v2 with "git send-email"?
There are some guidelines here:
http://dpdk.org/dev#send

Thanks
-- 
Thomas

[dpdk-dev] Intel I350 fails to work with DPDK

2014-05-28 Thread Richardson, Bruce


> From: sabu kurian [mailto:sabu2kurian at gmail.com] 
> Sent: Wednesday, May 28, 2014 11:54 AM
> To: Richardson, Bruce
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] Intel I350 fails to work with DPDK
>
> Hi Bruce,
> Thanks for the reply.
> I had already tried that. A burst size of 64 or 128 simply fails: the
> card sends out a few packets (some 400 packets of 74 bytes each) and then
> freezes. For my application, I'm trying to generate the peak traffic
> possible with the link speed and the NIC.

Bursts of 64 and 128 are rather large, can you perhaps try using bursts of 16 
and 32 and see what the result is? The drivers are generally tuned for a max 
burst size of about 32 packets.

[dpdk-dev] [PATCH] cpu_layout.py: adjust output format to align

2014-05-28 Thread Shannon Zhao

Hi Thomas,

Thanks for your reply.

I have checked my patch. It doesn't apply correctly when "core id" is longer
than 2 characters.

Following is my revised patch. It uses the maximum length of the "core id" and
"processor" values to adjust the alignment length.


Bug: when "core id" is greater than 9, the cpu_layout.py output doesn't align.

Socket 0Socket 1
-   -
Core 9  [4, 16] [10, 22]

Core 10 [5, 17] [11, 23]

Solution: adjust output format to align

Socket 0Socket 1
-   -
Core 9  [4, 16] [10, 22]

Core 10 [5, 17] [11, 23]

Signed-off-by: Shannon Zhao 
---
 tools/cpu_layout.py |   16 
 1 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/tools/cpu_layout.py b/tools/cpu_layout.py
index 9eff9d7..eeda17e 100755
--- a/tools/cpu_layout.py
+++ b/tools/cpu_layout.py
@@ -75,15 +75,23 @@ print "cores = ",cores
 print "sockets = ", sockets
 print ""

+max_processor_len=len(str(len(cores)*len(sockets)*2-1))
+max_core_map_len = max_processor_len*2+4
+if max_core_map_len < 12:
+    max_core_map_len = 12
+max_core_id_len=len(str(max(cores)))
+
+print " ".ljust(max_core_id_len+5),
 for s in sockets:
-   print "\tSocket %s" % s,
+   print "Socket %s" % str(s).ljust(max_core_map_len-7),
 print ""
+print " ".ljust(max_core_id_len+5),
 for s in sockets:
-   print "\t-",
+   print "-".ljust(max_core_map_len),
 print ""

 for c in cores:
-   print "Core %s" % c,
+   print "Core %s" % str(c).ljust(max_core_id_len),
for s in sockets:
-   print "\t", core_map[(s,c)],
+   print str(core_map[(s,c)]).ljust(max_core_map_len),
print "\n"
--
1.7.1


On 2014/5/27 18:30, Thomas Monjalon wrote:
> Hi,
> 
> Your patch doesn't apply correctly.
> Could you check it, please?
> 
> I have also a comment inlined:
> 
> 2014-05-27 17:41, Shannon Zhao:
>> -   print "\t", core_map[(s,c)],
>> +   print core_map[(s,c)],"\t",
> 
> Is it possible to enforce a minimum alignment of 2 characters?
> It could prevent alignment problems such as this one:
> 
> Core 11 [9, 33] [21, 45] 
> Core 12 [10, 34][22, 46] 
> 
> Thanks
>

[dpdk-dev] Intel I350 fails to work with DPDK

2014-05-28 Thread Richardson, Bruce

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of sabu kurian
> Sent: Wednesday, May 28, 2014 10:42 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] Intel I350 fails to work with DPDK
> 
> I have asked a similar question before, but no one replied.
> 
> I'm crafting my own packets in mbufs (all 74-byte packets) and sending them
> using
> 
> ret = rte_eth_tx_burst(port_ids[lcore_id], 0, m_pool, burst_size);
> 
> When burst_size is 1, it does work, in the sense that the NIC keeps
> sending packets, but only at a little over 50 percent of the link rate:
> on a 1000 Mbps link, the observed transmit rate of the NIC is 580 Mbps
> (using Intel DPDK). It should be possible to achieve at least 900 Mbps
> transmit rate with Intel DPDK and an I350 on a 1 Gbps link.
> 
> Could someone help me out on this ?
> 
> Thanks and regards

Sending out a single packet at a time is going to have a very high overhead, as 
each call to tx_burst involves making PCI transactions (MMIO writes to the 
hardware ring pointer). To reduce this penalty you should look to send out the 
packets in bursts, thereby saving PCI bandwidth and splitting the cost of each 
MMIO write over multiple packets.

Regards,
/Bruce

[dpdk-dev] [PATCH] mk: fix link with gcc

2014-05-28 Thread Neil Horman

On Tue, May 27, 2014 at 02:55:16PM +0200, Thomas Monjalon wrote:
> Some linker options were not prefixed by -Wl, when using gcc:
>   -z muldefs
>   -melf_i386 (32-bit config)
> 
> Using the linkerprefix macro fixes it.
> 
> Signed-off-by: Thomas Monjalon 
> ---
>  mk/rte.lib.mk | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/mk/rte.lib.mk b/mk/rte.lib.mk
> index f5d2789..c58e68e 100644
> --- a/mk/rte.lib.mk
> +++ b/mk/rte.lib.mk
> @@ -62,6 +62,8 @@ exe2cmd = $(strip $(call dotfile,$(patsubst %,%.cmd,$(1
>  ifeq ($(LINK_USING_CC),1)
>  # Override the definition of LD here, since we're linking with CC
>  LD := $(CC)
> +LD_MULDEFS := $(call linkerprefix,-z$(comma)muldefs)
> +CPU_LDFLAGS := $(call linkerprefix,$(CPU_LDFLAGS))
>  endif
>  
Agree with Olivier: what exactly is the problem here?  Also, I don't think this
is correct, as CPU_LDFLAGS and -z muldefs below are used in conjunction with
$LD.  It would make sense to prefix -Wl to these options if we were passing them
through $CC, but not $LD.

Neil

>  O_TO_A = $(AR) crus $(LIB) $(OBJS-y)
> @@ -73,7 +75,7 @@ O_TO_A_DO = @set -e; \
>   $(O_TO_A) && \
>   echo $(O_TO_A_CMD) > $(call exe2cmd,$(@))
>  
> -O_TO_S = $(LD) $(CPU_LDFLAGS) -z muldefs -shared $(OBJS-y) -o $(LIB)
> +O_TO_S = $(LD) $(CPU_LDFLAGS) $(LD_MULDEFS) -shared $(OBJS-y) -o $(LIB)
>  O_TO_S_STR = $(subst ','\'',$(O_TO_S)) #'# fix syntax highlight
>  O_TO_S_DISP = $(if $(V),"$(O_TO_S_STR)","  LD $(@)")
>  O_TO_S_DO = @set -e; \
> @@ -89,7 +91,7 @@ O_TO_C_DO = @set -e; \
>   $(lib_dir) \
>   $(copy_obj)
>  else
> -O_TO_C = $(LD) -z muldefs -shared $(OBJS-y) -o $(LIB_ONE)
> +O_TO_C = $(LD) $(LD_MULDEFS) -shared $(OBJS-y) -o $(LIB_ONE)
>  O_TO_C_STR = $(subst ','\'',$(O_TO_C)) #'# fix syntax highlight
>  O_TO_C_DISP = $(if $(V),"$(O_TO_C_STR)","  LD_C $(@)")
>  O_TO_C_DO = @set -e; \
> -- 
> 1.9.2
> 
>

[dpdk-dev] Please any one who can help me with librte_sched

2014-05-28 Thread Dumitrescu, Cristian

Hi Ariel,

I think you put your finger precisely on the problem associated with your 
approach: you have to iterate through all the queues and free up the packets, 
which takes a lot of time. Obviously this is not done by the rte_sched API.

Maybe a limited workaround for this approach would be to create and service the 
parallel rte_sched using a different CPU core, while the previous CPU core 
takes its time to free up all the packets and data structures correctly.

Regards,
Cristian

From: Ariel Rodriguez [mailto:arodrig...@callistech.com]
Sent: Wednesday, May 28, 2014 1:46 AM
To: Dumitrescu, Cristian
Cc: Stephen Hemminger; dev at dpdk.org
Subject: Re: [dpdk-dev] Please any one who can help me with librte_sched

Thank you, perfect explanation. I think I'm going to create a new parallel
rte_sched_port and change the reference, with the management core updating the
tx/sched core. So, what happens to the packets on the old reference if I just
call rte_port_free on it: are they leaked? Is there a way to flush the
rte_sched_port, or maybe get the total packet size somewhere?
Anyway, the RCU algorithm fits OK in this approach... but maybe there is a way
to flush the old port reference, and work from there with the newly created
rte_sched_port.

Regards,
Ariel.

On Tue, May 27, 2014 at 3:31 PM, Dumitrescu, Cristian <cristian.dumitrescu at intel.com> wrote:
Hi Ariel,

What's wrong with calling rte_sched_subport_config() and 
rte_sched_pipe_config() during run-time?

This assumes that:

1. Port initialization is done, which includes the following:
a) the number of subports, pipes per subport are fixed
b) the queues are all created and their size is fixed
c) the pipe profiles are defined
d) Basically, the maximal data structures get created (maximum number of
subports, pipes and queues) with no run-time changes allowed, apart from the
bandwidth-related parameters. Queues that do not receive packets are not used
now; they will be used as soon as they get packets. The packets-to-queues
mapping logic can change over time, as can the level of activity for
different users/queues.

2. The CPU core calling the subport/pipe config functions is the same as the
core doing enqueue/dequeue for this port (for thread safety reasons).
a) As you say, the management core can send update requests to the core running 
the scheduler, with the latter sampling the request queue regularly and 
performing the updates.

Regards,
Cristian

-Original Message-
From: dev [mailto:dev-bounces at dpdk.org] On 
Behalf Of Stephen Hemminger
Sent: Tuesday, May 27, 2014 5:35 PM
To: Ariel Rodriguez
Cc: dev at dpdk.org
Subject: Re: [dpdk-dev] Please any one who can help me with librte_sched

On Tue, 27 May 2014 10:33:02 -0300
Ariel Rodriguez <arodriguez at callistech.com> wrote:

> Hello, this is my third mail; the previous mails have not been answered
> yet.
>
> I just need someone to explain to me how the librte_sched framework behaves
> in a specific situation.
>
> I have a management application, which connects through a ring with the tx
> core. When a user applies some bandwidth management configuration, the tx
> core reads the message from the ring, parses the configuration into
> rte_port_params, subport_params and pipe_params structs, then creates a new
> rte_sched from scratch, swaps the pointer of the rte_sched_port currently
> doing scheduling, and then executes rte_sched_port_free() on the
> unreferenced (held by a temporary pointer) rte_sched_port. This is the only
> way I found to apply dynamic configuration changes to the QoS framework.
> So, with this, what happens to the packets attached to the old
> rte_sched_port while it is deleted? Do those lost packets inside the
> rte_sched_port generate memory leaks? How can I recover these packets:
> just by dequeuing from the port scheduler? Where does the port scheduler
> indicate that there are no packets left in the queue state?
>
> Is there a better way to achieve this kind of behaviour? I just need to
> update the rte_sched_port configuration dynamically, and I also want to
> change the current pipe configuration and subport configuration.
>
> Regards.

If you need to do dynamic changes, I would recommend using an RCU type
algorithm where you exchange in new parameters and then cleanup/free
after a grace period.  See http://lttng.org/urcu

[dpdk-dev] [PATCH RFC 03/11] mbuf: remove rte_ctrlmbuf

2014-05-28 Thread Ananyev, Konstantin

Hi,

>The only win from this is to save the byte for the type field.
>Yes bits here are precious.

>Does external application mix control and data mbuf's in the same ring?
>The stuff in the tree only uses type field for debug validation/sanity
>checks.

>Since it is only one bit, maybe you can find one bit to store that. 
>Since buffer and pool addresses are going to be at least 32 bit aligned
>maybe you can use the old GC trick of using the LSB as flag.

Or, as an alternative, we could move the mbuf type up into the mempool.
In most cases the user has to deal with only one particular type of mbuf and
already knows what mbuf type it is.
For the rare cases when code needs to deal with a mix of mbuf types,
it is probably OK to read the mbuf type from the corresponding mempool.
Of course, this would mean that all elements in a mempool must have the same
type, but I don't think people are using mempools with a mix of
pktmbuf/ctrlmbuf right now anyway.

Konstantin

[dpdk-dev] [PATCH v2 13/16] Removed PCI ID table from igb_uio

2014-05-28 Thread Stephen Hemminger

On Wed, 28 May 2014 15:45:02 +0200
Thomas Monjalon  wrote:

> 2014-05-23 00:10, Antti Kantee:
> > On 22/05/14 13:13, Thomas Monjalon wrote:
> > > 2014-05-19 16:51, Anatoly Burakov:
> > >> Note that since igb_uio no longer has a PCI ID list, it can now be
> > >> bound to any device, not just those explicitly supported by DPDK. In
> > >> other words, it now behaves similar to PCI stub, VFIO and other generic
> > >> PCI drivers.
> > > 
> > > I wonder if we could replace igb_uio by uio_pci_generic?
> > 
> > I've been running plenty of the NetBSD kernel PCI drivers in Linux
> > userspace on top of uio_pci_generic, including NICs supported by DPDK.
> > The only real annoyance is that mainline uio_pci_generic doesn't support
> > MSI.  A pseudo-annoyance is that uio_pci_generic turns interrupts off
> > from the PCI config space each time after you read an interrupt, so they
> > have to be reenabled after each one (and NetBSD kernel drivers tend to
> > like using interrupts for everything).
> > 
> > The annoyance of vfio is iommus.  Yes, I want to make the tradeoff of
> > possibly scribbling memory vs. not being able to do anything on the
> > wrong system.
> > 
> > I'd like to see a generic Linux kernel PCI driver blob without
> > annoyances, though not yet annoyed enough to do anything myself ;)
> 
> So maybe it's possible to improve uio_pci_generic in order to replace igb_uio.
> If someone wants to work on it, it's possible to stage uio_pci_generic in 
> dpdk.org in order to make it ready for kernel.org.
> 

I am doing a new version of uio_pci for the upstream kernel and will submit it
when ready.  It will target kernel 3.10 or later; I will not bother
backporting it past that.

[dpdk-dev] [PATCH 0/2] L3FWD sample optimisation

2014-05-28 Thread Ananyev, Konstantin

Hi Thomas,

>As you are doing optimizations, it's important to know the performance gain.
>It could help to mitigate future reworks.
>So please, could you provide some benchmarking numbers in the commit log?

Some performance data is below.
Also, I forgot to mention that the new code path can be switched on/off by
setting the ENABLE_MULTI_BUFFER_OPTIMIZE macro to 1/0.
Do I need to resubmit the whole patch series, or just the cover letter, or ...?

Konstantin

SUT: dual-socket IVB 2.8 GHz board with 4 ports on 4 NICs (all on socket 0)
connected to the traffic generator.
2x1GB pages, kernel: 3.11.3-201.fc19.x86_64, gcc 4.8.2.
64B packets, using the packet flooding method.
All 4 ports are managed by one logical core.
Optimised scalar PMD RX/TX was used.

Test case               DIFF % (NEW-OLD)
IPV4-CONT-BURST         +23%
IPV6-CONT-BURST         +13%
IPV4/IPV6-CONT-BURST     +8%
IPV4-4STREAMSX8          +7%
IPV4-4STREAMSX1          -2%

Test case descriptions:
IPV4-CONT-BURST - IPv4 packets; all packets from one input port are destined
for the same output port.
IPV6-CONT-BURST - IPv6 packets; all packets from one input port are destined
for the same output port.
IPV4/IPV6-CONT-BURST - mix of the first two with interleave=1 (e.g.:
IPV4,IPV6,IPV4,IPV6, ...)
IPV4-4STREAMSX1 - 4 streams of IPv4 packets, where all packets from the same
stream are destined for the same output port
(e.g.: IPV4_DST_P0, IPV4_DST_P1, IPV4_DST_P2, IPV4_DST_P3, IPV4_DST_P0, ...)
IPV4-4STREAMSX8 - same as above, but the packets of each stream come in
groups of 8
(e.g.: IPV4_DST_P0 X 8, IPV4_DST_P1 X 8, IPV4_DST_P2 X 8, IPV4_DST_P3 X 8,
IPV4_DST_P0 X 8, ...)

[dpdk-dev] [PATCH 0/4] New library: rte_distributor

2014-05-28 Thread Richardson, Bruce

> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Tuesday, May 27, 2014 11:33 PM
> To: Richardson, Bruce
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 0/4] New library: rte_distributor
> 
> Hi Bruce,
> 
> As for rte_acl, I have some formatting comments.
> 
> 2014-05-20 11:00, Bruce Richardson:
> > This adds a new library to the Intel DPDK whereby a set of packets can be
> > distributed one-at-a-time to a set of worker cores, with dynamic load
> > balancing being done between those workers. Flows are identified by a tag
> > within the mbuf (currently the RSS hash field, 32-bit value), which is used
> > to ensure that no two packets of the same flow are processed in parallel,
> > thereby preserving ordering.
> >
> >  app/test/Makefile  |   2 +
> >  app/test/commands.c|   7 +-
> >  app/test/test.h|   2 +
> >  app/test/test_distributor.c| 582 +
> >  app/test/test_distributor_perf.c   | 274 
> >  config/defconfig_i686-default-linuxapp-gcc |   5 +
> >  config/defconfig_i686-default-linuxapp-icc |   5 +
> >  config/defconfig_x86_64-default-bsdapp-gcc |   6 +
> >  config/defconfig_x86_64-default-linuxapp-gcc   |   5 +
> >  config/defconfig_x86_64-default-linuxapp-icc   |   5 +
> >  lib/Makefile   |   1 +
> >  lib/librte_distributor/Makefile|  50 +++
> >  lib/librte_distributor/rte_distributor.c   | 417 ++
> >  lib/librte_distributor/rte_distributor.h   | 173 
> >  lib/librte_eal/common/include/rte_tailq_elem.h |   2 +
> >  mk/rte.app.mk  |   4 +
> >  16 files changed, 1539 insertions(+), 1 deletion(-)
> 
> As you are introducing a new library, you need to update
> doxygen configuration and start page:
> doc/doxy-api.conf
> doc/doxy-api-index.md

I didn't know I had to update those; I'll add them in the v2 patch set.

> 
> I've run checkpatch.pl from kernel.org on these distributor patches
> and it reports some code style issues.
> Could you have a look at it please?

Yep. I've downloaded and run that script myself in preparation for a V2 patch
set (due really soon), so hopefully all should be well second time round.

[dpdk-dev] Please any one who can help me with librte_sched

2014-05-28 Thread Ariel Rodriguez

OK, I can do that... but is there still a way to ask the rte_sched_port
something like is_empty?
... Or simply, if the dequeue function returns 0 packets retrieved from the
old port structure running on the other core,
can I assume that the port is empty?

Regards

Ariel.
 On May 28, 2014 7:10 AM, "Dumitrescu, Cristian" <cristian.dumitrescu at intel.com> wrote:

> Hi Ariel,
>
> I think you put your finger precisely on the problem associated with your
> approach: you have to iterate through all the queues and free up the
> packets, which takes a lot of time. Obviously this is not done by the
> rte_sched API.
>
> Maybe a limited workaround for this approach would be to create and
> service the parallel rte_sched using a different CPU core, while the
> previous CPU core takes its time to free up all the packets and data
> structures correctly.
>
> Regards,
> Cristian
>
> From: Ariel Rodriguez [mailto:arodriguez at callistech.com]
> Sent: Wednesday, May 28, 2014 1:46 AM
> To: Dumitrescu, Cristian
> Cc: Stephen Hemminger; dev at dpdk.org
> Subject: Re: [dpdk-dev] Please any one who can help me with librte_sched
>
> Thank you, perfect explanation. I think I'm going to create a new parallel
> rte_sched_port and change the reference, with the management core updating
> the tx/sched core. So, what happens to the packets on the old reference if I
> just call rte_port_free on it: are they leaked? Is there a way to flush the
> rte_sched_port, or maybe get the total packet size somewhere?
>
> Anyway, the RCU algorithm fits OK in this approach... but maybe there is a
> way to flush the old port reference, and work from there with the newly
> created rte_sched_port.
>
> Regards,
> Ariel.
>
> On Tue, May 27, 2014 at 3:31 PM, Dumitrescu, Cristian <
> cristian.dumitrescu at intel.com> wrote:
>
> Hi Ariel,
>
> What's wrong with calling rte_sched_subport_config() and
> rte_sched_pipe_config() during run-time?
>
> This assumes that:
>
> 1. Port initialization is done, which includes the following:
> a) the number of subports, pipes per subport are fixed
> b) the queues are all created and their size is fixed
> c) the pipe profiles are defined
> d) Basically the maximal data structures get created (maximum number of
> subports, pipes and queues) with no run-time changes allowed, apart from the
> bandwidth related parameters. Queues that do not receive packets are not
> used now; they will be used as soon as they get packets. The
> packets-to-queues mapping logic can change over time, as well as the level
> of activity for different users/queues.
>
> 2. The CPU core calling the subport/pipe config functions is the same as
> the core doing enqueue/dequeue for this port (for thread safety reasons).
> a) As you say, the management core can send update requests to the core
> running the scheduler, with the latter sampling the request queue regularly
> and performing the updates.
>
> Regards,
> Cristian
>
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Stephen Hemminger
> Sent: Tuesday, May 27, 2014 5:35 PM
> To: Ariel Rodriguez
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] Please any one who can help me with librte_sched
>
> On Tue, 27 May 2014 10:33:02 -0300
> Ariel Rodriguez  wrote:
>
> > Hello, this is my third mail; the previous mails have not been answered
> > yet.
> >
> > I just need someone to explain to me how the librte_sched framework
> > behaves in a specific situation.
> >
> > I have a management application that is connected to the tx core through
> > a ring. When a user applies a bandwidth management configuration, the tx
> > core reads the message from the ring, parses the configuration into
> > rte_port_params, subport_params and pipe_params structs, creates a new
> > rte_sched from scratch, swaps the pointer of the rte_sched_port currently
> > doing the scheduling, and then executes rte_sched_port_free() on the
> > unreferenced rte_sched_port (still held by a temporary pointer). This is
> > the only way I found to apply dynamic configuration changes to the QoS
> > framework.
> > So, with this, what happens to the packets attached to the old
> > rte_sched_port while it is deleted? Are those packets lost inside the
> > rte_sched_port, generating memory leaks? How can I recover these
> > packets: just by dequeuing from the port scheduler? Where does the port
> > scheduler indicate that there are no packets left in its queues?
> >
> > Is there a better way to achieve this kind of behaviour? I just need to
> > update the rte_sched_port configuration dynamically, and I also want to
> > change the current pipe and subport configuration.
> >
> > Regards.
>
> If you need to do dynamic changes, I would recommend using an RCU-type
> algorithm where you swap in the new parameters and then clean up/free the
> old ones after a grace period.  See http://lttng.org/urcu
>
>
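Cristian's point 2a above (the management core posts update requests and the scheduler core applies them between its own enqueue/dequeue calls) can be sketched as follows. This is a self-contained model under stated assumptions: struct pipe_cfg_req, the single-producer/single-consumer req_ring, and the pipe_profile table are all hypothetical. A DPDK application would more likely use an rte_ring (rte_ring_enqueue()/rte_ring_dequeue()) for the hand-off, and the table update would become a call such as rte_sched_pipe_config(port, subport_id, pipe_id, profile) made on the scheduler core.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define REQ_RING_SIZE 8 /* must be a power of two */
#define N_SUBPORTS 2    /* hypothetical port geometry */
#define N_PIPES 4

static_assert((REQ_RING_SIZE & (REQ_RING_SIZE - 1)) == 0,
              "ring size must be a power of two");

/* Hypothetical request: move one pipe to another pre-defined pipe profile. */
struct pipe_cfg_req {
    uint32_t subport_id;
    uint32_t pipe_id;
    int32_t profile;
};

/* Single producer (management core), single consumer (scheduler core). */
struct req_ring {
    struct pipe_cfg_req slot[REQ_RING_SIZE];
    _Atomic uint32_t head; /* next slot to write; only the producer stores it */
    _Atomic uint32_t tail; /* next slot to read; only the consumer stores it */
};

/* Management core: returns 0 on success, -1 if the ring is full. */
static int req_enqueue(struct req_ring *r, struct pipe_cfg_req req)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == REQ_RING_SIZE)
        return -1;
    r->slot[head & (REQ_RING_SIZE - 1)] = req;
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 0;
}

/* Scheduler core: returns 0 and fills *out, or -1 if the ring is empty. */
static int req_dequeue(struct req_ring *r, struct pipe_cfg_req *out)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return -1;
    *out = r->slot[tail & (REQ_RING_SIZE - 1)];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return 0;
}

/* Hypothetical stand-in for the scheduler's per-pipe configuration. */
static int32_t pipe_profile[N_SUBPORTS][N_PIPES];

/* Scheduler core, between enqueue/dequeue bursts: apply all pending requests.
 * Only this core touches the port, so no lock is needed. Out-of-range
 * requests are dropped. Returns the number of requests applied. */
static unsigned sched_apply_pending(struct req_ring *r)
{
    struct pipe_cfg_req req;
    unsigned applied = 0;

    while (req_dequeue(r, &req) == 0) {
        if (req.subport_id >= N_SUBPORTS || req.pipe_id >= N_PIPES)
            continue;
        pipe_profile[req.subport_id][req.pipe_id] = req.profile;
        applied++;
    }
    return applied;
}
```

The design choice mirrors the thread: the port is never reconfigured from two cores, so the only cross-core state is the request ring itself.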

[dpdk-dev] PMD for Cisco VIC Ethernet NIC - Request for guidelines for submission

2014-05-28 Thread Sujith Sankar (ssujith)

Hi all,

We have been working on the development of a poll-mode driver for the Cisco
VIC Ethernet NIC and its integration with DPDK.  We would like to submit
this poll-mode driver (ENIC PMD) to the DPDK community so that it can be
part of the DPDK tree.

Could someone please provide the guidelines and steps to do this?  As of
now, ENIC PMD is being tested with DPDK 1.6.0r2.  Is it alright to submit
a patch for DPDK 1.6.0r2?

One aspect of ENIC PMD is that it works with VFIO-PCI and not UIO.  Hope
this is acceptable.  The following thread in dpdk-dev influenced this
decision.
http://dpdk.org/ml/archives/dev/2013-July/000373.html

ENIC PMD uses one interrupt per interface, which the NIC uses to signal
the driver in case of an error.  Since this is not in the fast path, it
should be acceptable, shouldn't it?

Please give your suggestions and comments.

Thanks,
-Sujith

[dpdk-dev] [PATCH 0/3] Support administrative link up and link down

2014-05-28 Thread Ouyang, Changchun

Hi Ivan,
Thanks very much for your detailed response on this issue.
I think your recommendation makes sense, and I will update the naming and 
re-send a patch for link-up and link-down.

Best regards,
Changchun

-Original Message-
From: Ivan Boule [mailto:ivan.bo...@6wind.com] 
Sent: Friday, May 23, 2014 5:25 PM
To: Ouyang, Changchun; dev at dpdk.org
Subject: Re: [dpdk-dev] [PATCH 0/3] Support administrative link up and link down

On 05/23/2014 04:08 AM, Ouyang, Changchun wrote:
> Hi Ivan
>
> To some extent, I also agree with you.
> But the customer hopes DPDK can provide an interface like "ifconfig up" and
> "ifconfig down" in Linux. They can invoke such an interface in the user
> application to stop and start the device repeatedly and frequently, and
> make sure RX and TX work fine after each start. I think it is not necessary
> to really start and stop the device each time; we just need to start and
> stop the RX and TX functions, so the straightforward method is to enable and
> disable the TX laser in ixgbe.
> But at the ethdev level we need a more generic API name, here
> rte_eth_dev_admin_link_up/down; enable_tx_laser is not suitable. Enabling
> and disabling the TX laser is the way ixgbe fulfills the administrative
> link up and link down.
> Maybe Fortville and future generation NICs will use other ways to fulfill
> admin_link_up/down.
>

Hi Changchun,

I do not understand what your customer effectively needs.
First of all, if I understand well, your customer's application does not really 
need to invoke the DPDK functions "eth_dev_stop" and "eth_dev_start" for 
addressing its problem, for instance to reconfigure RX/TX queues of the port.
When considering the implementation in the ixgbe PMD of the function 
"rte_eth_dev_admin_link_down", its only visible effect from the DPDK 
application perspective is that no input packet can be received anymore, and 
output packets cannot be transmitted (once the TX queues have filled up).

Conversely, the only visible effect of the "rte_eth_dev_admin_link_up"
function is that input packets are received again, and that output packets can 
be successfully transmitted.

In fact, by disabling the TX laser on an ixgbe port, the only interesting effect 
of the function "rte_eth_dev_admin_link_down" is that it notifies the peer 
system of a hardware link DOWN event (with no physical link unplug on the peer 
side).
Conversely, by enabling the TX laser on an ixgbe port, the only interesting 
effect of the function "rte_eth_dev_admin_link_up" is that it notifies the peer 
system of a hardware link UP event.

Are those the actions that your customer's application actually needs to perform? 
If so, then this certainly deserves a real operational use case, which is 
worth describing in the patch log.
This would help DPDK PMD implementors understand what such functions can be 
used for, and decide whether they actually need to be supported by the PMD.

Assuming that these 2 functions need to be provided to address the issue 
described above, I do not think that the word "admin" brings anything for 
understanding their role. In fact, the word "admin" rather suggests a pure 
"software" down/up setting, instead of a physical one.
Naming these 2 functions "rte_eth_dev_set_link_down"
and "rte_eth_dev_set_link_up" better describes their expected effect.

Regards,
Ivan

>
> On 05/22/2014 04:44 PM, Ouyang, Changchun wrote:
>> Hi Ivan
>> For this one, it is a long story...
>> In short,
>> Some customers have this kind of requirement: they want to repeatedly
>> start (rte_dev_start) and stop (rte_dev_stop) the port for RX and TX, 
>> but they find that after several starts and stops, RX and TX don't work 
>> well even though the port is started, and the packet error count increases.
>>
>> To resolve this increasing error count, and to let the port work fine 
>> even after repeated starts and stops, we need a new API. After 
>> discussion, we came up with these 2 APIs: admin link up and admin link down.
>
> If I understand well, this "feature" is not needed by itself, but only as a 
> work-around to address issues when repeatedly invoking the functions 
> ixgbe_dev_stop and ixgbe_dev_start.
> Do such issues appear when performing the same operations with the Linux 
> kernel driver?
>
> Anyway, I suppose that such functions have to be automatically invoked 
> by the same code of the network application that invokes the functions 
> ixgbe_dev_stop and ixgbe_dev_start (in other words, there is no need 
> for manual assistance!)
>
> In that case, wouldn't it be possible - and highly preferable - to directly 
> invoke the functions ixgbe_disable_tx_laser and, then, ixgbe_enable_tx_laser 
> from the appropriate step during the execution of the function 
> ixgbe_dev_start(), waiting for some appropriate delays between the two 
> operations, if so needed?
>
> Regards,
> Ivan
>
>
>>
>> Is there any difference between "dev_link_start/stop" and "dev_link_up/down"?
>> To me, admin_link_up/down is better than
[dpdk-dev] [PATCH v2 0/4] NIC filters support for generic filter

2014-05-28 Thread Thomas Monjalon

Hi Jingjing,

2014-05-24 09:37, Jingjing Wu:
> A generic filter mechanism for handling special packets is required.
> It will allow filters to be set in HW when available so that specific
> packets may be filtered by NICs to specific descriptor queues for
> processing. Currently only Flow Director for Intel's 10GbE 82599
> devices is available. Other types of filters are not supported.
> The NIC filters listed below are implemented in this patchset:
>   ethertype filter, syn filter, 2tuple filter and flex filter for 82580 and i350
>   ethertype filter, syn filter, 5tuple filter for 82576
>   ethertype filter, syn filter and 5tuple filter for 82599

I'd like us to have a discussion about whether this API is generic enough.
I think many people would like to integrate drivers for other NICs in DPDK and 
I'd hate to see a global rework of this API because we haven't tried to think 
about it before.

First, is there someone on the mailing list who knows other hardware which 
could fit this filtering feature?

Thanks
-- 
Thomas
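To make the genericity question concrete: a 5-tuple queue filter compares some subset of (source IP, destination IP, source port, destination port, L4 protocol) and steers matching packets to an RX queue, so a generic API has to express at least the key, which fields to compare, and the target queue. The software model below is a sketch under those assumptions only; the structs, the M_* field-select bits and the first-match-wins rule are hypothetical and do not describe how any particular NIC prioritizes its filters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field-select bits: a cleared bit makes that field a wildcard. */
enum {
    M_SRC_IP = 1 << 0,
    M_DST_IP = 1 << 1,
    M_SRC_PORT = 1 << 2,
    M_DST_PORT = 1 << 3,
    M_PROTO = 1 << 4,
};

struct five_tuple {
    uint32_t src_ip, dst_ip;     /* host byte order in this model */
    uint16_t src_port, dst_port;
    uint8_t proto;               /* e.g. 6 = TCP, 17 = UDP */
};

struct five_tuple_filter {
    struct five_tuple key;
    uint8_t compare; /* OR of M_* bits */
    uint16_t queue;  /* RX queue for matching packets */
};

static int five_tuple_match(const struct five_tuple_filter *f, const struct five_tuple *p)
{
    const struct five_tuple *k = &f->key;

    if ((f->compare & M_SRC_IP) && k->src_ip != p->src_ip) return 0;
    if ((f->compare & M_DST_IP) && k->dst_ip != p->dst_ip) return 0;
    if ((f->compare & M_SRC_PORT) && k->src_port != p->src_port) return 0;
    if ((f->compare & M_DST_PORT) && k->dst_port != p->dst_port) return 0;
    if ((f->compare & M_PROTO) && k->proto != p->proto) return 0;
    return 1;
}

/* The first matching filter wins in this model; otherwise use default_queue. */
static uint16_t five_tuple_classify(const struct five_tuple_filter *filters, size_t n,
                                    const struct five_tuple *p, uint16_t default_queue)
{
    assert(filters != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (five_tuple_match(&filters[i], p))
            return filters[i].queue;
    return default_queue;
}
```

For example, a filter {dst_port 80, proto TCP} with compare = M_DST_PORT | M_PROTO sends all TCP port-80 traffic to its queue while ignoring the addresses.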

[dpdk-dev] [PATCH v2 0/4] NIC filters support for generic filter

2014-05-28 Thread Wu, Jingjing

Hi, Thomas

The generic you said may be different from I mentioned in last mail. You are 
discussing whether the APIs provide for NIC filters is generic or not. About 
that we can use same API for a type of filter. For example, if we want to 
configure ethertype filter, we can use the same API, no matter the NIC is 
82580, i350, 82576 or 82599. We think these NICs may be most common used.

-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Wu, Jingjing
Sent: Wednesday, May 28, 2014 8:53 AM
To: Thomas Monjalon
Cc: dev at dpdk.org
Subject: Re: [dpdk-dev] [PATCH v2 0/4] NIC filters support for generic filter

Hi, Thomas

This patch is mainly about implementing multiple NIC filters, so it is closely 
tied to the NICs.
As the patch says, the NIC filters listed below are implemented in this 
patchset:
ethertype filter, syn filter, 2tuple filter and flex filter for 82580 and i350
ethertype filter, syn filter, 5tuple filter for 82576
ethertype filter, syn filter and 5tuple filter for 82599

Filters of the same type use the same API for the NICs listed above.
As for the generic filter feature, how to define "generic" is still under 
discussion, and it is not included in this patch.
The NIC filters implemented in this patch are a first step. Even without a 
generic API, this patch provides a way to configure these NIC filters in 
hardware through the DPDK PMDs.


-Original Message-
From: Thomas Monjalon [mailto:thomas.monja...@6wind.com]
Sent: Wednesday, May 28, 2014 7:22 AM
To: Wu, Jingjing
Cc: dev at dpdk.org
Subject: Re: [dpdk-dev] [PATCH v2 0/4] NIC filters support for generic filter

Hi Jingjing,

2014-05-24 09:37, Jingjing Wu:
> A generic filter mechanism for handling special packets is required.
> It will allow filters to be set in HW when available so that specific 
> packets may be filtered by NICs to specific descriptor queues for 
> processing. Currently only Flow Director for Intel's 10GbE 82599 
> devices is available. Other types of filters are not supported.
> The NIC filters listed below are implemented in this patchset:
>   ethertype filter, syn filter, 2tuple filter and flex filter for 82580 and i350
>   ethertype filter, syn filter, 5tuple filter for 82576
>   ethertype filter, syn filter and 5tuple filter for 82599

I'd like us to have a discussion about whether this API is generic enough.
I think many people would like to integrate drivers for other NICs in DPDK and 
I'd hate to see a global rework of this API because we haven't tried to think 
about it before.

First, is there someone on the mailing list who knows other hardware which 
could fit this filtering feature?

Thanks
--
Thomas

[dpdk-dev] [PATCH 0/3] ixgbe: Add L2 Ethertype, SYN and Five tuple queue filters

2014-05-28 Thread Thomas Monjalon

Hi Vladimir,

Seems like hardware filtering becomes useful these days :)

2014-05-19 19:51, Vladimir Medvedkin:
> In addition to the Flow Director filters, this patchset adds L2 Ethertype,
> SYN and Five tuple queue filters to route packets according to ethertype,
> L4 protocol, source/destination IP/ports, pool and the presence of the SYN
> flag in TCP packets. Unlike http://dpdk.org/ml/archives/dev/2014-May/002512.html,
> this gives the capability to work with pools. This patch's functionality can
> be merged with the patch above.

2 comments:

1) Are you confident that this new API is generic enough to be used by 
NICs other than ixgbe?

2) Could you try to check your patches with the kernel script checkpatch.pl, 
please?

Thanks
-- 
Thomas
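For reference, the ethertype and SYN filters described above match on simple header fields: the ethertype is bytes 12-13 of an untagged Ethernet frame, and the SYN flag is bit 0x02 of byte 13 of the TCP header. The self-contained software model below shows what such filters look at. The function names and the "ethertype filter first, then SYN filter" order are hypothetical choices for illustration, not the hardware's precedence, and VLAN-tagged frames and IPv6 are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN 14
#define ETHERTYPE_IPV4 0x0800
#define IP_PROTO_TCP 6
#define TCP_FLAG_SYN 0x02

/* Ethertype of an untagged frame, in host byte order. */
static uint16_t frame_ethertype(const uint8_t *f)
{
    return (uint16_t)((f[12] << 8) | f[13]);
}

/* 1 if the frame is Ethernet/IPv4/TCP with the SYN flag set, else 0. */
static int frame_is_tcp_syn(const uint8_t *f, size_t len)
{
    if (len < ETH_HDR_LEN + 20 || frame_ethertype(f) != ETHERTYPE_IPV4)
        return 0;
    const uint8_t *ip = f + ETH_HDR_LEN;
    size_t ihl = (size_t)(ip[0] & 0x0f) * 4; /* IPv4 header length in bytes */
    if (ihl < 20 || ip[9] != IP_PROTO_TCP || len < ETH_HDR_LEN + ihl + 20)
        return 0;
    const uint8_t *tcp = ip + ihl;
    return (tcp[13] & TCP_FLAG_SYN) != 0;   /* byte 13 holds the TCP flags */
}

/* Hypothetical queue selection: ethertype filter, then SYN filter, then default. */
static uint16_t select_queue(const uint8_t *f, size_t len, uint16_t etype,
                             uint16_t etype_queue, uint16_t syn_queue,
                             uint16_t default_queue)
{
    assert(f != NULL);
    if (len >= ETH_HDR_LEN && frame_ethertype(f) == etype)
        return etype_queue;
    if (frame_is_tcp_syn(f, len))
        return syn_queue;
    return default_queue;
}
```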

[dpdk-dev] [PATCH v2 0/3] Support zero copy RX/TX in user space vhost

2014-05-28 Thread Thomas Monjalon

Hi,

checkpatch.pl is reporting some errors and I think some of them should be avoided.
Please check it.

Thanks
-- 
Thomas

[dpdk-dev] [PATCH v2 0/3] Support zero copy RX/TX in user space vhost

2014-05-28 Thread Ouyang, Changchun

Yes, I will send out a v3 patch set to replace v2.
Thanks
Changchun

-Original Message-
From: Thomas Monjalon [mailto:thomas.monja...@6wind.com] 
Sent: Wednesday, May 28, 2014 7:02 AM
To: Ouyang, Changchun
Cc: dev at dpdk.org
Subject: Re: [PATCH v2 0/3] Support zero copy RX/TX in user space vhost

Hi,

checkpatch.pl is reporting some errors and I think some of them should be avoided.
Please check it.

Thanks
-- 
Thomas

[dpdk-dev] [PATCH v2 1/3] ether: Add API to support setting TX rate for queue and VF

2014-05-28 Thread Thomas Monjalon

Hi Changchun,

2014-05-26 15:45, Ouyang Changchun:
>  /**
> + * Set the rate limitation for a queue on an Ethernet device.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param queue_idx
> + *   The queue id.
> + * @param tx_rate
> + *   The tx rate allocated from the total link speed for this queue.
> + * @return
> + *   - (0) if successful.
> + *   - (-ENOTSUP) if hardware doesn't support this feature.
> + *   - (-ENODEV) if *port_id* invalid.
> + *   - (-EINVAL) if bad parameter.
> + */
> +int rte_eth_set_queue_rate_limit(uint8_t port_id, uint16_t queue_idx,
> + uint16_t tx_rate);
> +
> +/**
> + * Set the rate limitation for a vf on an Ethernet device.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param vf
> + *   VF id.
> + * @param tx_rate
> + *   The tx rate allocated from the total link speed for this VF id.
> + * @param q_msk
> + *   The queue mask which need to set the rate.
> + * @return
> + *   - (0) if successful.
> + *   - (-ENOTSUP) if hardware doesn't support this feature.
> + *   - (-ENODEV) if *port_id* invalid.
> + *   - (-EINVAL) if bad parameter.
> + */
> +int rte_eth_set_vf_rate_limit(uint8_t port_id, uint16_t vf,
> + uint16_t tx_rate, uint64_t q_msk);

You are defining an API function specifically for VF. It's not generic and 
shouldn't appear in the API. We now have to be careful about the API and try 
to build a robust generic API which could become stable.

Is it possible to imagine another API where only port and queue parameters are 
required? 

Thanks
-- 
Thomas
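One possible answer to the question above, sketched under an assumption the patch does not state: if the queues selected by q_msk can be addressed as queues of the same port, a per-VF mask call reduces to repeated port+queue calls. The helper set_rate_for_queue_mask() and the recording backend record_queue_rate() are hypothetical; queue_rate_fn merely has the same parameter list as the proposed rte_eth_set_queue_rate_limit().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same parameter list as the proposed rte_eth_set_queue_rate_limit(). */
typedef int (*queue_rate_fn)(uint8_t port_id, uint16_t queue_idx, uint16_t tx_rate);

/* Apply tx_rate to every queue whose bit is set in q_msk, using only the
 * per-queue call. Stops at and returns the first non-zero error code. */
static int set_rate_for_queue_mask(uint8_t port_id, uint64_t q_msk,
                                   uint16_t tx_rate, queue_rate_fn set_queue_rate)
{
    assert(set_queue_rate != NULL);
    for (uint16_t q = 0; q < 64; q++) {
        if (!(q_msk & (UINT64_C(1) << q)))
            continue;
        int ret = set_queue_rate(port_id, q, tx_rate);
        if (ret != 0)
            return ret;
    }
    return 0;
}

/* Example backend for illustration: records the last rate set per queue. */
static uint16_t recorded_rate[64];

static int record_queue_rate(uint8_t port_id, uint16_t queue_idx, uint16_t tx_rate)
{
    (void)port_id;
    recorded_rate[queue_idx] = tx_rate;
    return 0;
}
```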

[dpdk-dev] [PATCH 0/4] New library: rte_distributor

2014-05-28 Thread Thomas Monjalon

Hi Bruce,

As for rte_acl, I have some formatting comments.

2014-05-20 11:00, Bruce Richardson:
> This adds a new library to the Intel DPDK whereby a set of packets can be
> distributed one-at-a-time to a set of worker cores, with dynamic load
> balancing being done between those workers. Flows are identified by a tag
> within the mbuf (currently the RSS hash field, 32-bit value), which is used
> to ensure that no two packets of the same flow are processed in parallel,
> thereby preserving ordering.
> 
>  app/test/Makefile  |   2 +
>  app/test/commands.c|   7 +-
>  app/test/test.h|   2 +
>  app/test/test_distributor.c| 582 +
>  app/test/test_distributor_perf.c   | 274 
>  config/defconfig_i686-default-linuxapp-gcc |   5 +
>  config/defconfig_i686-default-linuxapp-icc |   5 +
>  config/defconfig_x86_64-default-bsdapp-gcc |   6 +
>  config/defconfig_x86_64-default-linuxapp-gcc   |   5 +
>  config/defconfig_x86_64-default-linuxapp-icc   |   5 +
>  lib/Makefile   |   1 +
>  lib/librte_distributor/Makefile|  50 +++
>  lib/librte_distributor/rte_distributor.c   | 417 ++
>  lib/librte_distributor/rte_distributor.h   | 173 
>  lib/librte_eal/common/include/rte_tailq_elem.h |   2 +
>  mk/rte.app.mk  |   4 +
>  16 files changed, 1539 insertions(+), 1 deletion(-)

As you are introducing a new library, you need to update
doxygen configuration and start page:
doc/doxy-api.conf
doc/doxy-api-index.md

I've run checkpatch.pl from kernel.org on these distributor patches
and it reports some code style issues.
Could you have a look at it please?

Thanks
-- 
Thomas
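The ordering rule in the quoted cover letter (packets with the same tag are never processed in parallel) can be modeled as follows: a packet whose tag is already in flight on some worker is queued behind it on that worker, and any other packet goes to an idle worker. This is a simplified, single-threaded sketch; struct dist_model, dist_assign() and dist_worker_done() are hypothetical. It is not the rte_distributor implementation, and it leaves out the communication between the distributor core and the worker cores.

```c
#include <assert.h>
#include <stdint.h>

#define N_WORKERS 3

/* Hypothetical distributor state; tag 0 is reserved to mean "idle" here. */
struct dist_model {
    uint32_t inflight_tag[N_WORKERS]; /* flow tag a worker is busy with, 0 if idle */
    uint32_t backlog[N_WORKERS];      /* further packets of that same tag queued for it */
};

/* Returns the worker that gets the packet, or -1 if every worker is busy
 * with other flows (the caller must hold the packet and retry). */
static int dist_assign(struct dist_model *d, uint32_t tag)
{
    assert(tag != 0);
    /* Same flow already in flight: queue it behind that worker to keep order. */
    for (int w = 0; w < N_WORKERS; w++) {
        if (d->inflight_tag[w] == tag) {
            d->backlog[w]++;
            return w;
        }
    }
    /* New flow: hand it to any idle worker. */
    for (int w = 0; w < N_WORKERS; w++) {
        if (d->inflight_tag[w] == 0) {
            d->inflight_tag[w] = tag;
            return w;
        }
    }
    return -1;
}

/* Worker w finished one packet: take the next one from its backlog, or go idle. */
static void dist_worker_done(struct dist_model *d, int w)
{
    assert(w >= 0 && w < N_WORKERS && d->inflight_tag[w] != 0);
    if (d->backlog[w] > 0)
        d->backlog[w]--;
    else
        d->inflight_tag[w] = 0;
}
```

A worker only becomes available to a new flow after it has drained every packet of its current tag, which is what preserves per-flow ordering in this model.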

[dpdk-dev] [PATCH 1/4] acl: Add ACL library (librte_acl) into DPDK.

2014-05-28 Thread Thomas Monjalon

Hi Konstantin,

Glad to see this new library coming in.

2014-05-22 21:48, Konstantin Ananyev:
> The ACL library is used to perform an N-tuple search over a set of rules
> with multiple categories and find the best match for each category.
> 
> Signed-off-by: Konstantin Ananyev 
> ---
>  config/common_linuxapp   |6 +
>  lib/librte_acl/Makefile  |   60 +
>  lib/librte_acl/acl.h |  182 +++
>  lib/librte_acl/acl_bld.c | 2002 ++
>  lib/librte_acl/acl_gen.c |  473 
>  lib/librte_acl/acl_run.c |  927 
>  lib/librte_acl/acl_vect.h|  129 +++
>  lib/librte_acl/rte_acl.c |  413 +++
>  lib/librte_acl/rte_acl.h |  453 
>  lib/librte_acl/rte_acl_osdep.h   |   92 ++
>  lib/librte_acl/rte_acl_osdep_alone.h |  277 +
>  lib/librte_acl/tb_mem.c  |  102 ++
>  lib/librte_acl/tb_mem.h  |   73 ++
>  13 files changed, 5189 insertions(+), 0 deletions(-)

As you are introducing a new library, you need to update
doxygen configuration and start page:
doc/doxy-api.conf
doc/doxy-api-index.md

I've run checkpatch.pl from kernel.org on these ACL patches
and it reports a lot of code style issues.
Could you have a look at it please?

Thanks
-- 
Thomas
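The cover-letter description (an N-tuple search over a set of rules with multiple categories, returning the best match for each category) can be pinned down with a brute-force reference model. This is a sketch of the semantics only, assuming inclusive range fields and "higher priority wins". struct acl_rule_model and acl_classify_model() are hypothetical; librte_acl itself builds runtime lookup structures with rte_acl_build() and searches them with rte_acl_classify() rather than scanning rules linearly.

```c
#include <assert.h>
#include <stdint.h>

#define N_FIELDS 2     /* hypothetical layout: field 0 = L4 protocol, field 1 = destination port */
#define N_CATEGORIES 2

struct acl_rule_model {
    uint32_t lo[N_FIELDS];  /* inclusive lower bound per field */
    uint32_t hi[N_FIELDS];  /* inclusive upper bound per field */
    uint32_t category_mask; /* bit c set: the rule takes part in category c */
    int32_t priority;       /* higher value wins within a category */
    uint32_t userdata;      /* non-zero result reported on a match */
};

/* For every category, report the userdata of the highest-priority rule that
 * matches key, or 0 when no rule in that category matches. On equal priority
 * the earlier rule is kept. Linear scan: a reference for semantics, not speed. */
static void acl_classify_model(const struct acl_rule_model *rules, int n_rules,
                               const uint32_t key[N_FIELDS],
                               uint32_t results[N_CATEGORIES])
{
    int32_t best[N_CATEGORIES];

    assert(n_rules >= 0);
    for (int c = 0; c < N_CATEGORIES; c++) {
        results[c] = 0;
        best[c] = INT32_MIN;
    }
    for (int r = 0; r < n_rules; r++) {
        int match = 1;
        for (int f = 0; f < N_FIELDS && match; f++)
            match = key[f] >= rules[r].lo[f] && key[f] <= rules[r].hi[f];
        if (!match)
            continue;
        for (int c = 0; c < N_CATEGORIES; c++) {
            if ((rules[r].category_mask & (1u << c)) && rules[r].priority > best[c]) {
                best[c] = rules[r].priority;
                results[c] = rules[r].userdata;
            }
        }
    }
}
```

With an "any TCP" rule in both categories and a higher-priority "TCP port 80" rule in category 0 only, a TCP port-80 key yields the port-80 rule in category 0 and the "any TCP" rule in category 1.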
