Re: [ovs-dev] [patch v6 04/10] Userspace datapath: Add fragmentation handling.

2018-05-01 Thread Darrell Ball
The following diff fixes an issue where frag list sorting was not
applied in all cases where it should have been.
The change simply moves the line

ipf_sort(ipf_list->frag_list, ipf_list->last_inuse_idx);

along with some associated indentation changes.

I have some additional private tests that found this, but I need to adapt
them and will add them later.

diff --git a/lib/ipf.c b/lib/ipf.c
index 2963dd5..9cdc130 100644
--- a/lib/ipf.c
+++ b/lib/ipf.c
@@ -541,7 +541,6 @@ ipf_list_state_transition(struct ipf_list *ipf_list, bool ff, bool lf,
         break;
     case IPF_LIST_STATE_FIRST_LAST_SEEN:
         next_state = IPF_LIST_STATE_FIRST_LAST_SEEN;
-        ipf_sort(ipf_list->frag_list, ipf_list->last_inuse_idx);
         break;
     case IPF_LIST_STATE_COMPLETED:
         next_state = curr_state;
@@ -552,23 +551,25 @@ ipf_list_state_transition(struct ipf_list *ipf_list, bool ff, bool lf,
         OVS_NOT_REACHED();
     }

-    if (next_state == IPF_LIST_STATE_FIRST_LAST_SEEN &&
-        ipf_list_complete(ipf_list)) {
-        struct dp_packet *reass_pkt = NULL;
-        if (v4) {
-            reass_pkt = ipf_reassemble_v4_frags(ipf_list);
-        } else {
-            reass_pkt = ipf_reassemble_v6_frags(ipf_list);
-        }
-        if (reass_pkt) {
-            struct reassembled_pkt *rp = xzalloc(sizeof *rp);
-            rp->pkt = reass_pkt;
-            rp->list = ipf_list;
-            ipf_reassembled_list_add(rp);
-            ipf_expiry_list_remove(ipf_list);
-            next_state = IPF_LIST_STATE_COMPLETED;
-        } else {
-            next_state = IPF_LIST_STATE_REASS_FAIL;
+    if (next_state == IPF_LIST_STATE_FIRST_LAST_SEEN) {
+        ipf_sort(ipf_list->frag_list, ipf_list->last_inuse_idx);
+        if (ipf_list_complete(ipf_list)) {
+            struct dp_packet *reass_pkt = NULL;
+            if (v4) {
+                reass_pkt = ipf_reassemble_v4_frags(ipf_list);
+            } else {
+                reass_pkt = ipf_reassemble_v6_frags(ipf_list);
+            }
+            if (reass_pkt) {
+                struct reassembled_pkt *rp = xzalloc(sizeof *rp);
+                rp->pkt = reass_pkt;
+                rp->list = ipf_list;
+                ipf_reassembled_list_add(rp);
+                ipf_expiry_list_remove(ipf_list);
+                next_state = IPF_LIST_STATE_COMPLETED;
+            } else {
+                next_state = IPF_LIST_STATE_REASS_FAIL;
+            }
         }
     }
     ipf_list->state = next_state;

Darrell


On Sun, Apr 8, 2018 at 7:53 PM, Darrell Ball  wrote:

> Fragmentation handling is added for supporting conntrack.
> Both v4 and v6 are supported.
>
> After discussion with several people, I decided not to store
> configuration state in the database, for consistency with the kernel
> going forward, for similarity with other conntrack configuration that
> will also not be in the database, and for overall simplicity.
> Accordingly, fragmentation handling is enabled by default.
>
> This patch enables fragmentation tests for the userspace datapath.
>
> Signed-off-by: Darrell Ball 
> ---
>  NEWS |2 +
>  include/sparse/netinet/ip6.h |1 +
>  lib/automake.mk  |2 +
>  lib/conntrack.c  |7 +
>  lib/ipf.c| 1238 ++
> 
>  lib/ipf.h|   63 +++
>  tests/system-traffic.at  |   10 -
>  7 files changed, 1313 insertions(+), 10 deletions(-)
>  create mode 100644 lib/ipf.c
>  create mode 100644 lib/ipf.h
>
> diff --git a/NEWS b/NEWS
> index 0cfcac5..2f31680 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -10,6 +10,8 @@ Post-v2.9.0
>   * ovs-ofctl now accepts and display table names in place of
> numbers.  By
> default it always accepts names and in interactive use it displays
> them;
> use --names or --no-names to override.  See ovs-ofctl(8) for
> details.
> +   - Userspace datapath:
> + * Add v4/v6 fragmentation support for conntrack.
> - ovs-vsctl: New commands "add-bond-iface" and "del-bond-iface".
> - OpenFlow:
>   * OFPT_ROLE_STATUS is now available in OpenFlow 1.3.
> diff --git a/include/sparse/netinet/ip6.h b/include/sparse/netinet/ip6.h
> index d2a54de..bfa637a 100644
> --- a/include/sparse/netinet/ip6.h
> +++ b/include/sparse/netinet/ip6.h
> @@ -64,5 +64,6 @@ struct ip6_frag {
>  };
>
>  #define IP6F_OFF_MASK ((OVS_FORCE ovs_be16) 0xfff8)
> +#define IP6F_MORE_FRAG ((OVS_FORCE ovs_be16) 0x0001)
>
>  #endif /* netinet/ip6.h sparse */
> diff --git a/lib/automake.mk b/lib/automake.mk
> index 915a33b..04163b3 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -107,6 +107,8 @@ lib_libopenvswitch_la_SOURCES = \
> lib/hmapx.h \
> lib/id-pool.c \
> lib/id-pool.h \
> +   lib/ipf.c \
> +   lib/ipf.h \
> lib/jhash.c \
> lib/jhash.h \
> lib/json.c \
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 



[ovs-dev] [PATCH v1 2/2] ovs-vsctl: Fix segfault when attempting to del-port from parent bridge.

2018-05-01 Thread Flavio Fernandes
The error message in the improper-bridge check dereferences 'parent'
on the wrong bridge. Also, the message itself had the parent and child
bridges reversed, so that got a small tweak as well.

Signed-off-by: Flavio Fernandes 
---
 utilities/ovs-vsctl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/utilities/ovs-vsctl.c b/utilities/ovs-vsctl.c
index c69e89e..188a390 100644
--- a/utilities/ovs-vsctl.c
+++ b/utilities/ovs-vsctl.c
@@ -1748,9 +1748,9 @@ cmd_del_port(struct ctl_context *ctx)
     if (port->bridge != bridge) {
         if (port->bridge->parent == bridge) {
             ctl_fatal("bridge %s does not have a port %s (although "
-                      "its parent bridge %s does)",
+                      "its child bridge %s does)",
                       ctx->argv[1], ctx->argv[2],
-                      bridge->parent->name);
+                      port->bridge->name);
         } else {
             ctl_fatal("bridge %s does not have a port %s",
                       ctx->argv[1], ctx->argv[2]);
-- 
1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v1 1/2] ovs-vsctl.at: deleting a port from fake bridge

2018-05-01 Thread Flavio Fernandes
This test exercises the code path in ovs-vsctl where a del-port is
attempted using the parent of a fake bridge. It expects a message
saying that the user provided the wrong bridge, not a segfault.

Signed-off-by: Flavio Fernandes 
---
 tests/ovs-vsctl.at | 17 +
 1 file changed, 17 insertions(+)

diff --git a/tests/ovs-vsctl.at b/tests/ovs-vsctl.at
index 3189a9b..f9e7f3b 100644
--- a/tests/ovs-vsctl.at
+++ b/tests/ovs-vsctl.at
@@ -605,6 +605,23 @@ CHECK_PORTS([xapi1], [eth0.$1])
 CHECK_IFACES([xapi1], [eth0.$1])
 OVS_VSCTL_CLEANUP
 AT_CLEANUP
+
+AT_SETUP([simple fake bridge + del-port from parent (VLAN $1)])
+AT_KEYWORDS([ovs-vsctl fake-bridge del-port])
+OVS_VSCTL_SETUP
+OVS_VSCTL_SETUP_SIMPLE_FAKE_CONF([$1])
+AT_CHECK([RUN_OVS_VSCTL([del-port xenbr0 eth0.$1])], [1], [],
+ [ovs-vsctl: bridge xenbr0 does not have a port eth0.$1 (although its child bridge xapi1 does)
+])
+CHECK_PORTS([xenbr0], [eth0])
+CHECK_IFACES([xenbr0], [eth0])
+CHECK_PORTS([xapi1], [eth0.$1])
+CHECK_IFACES([xapi1], [eth0.$1])
+AT_CHECK([RUN_OVS_VSCTL([del-port xapi1 eth0.$1])])
+CHECK_PORTS([xenbr0], [eth0])
+CHECK_IFACES([xenbr0], [eth0])
+OVS_VSCTL_CLEANUP
+AT_CLEANUP
 ]) # OVS_VSCTL_FAKE_BRIDGE_TESTS
 
 OVS_VSCTL_FAKE_BRIDGE_TESTS([9])
-- 
1.9.1



[ovs-dev] [PATCH v1 0/2] ovs-vsctl segfaults on del-port of fake-bridge when parent is provided

2018-05-01 Thread Flavio Fernandes
Greetings!

I was lucky enough to spend some time having fun with OVS recently and
encountered a bug that may be worth sharing with you.

The error-handling code path taken when del-port is attempted on the
parent bridge (instead of the fake bridge) segfaults. Here are some
simple steps for reproducing this issue:

./boot.sh ; ./configure ; make -j4 ; make sandbox

PARENT_BRIDGE=br0 ; FAKE_BRIDGE=br0c ; VLAN_TAG=666
ovs-vsctl add-br ${PARENT_BRIDGE}
ovs-vsctl add-br $FAKE_BRIDGE $PARENT_BRIDGE $VLAN_TAG

# Add a port to the parent bridge, which happens to have the same tag as the
# fake bridge.
# Note: The port could have been added directly to the fake bridge too, of
# course. The end result of the add-port is the same.
ovs-vsctl add-port $PARENT_BRIDGE p1 -- set port p1 tag=${VLAN_TAG}
# ovs-vsctl add-port $FAKE_BRIDGE p1

# removing p1 causes a segfault
ovs-vsctl del-port $PARENT_BRIDGE p1  ; # sad panda moment

# Here are 3 ways of working around this segfault

# workaround 1: remove tag before removing port from parent
ovs-vsctl remove port p1 tag $VLAN_TAG && \
ovs-vsctl del-port $PARENT_BRIDGE p1  && echo ok
# workaround 2: remove port as if it belongs to fake bridge
ovs-vsctl del-port $FAKE_BRIDGE p1  && echo ok
# workaround 3: remove port without specifying a bridge
ovs-vsctl del-port p1  && echo ok


This issue appears to exist since commit 7c79588e, which dates back to
Feb/2010. I see it in OVS 2.3.2.

-- flaviof



More details about the segfault:

|main: Ubuntu ~/ovs.git/_ffbuilddir/utilities/sandbox on devel
$ sudo ovs-vsctl show
fa096c6f-8f5f-49ae-92b4-e94ce58aceec
Bridge "br0"
Port "br0"
Interface "br0"
type: internal
Port "p1"
tag: 666
Interface "p1"
Port "br0c"
tag: 666
Interface "br0c"
type: internal
|main: Ubuntu ~/ovs.git/_ffbuilddir/utilities/sandbox on devel
$ gdb /home/ff/ovs.git/_ffbuilddir/utilities/ovs-vsctl
...
Reading symbols from /home/ff/ovs.git/_ffbuilddir/utilities/ovs-vsctl...done.
(gdb) run del-port br0 p1
Starting program: /home/ff/ovs.git/_ffbuilddir/utilities/ovs-vsctl del-port br0 
p1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x0040a945 in cmd_del_port (ctx=0x7fffc5d0) at 
../utilities/ovs-vsctl.c:1750
1750ctl_fatal("bridge %s does not have a port %s 
(although "
(gdb) bt
#0  0x0040a945 in cmd_del_port (ctx=0x7fffc5d0) at 
../utilities/ovs-vsctl.c:1750
#1  0x004075dd in do_vsctl (args=0xc8f7f0 
"/home/ff/ovs.git/_ffbuilddir/utilities/ovs-vsctl del-port br0 p1", 
commands=0xc8fd50, n_commands=1, idl=0xc8fdb0)
at ../utilities/ovs-vsctl.c:2623
#2  0x00405e7e in main (argc=4, argv=0x7fffc868) at 
../utilities/ovs-vsctl.c:184
(gdb) list
1745struct vsctl_bridge *bridge;
1746
1747bridge = find_bridge(vsctl_ctx, ctx->argv[1], true);
1748if (port->bridge != bridge) {
1749if (port->bridge->parent == bridge) {
1750ctl_fatal("bridge %s does not have a port %s 
(although "
1751"its parent bridge %s does)",
1752ctx->argv[1], ctx->argv[2],
1753bridge->parent->name);
1754} else {
(gdb) p ctx->argv[1]
$1 = 0x7fffcba5 "br0"
(gdb) p ctx->argv[2]
$2 = 0x7fffcba9 "p1"
(gdb) p port->bridge->name
$3 = 0xc9a290 "br0c"
(gdb) p bridge
$4 = (struct vsctl_bridge *) 0xcc9d60
(gdb) p bridge->parent
$5 = (struct vsctl_bridge *) 0x0
(gdb) p port->bridge
$6 = (struct vsctl_bridge *) 0xccb440
(gdb) p bridge->name
$7 = 0xcc8da0 "br0"
(gdb)


Flavio Fernandes (2):
  ovs-vsctl.at: deleting a port from fake bridge
  ovs-vsctl: Fix segfault when attempting to del-port from parent
bridge.

 tests/ovs-vsctl.at| 17 +
 utilities/ovs-vsctl.c |  4 ++--
 2 files changed, 19 insertions(+), 2 deletions(-)

-- 
1.9.1



Re: [ovs-dev] [PATCH v3] rhel: user/group openvswitch does not exist

2018-05-01 Thread Russell Bryant
On Mon, Apr 30, 2018 at 3:27 PM, Aaron Conole  wrote:
> Markos Chandras  writes:
>
>> On 19/04/18 16:27, Aaron Conole wrote:
>>> From: Alan Pevec 
>>>
>>> Default ownership[1] for config files is failing on an empty system:
>>>   Running scriptlet: openvswitch-2.9.0-3.fc28.x86_64
>>> warning: user openvswitch does not exist - using root
>>> warning: group openvswitch does not exist - using root
>>> ...
>>>
>>> Required user/group need to be created in %pre as documented in
>>> Fedora guideline[2]
>>>
>>> [1]
>>> https://github.com/openvswitch/ovs/commit/951d79e638ecdb3b1dcd19df1adb2ff91fe61af8
>>>
>>> [2] 
>>> https://fedoraproject.org/wiki/Packaging:UsersAndGroups#Dynamic_allocation
>>>
>>> Submitted-at: https://github.com/openvswitch/ovs/pull/223
>>> Signed-off-by: Alan Pevec 
>>> Co-authored-by: Aaron Conole 
>>> Signed-off-by: Aaron Conole 
>>
>> Reviewed-by: Markos Chandras 
>
> Thanks Markos.
>
> Timothy, Russell, sorry I forgot to CC you, it seems.

Thanks, applied to master and branch-2.9.

-- 
Russell Bryant


[ovs-dev] [RFC v5 8/8] netdev-dpdk: support multi-segment jumbo frames

2018-05-01 Thread Tiago Lam
From: Mark Kavanagh 

Currently, jumbo frame support for OvS-DPDK is implemented by
increasing the size of mbufs within a mempool, such that each mbuf
within the pool is large enough to contain an entire jumbo frame of
a user-defined size. Typically, for each user-defined MTU,
'requested_mtu', a new mempool is created, containing mbufs of size
~requested_mtu.

With the multi-segment approach, a port uses a single mempool,
(containing standard/default-sized mbufs of ~2k bytes), irrespective
of the user-requested MTU value. To accommodate jumbo frames, mbufs
are chained together, where each mbuf in the chain stores a portion of
the jumbo frame. Each mbuf in the chain is termed a segment, hence the
name.

== Enabling multi-segment mbufs ==
Multi-segment and single-segment mbufs are mutually exclusive, and the
user must decide on which approach to adopt on init. The introduction
of a new OVSDB field, 'dpdk-multi-seg-mbufs', facilitates this. This
is a global boolean value, which determines how jumbo frames are
represented across all DPDK ports. In the absence of a user-supplied
value, 'dpdk-multi-seg-mbufs' defaults to false, i.e. multi-segment
mbufs must be explicitly enabled / single-segment mbufs remain the
default.

Setting the field is identical to setting existing DPDK-specific OVSDB
fields:

ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x10
ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=4096,0
==> ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true

Co-authored-by: Tiago Lam 

Signed-off-by: Mark Kavanagh 
Signed-off-by: Tiago Lam 
---
 NEWS |  1 +
 lib/dpdk.c   |  7 +++
 lib/netdev-dpdk.c| 52 +---
 lib/netdev-dpdk.h|  1 +
 vswitchd/vswitch.xml | 20 
 5 files changed, 74 insertions(+), 7 deletions(-)

diff --git a/NEWS b/NEWS
index d22ad14..e6752d6 100644
--- a/NEWS
+++ b/NEWS
@@ -92,6 +92,7 @@ v2.9.0 - 19 Feb 2018
pmd assignments.
  * Add rxq utilization of pmd to appctl 'dpif-netdev/pmd-rxq-show'.
  * Add support for vHost dequeue zero copy (experimental)
+ * Add support for multi-segment mbufs
- Userspace datapath:
  * Output packet batching support.
- vswitchd:
diff --git a/lib/dpdk.c b/lib/dpdk.c
index 00dd974..1447724 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -459,6 +459,13 @@ dpdk_init__(const struct smap *ovs_other_config)
 
 /* Finally, register the dpdk classes */
 netdev_dpdk_register();
+
+bool multi_seg_mbufs_enable = smap_get_bool(ovs_other_config,
+"dpdk-multi-seg-mbufs", false);
+if (multi_seg_mbufs_enable) {
+VLOG_INFO("DPDK multi-segment mbufs enabled\n");
+netdev_dpdk_multi_segment_mbufs_enable();
+}
 }
 
 void
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 4c6a3c0..5746ae0 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -66,6 +66,7 @@ enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
 
 VLOG_DEFINE_THIS_MODULE(netdev_dpdk);
 static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
+static bool dpdk_multi_segment_mbufs = false;
 
 #define DPDK_PORT_WATCHDOG_INTERVAL 5
 
@@ -593,6 +594,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, uint16_t mbuf_pkt_data_len)
   + dev->requested_n_txq * dev->requested_txq_size
   + MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * NETDEV_MAX_BURST
   + MIN_NB_MBUF;
+/* XXX: should n_mbufs be increased if multi-seg mbufs are used? */
 
 ovs_mutex_lock(&dpdk_mp_mutex);
 do {
@@ -693,7 +695,13 @@ dpdk_mp_release(struct rte_mempool *mp)
 
 /* Tries to allocate a new mempool - or re-use an existing one where
  * appropriate - on requested_socket_id with a size determined by
- * requested_mtu and requested Rx/Tx queues.
+ * requested_mtu and requested Rx/Tx queues. Some properties of the mempool's
+ * elements are dependent on the value of 'dpdk_multi_segment_mbufs':
+ * - if 'true', then the mempool contains standard-sized mbufs that are chained
+ *   together to accommodate packets of size 'requested_mtu'.
+ * - if 'false', then the members of the allocated mempool are
+ *   non-standard-sized mbufs. Each mbuf in the mempool is large enough to
+ *   fully accommodate packets of size 'requested_mtu'.
  * On success - or when re-using an existing mempool - the new configuration
  * will be applied.
  * On error, device will be left unchanged. */
@@ -701,10 +709,18 @@ static int
 netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)
 OVS_REQUIRES(dev->mutex)
 {
-uint16_t buf_size = dpdk_buf_size(dev->requested_mtu);
+uint16_t buf_size = 0;
 struct rte_mempool *mp;
 int ret = 0;
 
+/* Contiguous mbufs in use - permit oversized mbufs */
+if (!dpdk_multi_segment_mbufs) {
+buf_size = 

[ovs-dev] [RFC v5 7/8] netdev-dpdk: copy large packet to multi-seg. mbufs

2018-05-01 Thread Tiago Lam
From: Mark Kavanagh 

Currently, packets are only copied to a single segment in
the function dpdk_do_tx_copy(). This could be an issue in
the case of jumbo frames, particularly when multi-segment
mbufs are involved.

This patch calculates the number of segments needed by a
packet and copies the data to each segment.

Co-authored-by: Michael Qiu 
Co-authored-by: Tiago Lam 

Signed-off-by: Mark Kavanagh 
Signed-off-by: Michael Qiu 
Signed-off-by: Tiago Lam 
---
 lib/netdev-dpdk.c | 78 ---
 1 file changed, 68 insertions(+), 10 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index c9de742..4c6a3c0 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2101,6 +2101,71 @@ out:
 }
 }
 
+static int
+dpdk_prep_tx_buf(struct dp_packet *packet, struct rte_mbuf **head,
+ struct rte_mempool *mp)
+{
+struct rte_mbuf *temp;
+uint32_t size = dp_packet_size(packet);
+uint16_t max_data_len, data_len;
+uint32_t nb_segs = 0;
+int i;
+
+temp = *head = rte_pktmbuf_alloc(mp);
+if (OVS_UNLIKELY(!temp)) {
+return 1;
+}
+
+/* All new allocated mbuf's max data len is the same */
+max_data_len = temp->buf_len - temp->data_off;
+
+/* Calculate # of output mbufs. */
+nb_segs = size / max_data_len;
+if (size % max_data_len) {
+nb_segs = nb_segs + 1;
+}
+
+/* Allocate additional mbufs when multiple output mbufs required. */
+for (i = 1; i < nb_segs; i++) {
+temp->next = rte_pktmbuf_alloc(mp);
+if (!temp->next) {
+rte_pktmbuf_free(*head);
+*head = NULL;
+break;
+}
+temp = temp->next;
+}
+/* We have to do a copy for now */
+rte_pktmbuf_pkt_len(*head) = size;
+temp = *head;
+
+data_len = size < max_data_len ? size: max_data_len;
+if (packet->source == DPBUF_DPDK) {
+*head = &(packet->mbuf);
+while (temp && head && size > 0) {
+rte_memcpy(rte_pktmbuf_mtod(temp, void *),
+dp_packet_data((struct dp_packet *)head), data_len);
+rte_pktmbuf_data_len(temp) = data_len;
+*head = (*head)->next;
+size = size - data_len;
+data_len =  size < max_data_len ? size: max_data_len;
+temp = temp->next;
+}
+} else {
+int offset = 0;
+while (temp && size > 0) {
+memcpy(rte_pktmbuf_mtod(temp, void *),
+dp_packet_at(packet, offset, data_len), data_len);
+rte_pktmbuf_data_len(temp) = data_len;
+temp = temp->next;
+size = size - data_len;
+offset += data_len;
+data_len = size < max_data_len ? size: max_data_len;
+}
+}
+return 0;
+}
+
 /* Tx function. Transmit packets indefinitely */
 static void
 dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
@@ -2117,6 +2182,7 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
 struct rte_mbuf *pkts[PKT_ARRAY_SIZE];
 uint32_t cnt = batch_cnt;
 uint32_t dropped = 0;
+uint32_t i;
 
 if (dev->type != DPDK_DEV_VHOST) {
 /* Check if QoS has been configured for this netdev. */
@@ -2127,27 +2193,19 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
 
 uint32_t txcnt = 0;
 
-for (uint32_t i = 0; i < cnt; i++) {
+for (i = 0; i < cnt; i++) {
 struct dp_packet *packet = batch->packets[i];
 uint32_t size = dp_packet_size(packet);
-
 if (OVS_UNLIKELY(size > dev->max_packet_len)) {
 VLOG_WARN_RL(&rl, "Too big size %u max_packet_len %d",
  size, dev->max_packet_len);
-
 dropped++;
 continue;
 }
-
-pkts[txcnt] = rte_pktmbuf_alloc(dev->mp);
-if (OVS_UNLIKELY(!pkts[txcnt])) {
+if (!dpdk_prep_tx_buf(packet, &pkts[txcnt], dev->mp)) {
 dropped += cnt - i;
 break;
 }
-
-/* We have to do a copy for now */
-memcpy(rte_pktmbuf_mtod(pkts[txcnt], void *),
-   dp_packet_data(packet), size);
 dp_packet_set_size((struct dp_packet *)pkts[txcnt], size);
 dp_packet_copy_mbuf_flags((struct dp_packet *)pkts[txcnt], packet);
 
-- 
2.7.4



[ovs-dev] [RFC v5 6/8] dp-packet: copy data from multi-seg. DPDK mbuf

2018-05-01 Thread Tiago Lam
From: Michael Qiu 

When cloning a packet whose source is a DPDK driver, multi-segment
mbufs must be considered, and each segment's data must be copied one
by one.
Co-authored-by: Mark Kavanagh 
Co-authored-by: Tiago Lam 

Signed-off-by: Michael Qiu 
Signed-off-by: Mark Kavanagh 
Signed-off-by: Tiago Lam 
---
 lib/dp-packet.c | 55 +++
 1 file changed, 47 insertions(+), 8 deletions(-)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index a2793f7..85db57a 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -175,6 +175,49 @@ dp_packet_clone(const struct dp_packet *buffer)
 return dp_packet_clone_with_headroom(buffer, 0);
 }
 
+#ifdef DPDK_NETDEV
+struct dp_packet *
+dp_packet_clone_with_headroom(const struct dp_packet *buffer,
+  size_t headroom) {
+struct dp_packet *new_buffer;
+uint32_t pkt_len = dp_packet_size(buffer);
+
+/* copy multi-seg data */
+if (buffer->source == DPBUF_DPDK && buffer->mbuf.nb_segs > 1) {
+uint32_t offset = 0;
+void *dst = NULL;
+struct rte_mbuf *tmbuf = CONST_CAST(struct rte_mbuf *,
+&(buffer->mbuf));
+
+new_buffer = dp_packet_new_with_headroom(pkt_len, headroom);
+dp_packet_set_size(new_buffer, pkt_len + headroom);
+dst = dp_packet_tail(new_buffer);
+
+while (tmbuf) {
+rte_memcpy((char *)dst + offset,
+   rte_pktmbuf_mtod(tmbuf, void *), tmbuf->data_len);
+offset += tmbuf->data_len;
+tmbuf = tmbuf->next;
+}
+} else {
+new_buffer = dp_packet_clone_data_with_headroom(dp_packet_data(buffer),
+pkt_len, headroom);
+}
+
+/* Copy the following fields into the returned buffer: l2_pad_size,
+ * l2_5_ofs, l3_ofs, l4_ofs, cutlen, packet_type and md. */
+memcpy(&new_buffer->l2_pad_size, &buffer->l2_pad_size,
+sizeof(struct dp_packet) -
+offsetof(struct dp_packet, l2_pad_size));
+
+dp_packet_copy_mbuf_flags(new_buffer, buffer);
+if (dp_packet_rss_valid(new_buffer)) {
+new_buffer->mbuf.hash.rss = buffer->mbuf.hash.rss;
+}
+
+return new_buffer;
+}
+#else
 /* Creates and returns a new dp_packet whose data are copied from 'buffer'.
  * The returned dp_packet will additionally have 'headroom' bytes of
  * headroom. */
@@ -182,29 +225,25 @@ struct dp_packet *
 dp_packet_clone_with_headroom(const struct dp_packet *buffer, size_t headroom)
 {
 struct dp_packet *new_buffer;
+uint32_t pkt_len = dp_packet_size(buffer);
 
 new_buffer = dp_packet_clone_data_with_headroom(dp_packet_data(buffer),
- dp_packet_size(buffer),
- headroom);
+ pkt_len, headroom);
+
 /* Copy the following fields into the returned buffer: l2_pad_size,
  * l2_5_ofs, l3_ofs, l4_ofs, cutlen, packet_type and md. */
 memcpy(&new_buffer->l2_pad_size, &buffer->l2_pad_size,
 sizeof(struct dp_packet) -
 offsetof(struct dp_packet, l2_pad_size));
 
-#ifdef DPDK_NETDEV
-dp_packet_copy_mbuf_flags(new_buffer, buffer);
-if (dp_packet_rss_valid(new_buffer)) {
-new_buffer->mbuf.hash.rss = buffer->mbuf.hash.rss;
-#else
 new_buffer->rss_hash_valid = buffer->rss_hash_valid;
 if (dp_packet_rss_valid(new_buffer)) {
 new_buffer->rss_hash = buffer->rss_hash;
-#endif
 }
 
 return new_buffer;
 }
+#endif
 
 /* Creates and returns a new dp_packet that initially contains a copy of the
  * 'size' bytes of data starting at 'data' with no headroom or tailroom. */
-- 
2.7.4



[ovs-dev] [RFC v5 5/8] dp-packet: copy mbuf info for packet copy

2018-05-01 Thread Tiago Lam
From: Michael Qiu 

Currently, when copying a packet, much of the DPDK mbuf's metadata is
lost, such as the packet type, ol_flags, etc. That information is very
important for DPDK packet processing.

Co-authored-by: Mark Kavanagh 
[mark.b.kavan...@intel.com rebased]
Co-authored-by: Tiago Lam 

Signed-off-by: Michael Qiu 
Signed-off-by: Mark Kavanagh 
Signed-off-by: Tiago Lam 
---
 lib/dp-packet.c   | 25 +++--
 lib/dp-packet.h   |  3 +++
 lib/netdev-dpdk.c |  1 +
 3 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index fd9fad0..a2793f7 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -48,6 +48,22 @@ dp_packet_use__(struct dp_packet *b, void *base, size_t 
allocated,
 dp_packet_init__(b, allocated, source);
 }
 
+#ifdef DPDK_NETDEV
+void
+dp_packet_copy_mbuf_flags(struct dp_packet *dst, const struct dp_packet *src) {
+ovs_assert(dst != NULL && src != NULL);
+struct rte_mbuf *buf_dst = &(dst->mbuf);
+struct rte_mbuf buf_src = src->mbuf;
+
+buf_dst->nb_segs = buf_src.nb_segs;
+buf_dst->ol_flags = buf_src.ol_flags;
+buf_dst->packet_type = buf_src.packet_type;
+buf_dst->tx_offload = buf_src.tx_offload;
+}
+#else
+#define dp_packet_copy_mbuf_flags(arg1, arg2)
+#endif
+
 /* Initializes 'b' as an empty dp_packet that contains the 'allocated' bytes of
  * memory starting at 'base'.  'base' should be the first byte of a region
  * obtained from malloc().  It will be freed (with free()) if 'b' is resized or
@@ -177,15 +193,12 @@ dp_packet_clone_with_headroom(const struct dp_packet *buffer, size_t headroom)
 offsetof(struct dp_packet, l2_pad_size));
 
 #ifdef DPDK_NETDEV
-new_buffer->mbuf.ol_flags = buffer->mbuf.ol_flags;
-#else
-new_buffer->rss_hash_valid = buffer->rss_hash_valid;
-#endif
-
+dp_packet_copy_mbuf_flags(new_buffer, buffer);
 if (dp_packet_rss_valid(new_buffer)) {
-#ifdef DPDK_NETDEV
 new_buffer->mbuf.hash.rss = buffer->mbuf.hash.rss;
 #else
+new_buffer->rss_hash_valid = buffer->rss_hash_valid;
+if (dp_packet_rss_valid(new_buffer)) {
 new_buffer->rss_hash = buffer->rss_hash;
 #endif
 }
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 93b0aaf..4607699 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -119,6 +119,9 @@ void dp_packet_init_dpdk(struct dp_packet *, size_t allocated);
 void dp_packet_init(struct dp_packet *, size_t);
 void dp_packet_uninit(struct dp_packet *);
 
+void dp_packet_copy_mbuf_flags(struct dp_packet *dst,
+   const struct dp_packet *src);
+
 struct dp_packet *dp_packet_new(size_t);
 struct dp_packet *dp_packet_new_with_headroom(size_t, size_t headroom);
 struct dp_packet *dp_packet_clone(const struct dp_packet *);
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 7008492..c9de742 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2149,6 +2149,7 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
 memcpy(rte_pktmbuf_mtod(pkts[txcnt], void *),
dp_packet_data(packet), size);
 dp_packet_set_size((struct dp_packet *)pkts[txcnt], size);
+dp_packet_copy_mbuf_flags((struct dp_packet *)pkts[txcnt], packet);
 
 txcnt++;
 }
-- 
2.7.4



[ovs-dev] [RFC v5 3/8] dp-packet: Add support for multi-seg mbufs

2018-05-01 Thread Tiago Lam
From: Mark Kavanagh 

Some functions in dp-packet assume that the data held by a dp_packet is
contiguous, and perform operations such as pointer arithmetic under that
assumption. However, with the introduction of multi-segment mbufs, where
data is non-contiguous, such assumptions are no longer possible. Thus,
dp_packet_put_uninit(), dp_packet_shift(), dp_packet_tail(),
dp_packet_end() and dp_packet_at() were modified to take multi-segment
mbufs into account.

Both dp_packet_put_uninit() and dp_packet_shift() are, in their current
implementation, operating on the data buffer of a dp_packet as if it
were contiguous, which in the case of multi-segment mbufs means they
operate on the first mbuf in the chain. However, in the case of
dp_packet_put_uninit(), for example, it is the data length of the last
mbuf in the mbuf chain that should be adjusted. Both functions have thus
been modified to support multi-segment mbufs.

Finally, dp_packet_tail(), dp_packet_end() and dp_packet_at() were also
modified to operate differently when dealing with multi-segment mbufs,
and now iterate over the non-contiguous data buffers for their
calculations.

Co-authored-by: Tiago Lam 

Signed-off-by: Mark Kavanagh 
Signed-off-by: Tiago Lam 
---
 lib/dp-packet.c   |  44 -
 lib/dp-packet.h   | 142 ++
 lib/netdev-dpdk.c |   8 +--
 3 files changed, 147 insertions(+), 47 deletions(-)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 443c225..fd9fad0 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -298,10 +298,33 @@ dp_packet_prealloc_headroom(struct dp_packet *b, size_t size)
 /* Shifts all of the data within the allocated space in 'b' by 'delta' bytes.
  * For example, a 'delta' of 1 would cause each byte of data to move one byte
  * forward (from address 'p' to 'p+1'), and a 'delta' of -1 would cause each
- * byte to move one byte backward (from 'p' to 'p-1'). */
+ * byte to move one byte backward (from 'p' to 'p-1').
+ * Note for DPBUF_DPDK(XXX): The shift can only move within a size of RTE_
+ * PKTMBUF_HEADROOM, to either left or right, which is usually defined as 128
+ * bytes.
+ */
 void
 dp_packet_shift(struct dp_packet *b, int delta)
 {
+#ifdef DPDK_NETDEV
+ if (b->source == DPBUF_DPDK) {
+ovs_assert(delta > 0 ? delta <= dp_packet_headroom(b)
+   : delta < 0 ? -delta <= dp_packet_headroom(b)
+   : true);
+
+if (delta != 0) {
+struct rte_mbuf *mbuf = CONST_CAST(struct rte_mbuf *, &b->mbuf);
+
+if (delta > 0) {
+rte_pktmbuf_prepend(mbuf, delta);
+} else {
+rte_pktmbuf_prepend(mbuf, delta);
+}
+}
+
+return;
+}
+#endif
 ovs_assert(delta > 0 ? delta <= dp_packet_tailroom(b)
: delta < 0 ? -delta <= dp_packet_headroom(b)
: true);
@@ -315,14 +338,31 @@ dp_packet_shift(struct dp_packet *b, int delta)
 
 /* Appends 'size' bytes of data to the tail end of 'b', reallocating and
  * copying its data if necessary.  Returns a pointer to the first byte of the
- * new data, which is left uninitialized. */
+ * new data, which is left uninitialized.
+ * Note for DPBUF_DPDK(XXX): In this case there must be enough tailroom to put
+ * the data in, otherwise this will result in a call to ovs_abort(). */
 void *
 dp_packet_put_uninit(struct dp_packet *b, size_t size)
 {
 void *p;
 dp_packet_prealloc_tailroom(b, size);
 p = dp_packet_tail(b);
+#ifdef DPDK_NETDEV
+    if (b->source == DPBUF_DPDK) {
+        /* In the case of multi-segment mbufs, the data length of the last
+         * mbuf should be adjusted by 'size' bytes. The packet length of the
+         * entire mbuf chain (stored in the first mbuf of said chain) is
+         * adjusted in the normal execution path below.
+         */
+        struct rte_mbuf *buf = &(b->mbuf);
+        buf = rte_pktmbuf_lastseg(buf);
+
+        buf->data_len += size;
+    }
+#endif
+
 dp_packet_set_size(b, dp_packet_size(b) + size);
+
 return p;
 }
 
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 9bfb7b7..d6512cf 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -55,12 +55,12 @@ struct dp_packet {
 struct rte_mbuf mbuf;   /* DPDK mbuf */
 #else
 void *base_;/* First byte of allocated space. */
-uint16_t allocated_;/* Number of bytes allocated. */
 uint16_t data_ofs;  /* First byte actually in use. */
 uint32_t size_; /* Number of bytes in use. */
 uint32_t rss_hash;  /* Packet hash. */
 bool rss_hash_valid;/* Is the 'rss_hash' valid? */
 #endif
+uint16_t allocated_;/* Number of bytes allocated. */
 enum dp_packet_source source;  /* Source of memory allocated as 'base'. */
 
 /* All the following elements of this struct are 

[ovs-dev] [RFC v5 4/8] dp-packet: Fix data_len issue with multi-seg mbufs

2018-05-01 Thread Tiago Lam
When a dp_packet is from a DPDK source, and it contains multi-segment
mbufs, the data_len is not equal to the packet size, pkt_len. Instead,
the data_len of each mbuf in the chain should be considered while
distributing the new (provided) size.

To account for the above dp_packet_set_size() has been changed so that,
in the multi-segment mbufs case, only the data_len on the last mbuf of
the chain and the total size of the packet, pkt_len, are changed. The
data_len on the intermediate mbufs preceding the last mbuf is not
changed by dp_packet_set_size(). Furthermore, in some cases
dp_packet_set_size() may be used to set a smaller size than the current
packet size, thus effectively trimming the end of the packet. In the
multi-segment mbufs case this may lead to lingering mbufs that may need
freeing.

__packet_set_data() now also updates an mbuf's data_len after setting
the data offset. This is so that both fields are always in sync for each
mbuf in a chain.

Co-authored-by: Michael Qiu 
Co-authored-by: Mark Kavanagh 
Co-authored-by: Przemyslaw Lal 
Co-authored-by: Marcin Ksiadz 
Co-authored-by: Yuanhan Liu 

Signed-off-by: Michael Qiu 
Signed-off-by: Mark Kavanagh 
Signed-off-by: Przemyslaw Lal 
Signed-off-by: Marcin Ksiadz 
Signed-off-by: Yuanhan Liu 
Signed-off-by: Tiago Lam 
---
 lib/dp-packet.h | 38 +++---
 1 file changed, 27 insertions(+), 11 deletions(-)

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index d6512cf..93b0aaf 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -397,17 +397,31 @@ dp_packet_size(const struct dp_packet *b)
 static inline void
 dp_packet_set_size(struct dp_packet *b, uint32_t v)
 {
-/* netdev-dpdk does not currently support segmentation; consequently, for
- * all intents and purposes, 'data_len' (16 bit) and 'pkt_len' (32 bit) may
- * be used interchangably.
- *
- * On the datapath, it is expected that the size of packets
- * (and thus 'v') will always be <= UINT16_MAX; this means that there is no
- * loss of accuracy in assigning 'v' to 'data_len'.
- */
-b->mbuf.data_len = (uint16_t)v;  /* Current seg length. */
-b->mbuf.pkt_len = v; /* Total length of all segments linked to
-  * this segment. */
+    if (b->source == DPBUF_DPDK) {
+        struct rte_mbuf *seg = &b->mbuf;
+        uint16_t pkt_len = v;
+        uint16_t seg_len;
+
+        /* Walk the chain, trimming the packet down to 'v' bytes and
+         * freeing any trailing mbufs that would be left floating. */
+        while (seg) {
+            seg_len = MIN(pkt_len, seg->data_len);
+            seg->data_len = seg_len;
+
+            pkt_len -= seg_len;
+            if (pkt_len == 0) {
+                /* Free the rest of the chained mbufs. */
+                rte_pktmbuf_free(seg->next);
+                seg->next = NULL;
+            }
+            seg = seg->next;
+        }
+    } else {
+        b->mbuf.data_len = v;
+    }
+
+/* Total length of all segments linked to this segment. */
+b->mbuf.pkt_len = v;
 }
 
 static inline uint16_t
@@ -420,6 +434,8 @@ static inline void
 __packet_set_data(struct dp_packet *b, uint16_t v)
 {
 b->mbuf.data_off = v;
+/* When dealing with DPDK mbufs, keep data_off and data_len in sync */
+b->mbuf.data_len = b->mbuf.buf_len - b->mbuf.data_off;
 }
 
 static inline void *
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [RFC v5 2/8] dp-packet: init specific mbuf fields to 0

2018-05-01 Thread Tiago Lam
From: Mark Kavanagh 

dp_packets are created using xmalloc(); in the case of OvS-DPDK, it's
possible that the resultant mbuf portion of the dp_packet contains
random data. For some mbuf fields, specifically those related to
multi-segment mbufs and/or offload features, random values may cause
unexpected behaviour, should the dp_packet's contents be later copied
to a DPDK mbuf. It is critical therefore, that these fields should be
initialized to 0.

This patch ensures that the following mbuf fields are initialized to 0,
on creation of a new dp_packet:
   - ol_flags
   - nb_segs
   - tx_offload
   - packet_type

Adapted from an idea by Michael Qiu :
https://patchwork.ozlabs.org/patch/777570/

Co-authored-by: Tiago Lam 

Signed-off-by: Mark Kavanagh 
Signed-off-by: Tiago Lam 
---
 lib/dp-packet.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 21c8ca5..9bfb7b7 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -626,13 +626,13 @@ dp_packet_mbuf_rss_flag_reset(struct dp_packet *p OVS_UNUSED)
 
 /* This initialization is needed for packets that do not come
  * from DPDK interfaces, when vswitchd is built with --with-dpdk.
- * The DPDK rte library will still otherwise manage the mbuf.
- * We only need to initialize the mbuf ol_flags. */
+ * The DPDK rte library will still otherwise manage the mbuf. */
 static inline void
 dp_packet_mbuf_init(struct dp_packet *p OVS_UNUSED)
 {
 #ifdef DPDK_NETDEV
-p->mbuf.ol_flags = 0;
+struct rte_mbuf *mbuf = &(p->mbuf);
+mbuf->ol_flags = mbuf->nb_segs = mbuf->tx_offload = mbuf->packet_type = 0;
 #endif
 }
 
-- 
2.7.4



[ovs-dev] [RFC v5 1/8] netdev-dpdk: fix mbuf sizing

2018-05-01 Thread Tiago Lam
From: Mark Kavanagh 

There are numerous factors that must be considered when calculating
the size of an mbuf:
- the data portion of the mbuf must be sized in accordance with Rx
  buffer alignment (typically 1024B). So, for example, in order to
  successfully receive and capture a 1500B packet, mbufs with a
  data portion of size 2048B must be used.
- in OvS, the elements that comprise an mbuf are:
  * the dp_packet, which includes a struct rte_mbuf (704B)
  * RTE_PKTMBUF_HEADROOM (128B)
  * packet data (aligned to 1k, as previously described)
  * RTE_PKTMBUF_TAILROOM (typically 0)

Some PMDs require that the total mbuf size (i.e. the total sum of all
of the above-listed components' lengths) is cache-aligned. To satisfy
this requirement, it may be necessary to round up the total mbuf size
with respect to cacheline size. In doing so, it's possible that the
dp_packet's data portion is inadvertently increased in size, such that
it no longer adheres to Rx buffer alignment. Consequently, the
following property of the mbuf no longer holds true:

mbuf.data_len == mbuf.buf_len - mbuf.data_off

This creates a problem in the case of multi-segment mbufs, where that
property is assumed to hold for all but the final segment in an
mbuf chain. Resolve this issue by adjusting the size of the mbuf's
private data portion, as opposed to the packet data portion when
aligning mbuf size to cachelines.

Fixes: 4be4d22 ("netdev-dpdk: clean up mbuf initialization")
Fixes: 31b88c9 ("netdev-dpdk: round up mbuf_size to cache_line_size")
CC: Santosh Shukla 
Co-authored-by: Tiago Lam 
Signed-off-by: Mark Kavanagh 
Signed-off-by: Tiago Lam 
---
 lib/netdev-dpdk.c | 46 ++
 1 file changed, 30 insertions(+), 16 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 3306b19..648a1de 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -82,12 +82,6 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
  + (2 * VLAN_HEADER_LEN))
 #define MTU_TO_FRAME_LEN(mtu)   ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN)
 #define MTU_TO_MAX_FRAME_LEN(mtu)   ((mtu) + ETHER_HDR_MAX_LEN)
-#define FRAME_LEN_TO_MTU(frame_len) ((frame_len)\
- - ETHER_HDR_LEN - ETHER_CRC_LEN)
-#define MBUF_SIZE(mtu)  ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu) \
- + sizeof(struct dp_packet) \
- + RTE_PKTMBUF_HEADROOM),   \
- RTE_CACHE_LINE_SIZE)
 #define NETDEV_DPDK_MBUF_ALIGN  1024
 #define NETDEV_DPDK_MAX_PKT_LEN 9728
 
@@ -486,7 +480,7 @@ is_dpdk_class(const struct netdev_class *class)
  * behaviour, which reduces performance. To prevent this, use a buffer size
  * that is closest to 'mtu', but which satisfies the aforementioned criteria.
  */
-static uint32_t
+static uint16_t
 dpdk_buf_size(int mtu)
 {
 return ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu) + RTE_PKTMBUF_HEADROOM),
@@ -577,7 +571,7 @@ dpdk_mp_do_not_free(struct rte_mempool *mp) OVS_REQUIRES(dpdk_mp_mutex)
  *  - a new mempool was just created;
  *  - a matching mempool already exists. */
 static struct rte_mempool *
-dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
+dpdk_mp_create(struct netdev_dpdk *dev, uint16_t mbuf_pkt_data_len)
 {
 char mp_name[RTE_MEMPOOL_NAMESIZE];
 const char *netdev_name = netdev_get_name(&dev->up);
@@ -585,6 +579,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
 uint32_t n_mbufs;
 uint32_t hash = hash_string(netdev_name, 0);
 struct rte_mempool *mp = NULL;
+uint16_t mbuf_size, aligned_mbuf_size, mbuf_priv_data_len;
 
 /*
  * XXX: rough estimation of number of mbufs required for this port:
@@ -604,12 +599,13 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
  * longer than RTE_MEMPOOL_NAMESIZE. */
 int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE,
"ovs%08x%02d%05d%07u",
-   hash, socket_id, mtu, n_mbufs);
+   hash, socket_id, mbuf_pkt_data_len, n_mbufs);
 if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {
 VLOG_DBG("snprintf returned %d. "
  "Failed to generate a mempool name for \"%s\". "
- "Hash:0x%x, socket_id: %d, mtu:%d, mbufs:%u.",
- ret, netdev_name, hash, socket_id, mtu, n_mbufs);
+ "Hash:0x%x, socket_id: %d, pkt data room:%d, mbufs:%u.",
+ ret, netdev_name, hash, socket_id, mbuf_pkt_data_len,
+ n_mbufs);
 break;
 }
 
@@ -618,13 +614,31 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
   netdev_name, n_mbufs, socket_id,
   

[ovs-dev] [RFC v5 0/8] Support multi-segment mbufs

2018-05-01 Thread Tiago Lam
Overview

This patchset introduces support for multi-segment mbufs to OvS-DPDK.
Multi-segment mbufs are typically used when the size of an mbuf is
insufficient to contain the entirety of a packet's data. Instead, the
data is split across numerous mbufs, each carrying a portion, or
'segment', of the packet data. mbufs are chained via their 'next'
attribute (an mbuf pointer).

Use Cases
=
i.  Handling oversized (guest-originated) frames, which are marked
for hardware acceleration/offload (TSO, for example).

Packets which originate from a non-DPDK source may be marked for
offload; as such, they may be larger than the permitted ingress
interface's MTU, and may be stored in an oversized dp-packet. In
order to transmit such packets over a DPDK port, their contents
must be copied to a DPDK mbuf (via dpdk_do_tx_copy). However, in
its current implementation, that function only copies data into
a single mbuf; if the space available in the mbuf is exhausted,
but not all packet data has been copied, then it is lost.
Similarly, when cloning a DPDK mbuf, it must be considered
whether that mbuf contains multiple segments. Both issues are
resolved within this patchset.

ii. Handling jumbo frames.

While OvS already supports jumbo frames, it does so by increasing
mbuf size, such that the entirety of a jumbo frame may be handled
in a single mbuf. This is certainly the preferred, and most
performant approach (and remains the default). However, it places
high demands on system memory; multi-segment mbufs may be
preferable for systems which are memory-constrained.

Enabling multi-segment mbufs

Multi-segment and single-segment mbufs are mutually exclusive, and the
user must decide on which approach to adopt on init. The introduction
of a new OVSDB field, 'dpdk-multi-seg-mbufs', facilitates this.

This is a global boolean value, which determines how jumbo frames are
represented across all DPDK ports. In the absence of a user-supplied
value, 'dpdk-multi-seg-mbufs' defaults to false, i.e. multi-segment
mbufs must be explicitly enabled / single-segment mbufs remain the
default.

Setting the field is identical to setting existing DPDK-specific OVSDB
fields:

ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x10
ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=4096,0
==> ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true

---

v5: - Rebased on master e5e22dc ("datapath-windows: Prevent ct-counters
  from getting redundantly incremented");
- Sugesh's comments have been addressed:
  - Changed dp_packet_set_data() and dp_packet_set_size() logic to
make them independent of each other;
  - Dropped patch 3 now that dp_packet_set_data() and
    dp_packet_set_size() are independent;
  - dp_packet_clone_with_headroom() now has split functions for
handling DPDK sourced packets and non-DPDK packets;
- Modified various functions in dp-packet.h to account for multi-seg
  mbufs - dp_packet_put_uninit(), dp_packet_tail(), dp_packet_tailroom()
  and dp_packet_at();
- Added support for shifting packet data in multi-seg mbufs, using
  dp_packet_shift();
- Fixed some minor inconsistencies.

Note that some of the changes in v5 have been contributed by Mark
Kavanagh as well.

v4: - restructure patchset
- account for 128B ARM cacheline when sizing mbufs

Mark Kavanagh (5):
  netdev-dpdk: fix mbuf sizing
  dp-packet: init specific mbuf fields to 0
  dp-packet: Add support for multi-seg mbufs
  netdev-dpdk: copy large packet to multi-seg. mbufs
  netdev-dpdk: support multi-segment jumbo frames

Michael Qiu (2):
  dp-packet: copy mbuf info for packet copy
  dp-packet: copy data from multi-seg. DPDK mbuf

Tiago Lam (1):
  dp-packet: Fix data_len issue with multi-seg mbufs

 NEWS |   1 +
 lib/dp-packet.c  | 118 
 lib/dp-packet.h  | 189 ---
 lib/dpdk.c   |   7 ++
 lib/netdev-dpdk.c| 183 +++--
 lib/netdev-dpdk.h|   1 +
 vswitchd/vswitch.xml |  20 ++
 7 files changed, 415 insertions(+), 104 deletions(-)

-- 
2.7.4



Re: [ovs-dev] [PATCH v2] odp-util: Remove unnecessary TOS ECN bits rewrite for tunnels

2018-05-01 Thread Simon Horman
On Tue, May 01, 2018 at 12:36:06PM +, Jianbo Liu wrote:
> For tunnels, TOS ECN bits are never wildcarded, for the reason that they
> are always inherited. OVS will create a rewrite action if we add a rule
> to modify other IP headers. But it also adds an extra ECN rewrite to
> the action because of this ECN un-wildcarding.
> 
> This appears harmless, because the ECN bits being rewritten are the same
> in this case. But since such a rule can't be offloaded to hardware, the
> unnecessary ECN rewrite should be removed.
> 
> Signed-off-by: Jianbo Liu 
> Reviewed-by: Paul Blakey 
> Reviewed-by: Roi Dayan 

Thanks, applied to master, branch-2.9 and branch-2.8.




[ovs-dev] [PATCH v2] odp-util: Remove unnecessary TOS ECN bits rewrite for tunnels

2018-05-01 Thread Jianbo Liu
For tunnels, TOS ECN bits are never wildcarded, for the reason that they
are always inherited. OVS will create a rewrite action if we add a rule
to modify other IP headers. But it also adds an extra ECN rewrite to
the action because of this ECN un-wildcarding.

This appears harmless, because the ECN bits being rewritten are the same
in this case. But since such a rule can't be offloaded to hardware, the
unnecessary ECN rewrite should be removed.

Signed-off-by: Jianbo Liu 
Reviewed-by: Paul Blakey 
Reviewed-by: Roi Dayan 
---
 lib/odp-util.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/lib/odp-util.c b/lib/odp-util.c
index 6db241a..95c584b 100644
--- a/lib/odp-util.c
+++ b/lib/odp-util.c
@@ -6962,6 +6962,11 @@ commit_set_ipv4_action(const struct flow *flow, struct flow *base_flow,
 mask.ipv4_proto = 0;/* Not writeable. */
 mask.ipv4_frag = 0; /* Not writable. */
 
+    if (flow_tnl_dst_is_set(&base_flow->tunnel) &&
+        ((base_flow->nw_tos ^ flow->nw_tos) & IP_ECN_MASK) == 0) {
+        mask.ipv4_tos &= ~IP_ECN_MASK;
+    }
+
 if (commit(OVS_KEY_ATTR_IPV4, use_masked, &key, &base, &mask, sizeof key,
odp_actions)) {
 put_ipv4_key(&base, base_flow, false);
@@ -7012,6 +7017,11 @@ commit_set_ipv6_action(const struct flow *flow, struct flow *base_flow,
 mask.ipv6_proto = 0;/* Not writeable. */
 mask.ipv6_frag = 0; /* Not writable. */
 
+    if (flow_tnl_dst_is_set(&base_flow->tunnel) &&
+        ((base_flow->nw_tos ^ flow->nw_tos) & IP_ECN_MASK) == 0) {
+        mask.ipv6_tclass &= ~IP_ECN_MASK;
+    }
+
 if (commit(OVS_KEY_ATTR_IPV6, use_masked, &key, &base, &mask, sizeof key,
odp_actions)) {
 put_ipv6_key(&base, base_flow, false);
-- 
2.9.5

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v4 1/1] netdev-dpdk: don't enable scatter for jumbo RX support for nfp

2018-05-01 Thread Stokes, Ian
> On 04/27/2018 05:40 PM, Pablo Cascón wrote:
> > Currently, RX of jumbo packets fails for NICs not supporting scatter.
> > Scatter is not strictly needed for jumbo RX support. This change fixes
> > the issue by leaving scatter disabled only for the PMD/NIC known not
> > to need it to support jumbo RX.
> >
> 
> Acked-by: Kevin Traynor 
> 

Thanks all, I'll apply this to DPDK_MERGE and backport to the previous 
releases, it will be part of this week's pull request.

Thanks
Ian

> > Note: this change is temporary and not needed for later releases
> > OVS/DPDK
> >
> > Reported-by: Louis Peens 
> > Signed-off-by: Pablo Cascón 
> > Reviewed-by: Simon Horman 
> > ---
> >  lib/netdev-dpdk.c | 14 +++---
> >  1 file changed, 11 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index
> > ee39cbe..fdc8f66 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
> > @@ -694,11 +694,19 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
> >  int diag = 0;
> >  int i;
> >  struct rte_eth_conf conf = port_conf;
> > +struct rte_eth_dev_info info;
> >
> > -/* For some NICs (e.g. Niantic), scatter_rx mode needs to be
> explicitly
> > - * enabled. */
> > +/* As of DPDK 17.11.1 a few PMDs require to explicitly enable
> > + * scatter to support jumbo RX. Checking the offload capabilities
> > + * is not an option as PMDs are not required yet to report
> > + * them. The only reliable info is the driver name and knowledge
> > + * (testing or code review). Listing all such PMDs feels harder
> > + * than highlighting the one known not to need scatter */
> >  if (dev->mtu > ETHER_MTU) {
> > -conf.rxmode.enable_scatter = 1;
> > +rte_eth_dev_info_get(dev->port_id, &info);
> > +if (strncmp(info.driver_name, "net_nfp", 7)) {
> > +conf.rxmode.enable_scatter = 1;
> > +}
> >  }
> >
> >  conf.rxmode.hw_ip_checksum = (dev->hw_ol_features &
> >
> 