Re: [ovs-dev] [patch v6 04/10] Userspace datapath: Add fragmentation handling.
The following diff fixes an issue where frag list sorting was not applied in
all cases, as it should have been. The change just moves the line

    ipf_sort(ipf_list->frag_list, ipf_list->last_inuse_idx);

with some associated indentation changes. I have some additional private tests
that found this, but I need to adapt them and will add them later.

diff --git a/lib/ipf.c b/lib/ipf.c
index 2963dd5..9cdc130 100644
--- a/lib/ipf.c
+++ b/lib/ipf.c
@@ -541,7 +541,6 @@ ipf_list_state_transition(struct ipf_list *ipf_list, bool ff, bool lf,
         break;
     case IPF_LIST_STATE_FIRST_LAST_SEEN:
         next_state = IPF_LIST_STATE_FIRST_LAST_SEEN;
-        ipf_sort(ipf_list->frag_list, ipf_list->last_inuse_idx);
         break;
     case IPF_LIST_STATE_COMPLETED:
         next_state = curr_state;
@@ -552,23 +551,25 @@ ipf_list_state_transition(struct ipf_list *ipf_list, bool ff, bool lf,
         OVS_NOT_REACHED();
     }

-    if (next_state == IPF_LIST_STATE_FIRST_LAST_SEEN &&
-        ipf_list_complete(ipf_list)) {
-        struct dp_packet *reass_pkt = NULL;
-        if (v4) {
-            reass_pkt = ipf_reassemble_v4_frags(ipf_list);
-        } else {
-            reass_pkt = ipf_reassemble_v6_frags(ipf_list);
-        }
-        if (reass_pkt) {
-            struct reassembled_pkt *rp = xzalloc(sizeof *rp);
-            rp->pkt = reass_pkt;
-            rp->list = ipf_list;
-            ipf_reassembled_list_add(rp);
-            ipf_expiry_list_remove(ipf_list);
-            next_state = IPF_LIST_STATE_COMPLETED;
-        } else {
-            next_state = IPF_LIST_STATE_REASS_FAIL;
+    if (next_state == IPF_LIST_STATE_FIRST_LAST_SEEN) {
+        ipf_sort(ipf_list->frag_list, ipf_list->last_inuse_idx);
+        if (ipf_list_complete(ipf_list)) {
+            struct dp_packet *reass_pkt = NULL;
+            if (v4) {
+                reass_pkt = ipf_reassemble_v4_frags(ipf_list);
+            } else {
+                reass_pkt = ipf_reassemble_v6_frags(ipf_list);
+            }
+            if (reass_pkt) {
+                struct reassembled_pkt *rp = xzalloc(sizeof *rp);
+                rp->pkt = reass_pkt;
+                rp->list = ipf_list;
+                ipf_reassembled_list_add(rp);
+                ipf_expiry_list_remove(ipf_list);
+                next_state = IPF_LIST_STATE_COMPLETED;
+            } else {
+                next_state = IPF_LIST_STATE_REASS_FAIL;
+            }
         }
     }
     ipf_list->state = next_state;

Darrell

On Sun,
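To see why the sort has to run before the completeness check, here is a toy model of the idea. The `struct frag` type and helper names below are hypothetical, not the ipf.c types: sorting the fragments by offset first is what lets a single linear pass detect gaps.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fragment record: just enough state to illustrate why the
 * sort must precede the completeness check. */
struct frag {
    unsigned start;  /* byte offset of this fragment in the datagram */
    unsigned len;    /* payload length of this fragment */
};

static int
frag_cmp(const void *a_, const void *b_)
{
    const struct frag *a = a_, *b = b_;
    return a->start < b->start ? -1 : a->start > b->start;
}

/* Sort fragments by offset, then verify they tile the datagram with no
 * gaps -- the sort-before-completeness-check ordering the fix restores. */
static bool
frag_list_complete(struct frag *frags, size_t n, unsigned total)
{
    qsort(frags, n, sizeof frags[0], frag_cmp);

    unsigned next = 0;
    for (size_t i = 0; i < n; i++) {
        if (frags[i].start != next) {
            return false;   /* gap (or overlap) at offset 'next' */
        }
        next = frags[i].start + frags[i].len;
    }
    return next == total;
}
```

Without the qsort() step, a list received out of order (last fragment first) would be rejected even when every byte is present.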
Apr 8, 2018 at 7:53 PM, Darrell Ball wrote:
> Fragmentation handling is added for supporting conntrack.
> Both v4 and v6 are supported.
>
> After discussion with several people, I decided to not store
> configuration state in the database, to be more consistent with
> the kernel in future, for similarity with other conntrack configuration
> (which will not be in the database as well), and for overall simplicity.
> Accordingly, fragmentation handling is enabled by default.
>
> This patch enables fragmentation tests for the userspace datapath.
>
> Signed-off-by: Darrell Ball
> ---
>  NEWS                         |    2 +
>  include/sparse/netinet/ip6.h |    1 +
>  lib/automake.mk              |    2 +
>  lib/conntrack.c              |    7 +
>  lib/ipf.c                    | 1238 ++++++++++++++++++++++++++++++++++++++++++
>  lib/ipf.h                    |   63 +++
>  tests/system-traffic.at      |   10 -
>  7 files changed, 1313 insertions(+), 10 deletions(-)
>  create mode 100644 lib/ipf.c
>  create mode 100644 lib/ipf.h
>
> diff --git a/NEWS b/NEWS
> index 0cfcac5..2f31680 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -10,6 +10,8 @@ Post-v2.9.0
>       * ovs-ofctl now accepts and display table names in place of numbers.  By
>         default it always accepts names and in interactive use it displays them;
>         use --names or --no-names to override.  See ovs-ofctl(8) for details.
> +   - Userspace datapath:
> +     * Add v4/v6 fragmentation support for conntrack.
>     - ovs-vsctl: New commands "add-bond-iface" and "del-bond-iface".
>     - OpenFlow:
>       * OFPT_ROLE_STATUS is now available in OpenFlow 1.3.
> diff --git a/include/sparse/netinet/ip6.h b/include/sparse/netinet/ip6.h
> index d2a54de..bfa637a 100644
> --- a/include/sparse/netinet/ip6.h
> +++ b/include/sparse/netinet/ip6.h
> @@ -64,5 +64,6 @@ struct ip6_frag {
>  };
>
>  #define IP6F_OFF_MASK ((OVS_FORCE ovs_be16) 0xfff8)
> +#define IP6F_MORE_FRAG ((OVS_FORCE ovs_be16) 0x0001)
>
>  #endif /* netinet/ip6.h sparse */
> diff --git a/lib/automake.mk b/lib/automake.mk
> index 915a33b..04163b3 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -107,6 +107,8 @@ lib_libopenvswitch_la_SOURCES = \
>       lib/hmapx.h \
>       lib/id-pool.c \
>       lib/id-pool.h \
> +     lib/ipf.c \
> +     lib/ipf.h \
>       lib/jhash.c \
>       lib/jhash.h \
>       lib/json.c \
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index
[ovs-dev] [PATCH v1 2/2] ovs-vsctl: Fix segfault when attempting to del-port from parent bridge.
The error message in the check for improper bridge param is de-referencing
parent from the wrong bridge. Also, the message itself had the parent and
child bridges reversed, so that got a small tweak as well.

Signed-off-by: Flavio Fernandes
---
 utilities/ovs-vsctl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/utilities/ovs-vsctl.c b/utilities/ovs-vsctl.c
index c69e89e..188a390 100644
--- a/utilities/ovs-vsctl.c
+++ b/utilities/ovs-vsctl.c
@@ -1748,9 +1748,9 @@ cmd_del_port(struct ctl_context *ctx)
         if (port->bridge != bridge) {
             if (port->bridge->parent == bridge) {
                 ctl_fatal("bridge %s does not have a port %s (although "
-                          "its parent bridge %s does)",
+                          "its child bridge %s does)",
                           ctx->argv[1], ctx->argv[2],
-                          bridge->parent->name);
+                          port->bridge->name);
             } else {
                 ctl_fatal("bridge %s does not have a port %s",
                           ctx->argv[1], ctx->argv[2]);
--
1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
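The crash and the fix are easy to see in miniature. The sketch below uses simplified stand-in structs (only the `parent`/`bridge` field names follow the patch; everything else is hypothetical): when the user names the parent bridge, that bridge's own `parent` pointer is NULL, so the child bridge's name must be reached through the port, as the patch now does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for vsctl_bridge/vsctl_port. */
struct br {
    const char *name;
    struct br *parent;   /* NULL when this bridge *is* the parent */
};
struct port {
    struct br *bridge;   /* the (fake) bridge that really owns the port */
};

/* Returns the bridge name to print in the error message. The buggy code
 * dereferenced bridge->parent->name, but when the requested bridge is the
 * parent, its 'parent' pointer is NULL -- the segfault the patch fixes.
 * The correct child bridge is reached through the port instead. */
static const char *
offending_bridge_name(const struct port *p, const struct br *requested)
{
    assert(p->bridge != requested && p->bridge->parent == requested);
    return p->bridge->name;   /* child bridge that actually has the port */
}
```

With the xenbr0/xapi1 setup from the test patch, this yields "xapi1", matching the corrected message text.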
[ovs-dev] [PATCH v1 1/2] ovs-vsctl.at: deleting a port from fake bridge
This test will exercise the code path in ovs-vsctl where a del-port is
attempted using the parent of a fake-bridge. It expects a message saying
that the user provided the wrong bridge, not a segfault.

Signed-off-by: Flavio Fernandes
---
 tests/ovs-vsctl.at | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/tests/ovs-vsctl.at b/tests/ovs-vsctl.at
index 3189a9b..f9e7f3b 100644
--- a/tests/ovs-vsctl.at
+++ b/tests/ovs-vsctl.at
@@ -605,6 +605,23 @@ CHECK_PORTS([xapi1], [eth0.$1])
 CHECK_IFACES([xapi1], [eth0.$1])
 OVS_VSCTL_CLEANUP
 AT_CLEANUP
+
+AT_SETUP([simple fake bridge + del-port from parent (VLAN $1)])
+AT_KEYWORDS([ovs-vsctl fake-bridge del-port])
+OVS_VSCTL_SETUP
+OVS_VSCTL_SETUP_SIMPLE_FAKE_CONF([$1])
+AT_CHECK([RUN_OVS_VSCTL([del-port xenbr0 eth0.$1])], [1], [],
+  [ovs-vsctl: bridge xenbr0 does not have a port eth0.$1 (although its child bridge xapi1 does)
+])
+CHECK_PORTS([xenbr0], [eth0])
+CHECK_IFACES([xenbr0], [eth0])
+CHECK_PORTS([xapi1], [eth0.$1])
+CHECK_IFACES([xapi1], [eth0.$1])
+AT_CHECK([RUN_OVS_VSCTL([del-port xapi1 eth0.$1])])
+CHECK_PORTS([xenbr0], [eth0])
+CHECK_IFACES([xenbr0], [eth0])
+OVS_VSCTL_CLEANUP
+AT_CLEANUP
 ]) # OVS_VSCTL_FAKE_BRIDGE_TESTS

 OVS_VSCTL_FAKE_BRIDGE_TESTS([9])
--
1.9.1
[ovs-dev] [PATCH v1 0/2] ovs-vsctl segfaults on del-port of fake-bridge when parent is provided
Greetings! I was lucky enough to spend some time having fun with OVS
recently and encountered a bug that may be worth sharing with you. The
error-handling code path taken when del-port is attempted on the parent
bridge (instead of the fake-bridge) segfaults.

Here are some simple steps for reproducing this issue:

./boot.sh ; ./configure ; make -j4 ; make sandbox

PARENT_BRIDGE=br0 ; FAKE_BRIDGE=br0c ; VLAN_TAG=666
ovs-vsctl add-br ${PARENT_BRIDGE}
ovs-vsctl add-br $FAKE_BRIDGE $PARENT_BRIDGE $VLAN_TAG

# Add a port to the parent bridge, which happens to have the same tag as
# the fake bridge.
# Note: The port could have been added directly to the fake bridge too, of
# course. The end result of the add-port is the same.
ovs-vsctl add-port $PARENT_BRIDGE p1 -- set port p1 tag=${VLAN_TAG}
# ovs-vsctl add-port $FAKE_BRIDGE p1

# removing p1 will cause a segfault
ovs-vsctl del-port $PARENT_BRIDGE p1 ; # sad panda moment

# Here are 3 ways of working around this segfault

# workaround 1: remove tag before removing port from parent
ovs-vsctl remove port p1 tag $VLAN_TAG && \
ovs-vsctl del-port $PARENT_BRIDGE p1 && echo ok

# workaround 2: remove port as if it belongs to fake bridge
ovs-vsctl del-port $FAKE_BRIDGE p1 && echo ok

# workaround 3: remove port w/out specifying a bridge
ovs-vsctl del-port p1 && echo ok

This issue appears to exist since commit 7c79588e, which dates back to
Feb/2010. I see it in OVS 2.3.2.

-- flaviof

More details about the segfault:

|main: Ubuntu ~/ovs.git/_ffbuilddir/utilities/sandbox on devel
$ sudo ovs-vsctl show
fa096c6f-8f5f-49ae-92b4-e94ce58aceec
    Bridge "br0"
        Port "br0"
            Interface "br0"
                type: internal
        Port "p1"
            tag: 666
            Interface "p1"
        Port "br0c"
            tag: 666
            Interface "br0c"
                type: internal

|main: Ubuntu ~/ovs.git/_ffbuilddir/utilities/sandbox on devel
$ gdb /home/ff/ovs.git/_ffbuilddir/utilities/ovs-vsctl
...
Reading symbols from /home/ff/ovs.git/_ffbuilddir/utilities/ovs-vsctl...done.
(gdb) run del-port br0 p1
Starting program: /home/ff/ovs.git/_ffbuilddir/utilities/ovs-vsctl del-port br0 p1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x0040a945 in cmd_del_port (ctx=0x7fffc5d0) at ../utilities/ovs-vsctl.c:1750
1750                ctl_fatal("bridge %s does not have a port %s (although "
(gdb) bt
#0  0x0040a945 in cmd_del_port (ctx=0x7fffc5d0) at ../utilities/ovs-vsctl.c:1750
#1  0x004075dd in do_vsctl (args=0xc8f7f0 "/home/ff/ovs.git/_ffbuilddir/utilities/ovs-vsctl del-port br0 p1",
    commands=0xc8fd50, n_commands=1, idl=0xc8fdb0) at ../utilities/ovs-vsctl.c:2623
#2  0x00405e7e in main (argc=4, argv=0x7fffc868) at ../utilities/ovs-vsctl.c:184
(gdb) list
1745        struct vsctl_bridge *bridge;
1746
1747        bridge = find_bridge(vsctl_ctx, ctx->argv[1], true);
1748        if (port->bridge != bridge) {
1749            if (port->bridge->parent == bridge) {
1750                ctl_fatal("bridge %s does not have a port %s (although "
1751                          "its parent bridge %s does)",
1752                          ctx->argv[1], ctx->argv[2],
1753                          bridge->parent->name);
1754            } else {
(gdb) p ctx->argv[1]
$1 = 0x7fffcba5 "br0"
(gdb) p ctx->argv[2]
$2 = 0x7fffcba9 "p1"
(gdb) p port->bridge->name
$3 = 0xc9a290 "br0c"
(gdb) p bridge
$4 = (struct vsctl_bridge *) 0xcc9d60
(gdb) p bridge->parent
$5 = (struct vsctl_bridge *) 0x0
(gdb) p port->bridge
$6 = (struct vsctl_bridge *) 0xccb440
(gdb) p bridge->name
$7 = 0xcc8da0 "br0"
(gdb)

Flavio Fernandes (2):
  ovs-vsctl.at: deleting a port from fake bridge
  ovs-vsctl: Fix segfault when attempting to del-port from parent bridge.

 tests/ovs-vsctl.at    | 17 +++++++++++++++++
 utilities/ovs-vsctl.c |  4 ++--
 2 files changed, 19 insertions(+), 2 deletions(-)

--
1.9.1
Re: [ovs-dev] [PATCH v3] rhel: user/group openvswitch does not exist
On Mon, Apr 30, 2018 at 3:27 PM, Aaron Conole wrote:
> Markos Chandras writes:
>
>> On 19/04/18 16:27, Aaron Conole wrote:
>>> From: Alan Pevec
>>>
>>> Default ownership[1] for config files is failing on an empty system:
>>>     Running scriptlet: openvswitch-2.9.0-3.fc28.x86_64
>>>     warning: user openvswitch does not exist - using root
>>>     warning: group openvswitch does not exist - using root
>>>     ...
>>>
>>> Required user/group need to be created in %pre as documented in the
>>> Fedora guideline[2].
>>>
>>> [1] https://github.com/openvswitch/ovs/commit/951d79e638ecdb3b1dcd19df1adb2ff91fe61af8
>>> [2] https://fedoraproject.org/wiki/Packaging:UsersAndGroups#Dynamic_allocation
>>>
>>> Submitted-at: https://github.com/openvswitch/ovs/pull/223
>>> Signed-off-by: Alan Pevec
>>> Co-authored-by: Aaron Conole
>>> Signed-off-by: Aaron Conole
>>
>> Reviewed-by: Markos Chandras
>
> Thanks Markos.
>
> Timothy, Russell, sorry I forgot to CC you, it seems.

Thanks, applied to master and branch-2.9.

--
Russell Bryant
[ovs-dev] [RFC v5 8/8] netdev-dpdk: support multi-segment jumbo frames
From: Mark Kavanagh

Currently, jumbo frame support for OvS-DPDK is implemented by increasing
the size of mbufs within a mempool, such that each mbuf within the pool is
large enough to contain an entire jumbo frame of a user-defined size.
Typically, for each user-defined MTU, 'requested_mtu', a new mempool is
created, containing mbufs of size ~requested_mtu.

With the multi-segment approach, a port uses a single mempool (containing
standard/default-sized mbufs of ~2k bytes), irrespective of the
user-requested MTU value. To accommodate jumbo frames, mbufs are chained
together, where each mbuf in the chain stores a portion of the jumbo
frame. Each mbuf in the chain is termed a segment, hence the name.

== Enabling multi-segment mbufs ==

Multi-segment and single-segment mbufs are mutually exclusive, and the
user must decide which approach to adopt on init. The introduction of a
new OVSDB field, 'dpdk-multi-seg-mbufs', facilitates this. This is a
global boolean value, which determines how jumbo frames are represented
across all DPDK ports. In the absence of a user-supplied value,
'dpdk-multi-seg-mbufs' defaults to false, i.e. multi-segment mbufs must
be explicitly enabled / single-segment mbufs remain the default.

Setting the field is identical to setting existing DPDK-specific OVSDB
fields:

    ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
    ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x10
    ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=4096,0
==> ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true

Co-authored-by: Tiago Lam
Signed-off-by: Mark Kavanagh
Signed-off-by: Tiago Lam
---
 NEWS                 |  1 +
 lib/dpdk.c           |  7 +++++++
 lib/netdev-dpdk.c    | 52 ++++++++++++++++++++++++++++++++++++++++++++++++---
 lib/netdev-dpdk.h    |  1 +
 vswitchd/vswitch.xml | 20 ++++++++++++++++++++
 5 files changed, 74 insertions(+), 7 deletions(-)

diff --git a/NEWS b/NEWS
index d22ad14..e6752d6 100644
--- a/NEWS
+++ b/NEWS
@@ -92,6 +92,7 @@ v2.9.0 - 19 Feb 2018
        pmd assignments.
      * Add rxq utilization of pmd to appctl 'dpif-netdev/pmd-rxq-show'.
      * Add support for vHost dequeue zero copy (experimental)
+     * Add support for multi-segment mbufs
    - Userspace datapath:
      * Output packet batching support.
    - vswitchd:
diff --git a/lib/dpdk.c b/lib/dpdk.c
index 00dd974..1447724 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -459,6 +459,13 @@ dpdk_init__(const struct smap *ovs_other_config)

     /* Finally, register the dpdk classes */
     netdev_dpdk_register();
+
+    bool multi_seg_mbufs_enable = smap_get_bool(ovs_other_config,
+            "dpdk-multi-seg-mbufs", false);
+    if (multi_seg_mbufs_enable) {
+        VLOG_INFO("DPDK multi-segment mbufs enabled\n");
+        netdev_dpdk_multi_segment_mbufs_enable();
+    }
 }

 void
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 4c6a3c0..5746ae0 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -66,6 +66,7 @@ enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};

 VLOG_DEFINE_THIS_MODULE(netdev_dpdk);
 static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
+static bool dpdk_multi_segment_mbufs = false;

 #define DPDK_PORT_WATCHDOG_INTERVAL 5

@@ -593,6 +594,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, uint16_t mbuf_pkt_data_len)
               + dev->requested_n_txq * dev->requested_txq_size
               + MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * NETDEV_MAX_BURST
               + MIN_NB_MBUF;
+    /* XXX: should n_mbufs be increased if multi-seg mbufs are used? */

     ovs_mutex_lock(&dpdk_mp_mutex);
     do {
@@ -693,7 +695,13 @@ dpdk_mp_release(struct rte_mempool *mp)

 /* Tries to allocate a new mempool - or re-use an existing one where
  * appropriate - on requested_socket_id with a size determined by
- * requested_mtu and requested Rx/Tx queues.
+ * requested_mtu and requested Rx/Tx queues. Some properties of the mempool's
+ * elements are dependent on the value of 'dpdk_multi_segment_mbufs':
+ * - if 'true', then the mempool contains standard-sized mbufs that are chained
+ *   together to accommodate packets of size 'requested_mtu'.
+ * - if 'false', then the members of the allocated mempool are
+ *   non-standard-sized mbufs. Each mbuf in the mempool is large enough to
+ *   fully accommodate packets of size 'requested_mtu'.
  * On success - or when re-using an existing mempool - the new configuration
  * will be applied.
  * On error, device will be left unchanged. */
@@ -701,10 +709,18 @@ static int
 netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)
     OVS_REQUIRES(dev->mutex)
 {
-    uint16_t buf_size = dpdk_buf_size(dev->requested_mtu);
+    uint16_t buf_size = 0;
     struct rte_mempool *mp;
     int ret = 0;

+    /* Contiguous mbufs in use - permit oversized mbufs */
+    if (!dpdk_multi_segment_mbufs) {
+        buf_size =
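For reference, the segment count implied by chaining standard-sized mbufs to cover a jumbo MTU is just a ceiling division. The function below and the ~2k data-room figure are illustrative, not taken from the patch:

```c
/* Number of fixed-size segments needed to hold one frame -- the arithmetic
 * behind "mbufs are chained together". 'seg_data_room' stands in for an
 * mbuf's usable data room (buf_len - data_off); the values used in the
 * assertions below are examples only, not DPDK defaults verbatim. */
static unsigned
segs_for_frame(unsigned frame_len, unsigned seg_data_room)
{
    /* Ceiling division: a partial final segment still needs a whole mbuf. */
    return (frame_len + seg_data_room - 1) / seg_data_room;
}
```

So a 9000-byte jumbo frame spread over ~2k segments occupies five chained mbufs, while ordinary MTU-1500 traffic stays single-segment.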
[ovs-dev] [RFC v5 7/8] netdev-dpdk: copy large packet to multi-seg. mbufs
From: Mark Kavanagh

Currently, packets are only copied to a single segment in the function
dpdk_do_tx_copy(). This could be an issue in the case of jumbo frames,
particularly when multi-segment mbufs are involved.

This patch calculates the number of segments needed by a packet and
copies the data to each segment.

Co-authored-by: Michael Qiu
Co-authored-by: Tiago Lam
Signed-off-by: Mark Kavanagh
Signed-off-by: Michael Qiu
Signed-off-by: Tiago Lam
---
 lib/netdev-dpdk.c | 78 ++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 68 insertions(+), 10 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index c9de742..4c6a3c0 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2101,6 +2101,71 @@ out:
     }
 }

+static int
+dpdk_prep_tx_buf(struct dp_packet *packet, struct rte_mbuf **head,
+                 struct rte_mempool *mp)
+{
+    struct rte_mbuf *temp;
+    uint32_t size = dp_packet_size(packet);
+    uint16_t max_data_len, data_len;
+    uint32_t nb_segs = 0;
+    int i;
+
+    temp = *head = rte_pktmbuf_alloc(mp);
+    if (OVS_UNLIKELY(!temp)) {
+        return 1;
+    }
+
+    /* All new allocated mbuf's max data len is the same */
+    max_data_len = temp->buf_len - temp->data_off;
+
+    /* Calculate # of output mbufs. */
+    nb_segs = size / max_data_len;
+    if (size % max_data_len) {
+        nb_segs = nb_segs + 1;
+    }
+
+    /* Allocate additional mbufs when multiple output mbufs required. */
+    for (i = 1; i < nb_segs; i++) {
+        temp->next = rte_pktmbuf_alloc(mp);
+        if (!temp->next) {
+            rte_pktmbuf_free(*head);
+            *head = NULL;
+            break;
+        }
+        temp = temp->next;
+    }
+    /* We have to do a copy for now */
+    rte_pktmbuf_pkt_len(*head) = size;
+    temp = *head;
+
+    data_len = size < max_data_len ? size : max_data_len;
+    if (packet->source == DPBUF_DPDK) {
+        *head = &(packet->mbuf);
+        while (temp && head && size > 0) {
+            rte_memcpy(rte_pktmbuf_mtod(temp, void *),
+                       dp_packet_data((struct dp_packet *)head), data_len);
+            rte_pktmbuf_data_len(temp) = data_len;
+            *head = (*head)->next;
+            size = size - data_len;
+            data_len = size < max_data_len ? size : max_data_len;
+            temp = temp->next;
+        }
+    } else {
+        int offset = 0;
+        while (temp && size > 0) {
+            memcpy(rte_pktmbuf_mtod(temp, void *),
+                   dp_packet_at(packet, offset, data_len), data_len);
+            rte_pktmbuf_data_len(temp) = data_len;
+            temp = temp->next;
+            size = size - data_len;
+            offset += data_len;
+            data_len = size < max_data_len ? size : max_data_len;
+        }
+    }
+    return 0;
+}
+
 /* Tx function. Transmit packets indefinitely */
 static void
 dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
@@ -2117,6 +2182,7 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
     struct rte_mbuf *pkts[PKT_ARRAY_SIZE];
     uint32_t cnt = batch_cnt;
     uint32_t dropped = 0;
+    uint32_t i;

     if (dev->type != DPDK_DEV_VHOST) {
         /* Check if QoS has been configured for this netdev. */
@@ -2127,27 +2193,19 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)

     uint32_t txcnt = 0;

-    for (uint32_t i = 0; i < cnt; i++) {
+    for (i = 0; i < cnt; i++) {
         struct dp_packet *packet = batch->packets[i];
         uint32_t size = dp_packet_size(packet);
-
         if (OVS_UNLIKELY(size > dev->max_packet_len)) {
             VLOG_WARN_RL(&rl, "Too big size %u max_packet_len %d",
                          size, dev->max_packet_len);
-
             dropped++;
             continue;
         }
-
-        pkts[txcnt] = rte_pktmbuf_alloc(dev->mp);
-        if (OVS_UNLIKELY(!pkts[txcnt])) {
+        if (!dpdk_prep_tx_buf(packet, &pkts[txcnt], dev->mp)) {
             dropped += cnt - i;
             break;
         }
-
-        /* We have to do a copy for now */
-        memcpy(rte_pktmbuf_mtod(pkts[txcnt], void *),
-               dp_packet_data(packet), size);
         dp_packet_set_size((struct dp_packet *)pkts[txcnt], size);
         dp_packet_copy_mbuf_flags((struct dp_packet *)pkts[txcnt], packet);
--
2.7.4
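The copy loop introduced above, reduced to a toy: allocate fixed-capacity segments and spread a contiguous buffer across them. The `struct seg` type and its 4-byte data room below are hypothetical stand-ins for an rte_mbuf and its data room; the real code allocates from an rte_mempool and uses rte_memcpy() rather than malloc/memcpy.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy segment with a tiny fixed data room, standing in for an rte_mbuf
 * whose room is buf_len - data_off. */
struct seg {
    unsigned char data[4];
    size_t len;
    struct seg *next;
};

static void
chain_free(struct seg *s)
{
    while (s) {
        struct seg *next = s->next;
        free(s);
        s = next;
    }
}

/* Copy 'size' contiguous bytes from 'src' into a chain of segments -- the
 * same loop shape as dpdk_prep_tx_buf() above. Returns NULL (freeing any
 * partial chain) if an allocation fails, mirroring the drop path. */
static struct seg *
chain_from_buf(const unsigned char *src, size_t size)
{
    struct seg *head = NULL, **tail = &head;

    do {
        struct seg *s = calloc(1, sizeof *s);
        if (!s) {
            chain_free(head);
            return NULL;
        }
        /* Fill this segment up to its data room, like data_len vs
         * max_data_len in the patch. */
        size_t n = size < sizeof s->data ? size : sizeof s->data;
        memcpy(s->data, src, n);
        s->len = n;
        src += n;
        size -= n;
        *tail = s;
        tail = &s->next;
    } while (size > 0);

    return head;
}
```

A 10-byte payload over 4-byte segments yields three chained segments of lengths 4, 4, and 2, analogous to the nb_segs calculation in the patch.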
[ovs-dev] [RFC v5 6/8] dp-packet: copy data from multi-seg. DPDK mbuf
From: Michael Qiu

When doing a packet clone, if the packet source is from a DPDK driver,
multi-segment mbufs must be considered, and each segment's data copied
one by one.

Co-authored-by: Mark Kavanagh
Co-authored-by: Tiago Lam
Signed-off-by: Michael Qiu
Signed-off-by: Mark Kavanagh
Signed-off-by: Tiago Lam
---
 lib/dp-packet.c | 55 +++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 47 insertions(+), 8 deletions(-)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index a2793f7..85db57a 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -175,6 +175,49 @@ dp_packet_clone(const struct dp_packet *buffer)
     return dp_packet_clone_with_headroom(buffer, 0);
 }

+#ifdef DPDK_NETDEV
+struct dp_packet *
+dp_packet_clone_with_headroom(const struct dp_packet *buffer,
+                              size_t headroom) {
+    struct dp_packet *new_buffer;
+    uint32_t pkt_len = dp_packet_size(buffer);
+
+    /* copy multi-seg data */
+    if (buffer->source == DPBUF_DPDK && buffer->mbuf.nb_segs > 1) {
+        uint32_t offset = 0;
+        void *dst = NULL;
+        struct rte_mbuf *tmbuf = CONST_CAST(struct rte_mbuf *,
+                                            &(buffer->mbuf));
+
+        new_buffer = dp_packet_new_with_headroom(pkt_len, headroom);
+        dp_packet_set_size(new_buffer, pkt_len + headroom);
+        dst = dp_packet_tail(new_buffer);
+
+        while (tmbuf) {
+            rte_memcpy((char *)dst + offset,
+                       rte_pktmbuf_mtod(tmbuf, void *), tmbuf->data_len);
+            offset += tmbuf->data_len;
+            tmbuf = tmbuf->next;
+        }
+    } else {
+        new_buffer = dp_packet_clone_data_with_headroom(dp_packet_data(buffer),
+                                                        pkt_len, headroom);
+    }
+
+    /* Copy the following fields into the returned buffer: l2_pad_size,
+     * l2_5_ofs, l3_ofs, l4_ofs, cutlen, packet_type and md. */
+    memcpy(&new_buffer->l2_pad_size, &buffer->l2_pad_size,
+           sizeof(struct dp_packet) -
+           offsetof(struct dp_packet, l2_pad_size));
+
+    dp_packet_copy_mbuf_flags(new_buffer, buffer);
+    if (dp_packet_rss_valid(new_buffer)) {
+        new_buffer->mbuf.hash.rss = buffer->mbuf.hash.rss;
+    }
+
+    return new_buffer;
+}
+#else
 /* Creates and returns a new dp_packet whose data are copied from 'buffer'.
  * The returned dp_packet will additionally have 'headroom' bytes of
  * headroom. */
 struct dp_packet *
@@ -182,29 +225,25 @@ struct dp_packet *
 dp_packet_clone_with_headroom(const struct dp_packet *buffer, size_t headroom)
 {
     struct dp_packet *new_buffer;
+    uint32_t pkt_len = dp_packet_size(buffer);

     new_buffer = dp_packet_clone_data_with_headroom(dp_packet_data(buffer),
-                                                    dp_packet_size(buffer),
-                                                    headroom);
+                                                    pkt_len, headroom);
+
     /* Copy the following fields into the returned buffer: l2_pad_size,
      * l2_5_ofs, l3_ofs, l4_ofs, cutlen, packet_type and md. */
     memcpy(&new_buffer->l2_pad_size, &buffer->l2_pad_size,
            sizeof(struct dp_packet) -
            offsetof(struct dp_packet, l2_pad_size));

-#ifdef DPDK_NETDEV
-    dp_packet_copy_mbuf_flags(new_buffer, buffer);
-    if (dp_packet_rss_valid(new_buffer)) {
-        new_buffer->mbuf.hash.rss = buffer->mbuf.hash.rss;
-#else
     new_buffer->rss_hash_valid = buffer->rss_hash_valid;
     if (dp_packet_rss_valid(new_buffer)) {
         new_buffer->rss_hash = buffer->rss_hash;
-#endif
     }

     return new_buffer;
 }
+#endif

 /* Creates and returns a new dp_packet that initially contains a copy of the
  * 'size' bytes of data starting at 'data' with no headroom or tailroom. */
--
2.7.4
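The multi-seg clone path in this patch is essentially a flatten: walk the chain and copy each segment into one contiguous destination. A minimal stand-alone sketch of that `while (tmbuf)` loop, with a toy `struct seg` in place of rte_mbuf and plain memcpy in place of rte_memcpy (hypothetical types, not the OVS/DPDK ones):

```c
#include <stddef.h>
#include <string.h>

/* Toy segment type standing in for a chained rte_mbuf. */
struct seg {
    unsigned char data[4];
    size_t len;                 /* like rte_mbuf's data_len */
    const struct seg *next;
};

/* Copy every segment's bytes, in order, into one contiguous buffer.
 * Returns the number of bytes copied; stops early rather than overrun
 * 'cap' (the real code sizes the destination from pkt_len up front). */
static size_t
flatten_chain(const struct seg *s, unsigned char *dst, size_t cap)
{
    size_t offset = 0;

    for (; s; s = s->next) {
        if (offset + s->len > cap) {
            break;
        }
        memcpy(dst + offset, s->data, s->len);
        offset += s->len;
    }
    return offset;
}
```

Note how 'offset' accumulates data_len per segment, exactly the role it plays in the clone hunk above.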
[ovs-dev] [RFC v5 5/8] dp-packet: copy mbuf info for packet copy
From: Michael Qiu

Currently, when doing a packet copy, a lot of the DPDK mbuf's info is
missed, like packet type, ol_flags, etc. This information is very
important for DPDK to do packet processing.

Co-authored-by: Mark Kavanagh [mark.b.kavan...@intel.com rebased]
Co-authored-by: Tiago Lam
Signed-off-by: Michael Qiu
Signed-off-by: Mark Kavanagh
Signed-off-by: Tiago Lam
---
 lib/dp-packet.c   | 25 +++++++++++++++++++------
 lib/dp-packet.h   |  3 +++
 lib/netdev-dpdk.c |  1 +
 3 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index fd9fad0..a2793f7 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -48,6 +48,22 @@ dp_packet_use__(struct dp_packet *b, void *base, size_t allocated,
     dp_packet_init__(b, allocated, source);
 }

+#ifdef DPDK_NETDEV
+void
+dp_packet_copy_mbuf_flags(struct dp_packet *dst, const struct dp_packet *src) {
+    ovs_assert(dst != NULL && src != NULL);
+    struct rte_mbuf *buf_dst = &(dst->mbuf);
+    struct rte_mbuf buf_src = src->mbuf;
+
+    buf_dst->nb_segs = buf_src.nb_segs;
+    buf_dst->ol_flags = buf_src.ol_flags;
+    buf_dst->packet_type = buf_src.packet_type;
+    buf_dst->tx_offload = buf_src.tx_offload;
+}
+#else
+#define dp_packet_copy_mbuf_flags(arg1, arg2)
+#endif
+
 /* Initializes 'b' as an empty dp_packet that contains the 'allocated' bytes of
  * memory starting at 'base'.  'base' should be the first byte of a region
  * obtained from malloc().  It will be freed (with free()) if 'b' is resized or
@@ -177,15 +193,12 @@ dp_packet_clone_with_headroom(const struct dp_packet *buffer, size_t headroom)
            offsetof(struct dp_packet, l2_pad_size));

 #ifdef DPDK_NETDEV
-    new_buffer->mbuf.ol_flags = buffer->mbuf.ol_flags;
-#else
-    new_buffer->rss_hash_valid = buffer->rss_hash_valid;
-#endif
-
+    dp_packet_copy_mbuf_flags(new_buffer, buffer);
     if (dp_packet_rss_valid(new_buffer)) {
-#ifdef DPDK_NETDEV
         new_buffer->mbuf.hash.rss = buffer->mbuf.hash.rss;
 #else
+    new_buffer->rss_hash_valid = buffer->rss_hash_valid;
+    if (dp_packet_rss_valid(new_buffer)) {
         new_buffer->rss_hash = buffer->rss_hash;
 #endif
     }
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 93b0aaf..4607699 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -119,6 +119,9 @@ void dp_packet_init_dpdk(struct dp_packet *, size_t allocated);
 void dp_packet_init(struct dp_packet *, size_t);
 void dp_packet_uninit(struct dp_packet *);

+void dp_packet_copy_mbuf_flags(struct dp_packet *dst,
+                               const struct dp_packet *src);
+
 struct dp_packet *dp_packet_new(size_t);
 struct dp_packet *dp_packet_new_with_headroom(size_t, size_t headroom);
 struct dp_packet *dp_packet_clone(const struct dp_packet *);
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 7008492..c9de742 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2149,6 +2149,7 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
         memcpy(rte_pktmbuf_mtod(pkts[txcnt], void *),
                dp_packet_data(packet), size);
         dp_packet_set_size((struct dp_packet *)pkts[txcnt], size);
+        dp_packet_copy_mbuf_flags((struct dp_packet *)pkts[txcnt], packet);

         txcnt++;
     }
--
2.7.4
[ovs-dev] [RFC v5 3/8] dp-packet: Add support for multi-seg mbufs
From: Mark Kavanagh

Some functions in dp-packet assume that the data held by a dp_packet is
contiguous, and perform operations such as pointer arithmetic under that
assumption. However, with the introduction of multi-segment mbufs, where
data is non-contiguous, such assumptions are no longer possible. Thus,
dp_packet_put_uninit(), dp_packet_shift(), dp_packet_tail(),
dp_packet_end() and dp_packet_at() were modified to take multi-segment
mbufs into account.

Both dp_packet_put_uninit() and dp_packet_shift() are, in their current
implementation, operating on the data buffer of a dp_packet as if it were
contiguous, which in the case of multi-segment mbufs means they operate on
the first mbuf in the chain. However, in the case of
dp_packet_put_uninit(), for example, it is the data length of the last
mbuf in the mbuf chain that should be adjusted. Both functions have thus
been modified to support multi-segment mbufs.

Finally, dp_packet_tail(), dp_packet_end() and dp_packet_at() were also
modified to operate differently when dealing with multi-segment mbufs,
and now iterate over the non-contiguous data buffers for their
calculations.

Co-authored-by: Tiago Lam
Signed-off-by: Mark Kavanagh
Signed-off-by: Tiago Lam
---
 lib/dp-packet.c   |  44 ++++++++++++++-
 lib/dp-packet.h   | 142 ++++++++++++++++++++++++++++++++++++------------
 lib/netdev-dpdk.c |   8 +--
 3 files changed, 147 insertions(+), 47 deletions(-)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 443c225..fd9fad0 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -298,10 +298,33 @@ dp_packet_prealloc_headroom(struct dp_packet *b, size_t size)
 /* Shifts all of the data within the allocated space in 'b' by 'delta' bytes.
  * For example, a 'delta' of 1 would cause each byte of data to move one byte
  * forward (from address 'p' to 'p+1'), and a 'delta' of -1 would cause each
- * byte to move one byte backward (from 'p' to 'p-1'). */
+ * byte to move one byte backward (from 'p' to 'p-1').
+ * Note for DPBUF_DPDK(XXX): The shift can only move within a size of RTE_
+ * PKTMBUF_HEADROOM, to either left or right, which is usually defined as 128
+ * bytes.
+ */
 void
 dp_packet_shift(struct dp_packet *b, int delta)
 {
+#ifdef DPDK_NETDEV
+    if (b->source == DPBUF_DPDK) {
+        ovs_assert(delta > 0 ? delta <= dp_packet_headroom(b)
+                   : delta < 0 ? -delta <= dp_packet_headroom(b)
+                   : true);
+
+        if (delta != 0) {
+            struct rte_mbuf *mbuf = CONST_CAST(struct rte_mbuf *, &b->mbuf);
+
+            if (delta > 0) {
+                rte_pktmbuf_prepend(mbuf, delta);
+            } else {
+                rte_pktmbuf_prepend(mbuf, delta);
+            }
+        }
+
+        return;
+    }
+#endif
     ovs_assert(delta > 0 ? delta <= dp_packet_tailroom(b)
                : delta < 0 ? -delta <= dp_packet_headroom(b)
                : true);
@@ -315,14 +338,31 @@ dp_packet_shift(struct dp_packet *b, int delta)

 /* Appends 'size' bytes of data to the tail end of 'b', reallocating and
  * copying its data if necessary.  Returns a pointer to the first byte of the
- * new data, which is left uninitialized. */
+ * new data, which is left uninitialized.
+ * Note for DPBUF_DPDK(XXX): In this case there must be enough tailroom to put
+ * the data in, otherwise this will result in a call to ovs_abort(). */
 void *
 dp_packet_put_uninit(struct dp_packet *b, size_t size)
 {
     void *p;
     dp_packet_prealloc_tailroom(b, size);
     p = dp_packet_tail(b);
+#ifdef DPDK_NETDEV
+    if (b->source == DPBUF_DPDK) {
+        /* In the case of multi-segment mbufs, the data length of the last mbuf
+         * should be adjusted by 'size' bytes. The packet length of the entire
+         * mbuf chain (stored in the first mbuf of said chain) is adjusted in
+         * the normal execution path below.
+         */
+        struct rte_mbuf *buf = &(b->mbuf);
+        buf = rte_pktmbuf_lastseg(buf);
+
+        buf->data_len += size;
+    }
+#endif
+
     dp_packet_set_size(b, dp_packet_size(b) + size);
+
     return p;
 }
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 9bfb7b7..d6512cf 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -55,12 +55,12 @@ struct dp_packet {
     struct rte_mbuf mbuf;       /* DPDK mbuf */
 #else
     void *base_;                /* First byte of allocated space. */
-    uint16_t allocated_;        /* Number of bytes allocated. */
     uint16_t data_ofs;          /* First byte actually in use. */
     uint32_t size_;             /* Number of bytes in use. */
     uint32_t rss_hash;          /* Packet hash. */
     bool rss_hash_valid;        /* Is the 'rss_hash' valid? */
 #endif
+    uint16_t allocated_;        /* Number of bytes allocated. */
     enum dp_packet_source source;  /* Source of memory allocated as 'base'. */

     /* All the following elements of this struct are
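Once data is split across segments, a byte-offset lookup of the kind dp_packet_at() performs becomes a chain walk. A toy sketch of that iteration (hypothetical types, not the OVS/DPDK structs):

```c
#include <stddef.h>

/* Toy segment type standing in for a chained rte_mbuf. */
struct seg {
    const unsigned char *data;
    size_t len;                 /* like rte_mbuf's data_len */
    const struct seg *next;
};

/* Resolve an absolute byte offset to a pointer inside the right segment --
 * the walk a multi-segment-aware accessor has to do once the payload is no
 * longer one contiguous buffer. Returns NULL if 'ofs' is past the end. */
static const unsigned char *
chain_at(const struct seg *s, size_t ofs)
{
    for (; s; s = s->next) {
        if (ofs < s->len) {
            return &s->data[ofs];
        }
        ofs -= s->len;   /* skip this segment's bytes */
    }
    return NULL;
}
```

In the contiguous case this degenerates to one iteration, which is why the single-segment code could get away with plain pointer arithmetic.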
[ovs-dev] [RFC v5 4/8] dp-packet: Fix data_len issue with multi-seg mbufs
When a dp_packet is from a DPDK source and it contains multi-segment mbufs,
the data_len is not equal to the packet size, pkt_len. Instead, the data_len
of each mbuf in the chain should be considered while distributing the new
(provided) size.

To account for the above, dp_packet_set_size() has been changed so that, in
the multi-segment mbufs case, only the data_len on the last mbuf of the chain
and the total size of the packet, pkt_len, are changed. The data_len on the
intermediate mbufs preceding the last mbuf is not changed by
dp_packet_set_size(). Furthermore, in some cases dp_packet_set_size() may be
used to set a smaller size than the current packet size, thus effectively
trimming the end of the packet. In the multi-segment mbufs case this may lead
to lingering mbufs that may need freeing.

__dp_packet_set_data() now also updates an mbuf's data_len after setting the
data offset, so that both fields are always in sync for each mbuf in a chain.

Co-authored-by: Michael Qiu
Co-authored-by: Mark Kavanagh
Co-authored-by: Przemyslaw Lal
Co-authored-by: Marcin Ksiadz
Co-authored-by: Yuanhan Liu
Signed-off-by: Michael Qiu
Signed-off-by: Mark Kavanagh
Signed-off-by: Przemyslaw Lal
Signed-off-by: Marcin Ksiadz
Signed-off-by: Yuanhan Liu
Signed-off-by: Tiago Lam
---
 lib/dp-packet.h | 38 +++---
 1 file changed, 27 insertions(+), 11 deletions(-)

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index d6512cf..93b0aaf 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -397,17 +397,31 @@ dp_packet_size(const struct dp_packet *b)
 static inline void
 dp_packet_set_size(struct dp_packet *b, uint32_t v)
 {
-    /* netdev-dpdk does not currently support segmentation; consequently, for
-     * all intents and purposes, 'data_len' (16 bit) and 'pkt_len' (32 bit)
-     * may be used interchangably.
-     *
-     * On the datapath, it is expected that the size of packets
-     * (and thus 'v') will always be <= UINT16_MAX; this means that there is
-     * no loss of accuracy in assigning 'v' to 'data_len'.
-     */
-    b->mbuf.data_len = (uint16_t)v;  /* Current seg length. */
-    b->mbuf.pkt_len = v;             /* Total length of all segments linked
-                                      * to this segment. */
+    if (b->source == DPBUF_DPDK) {
+        struct rte_mbuf *seg = &b->mbuf;
+        uint16_t pkt_len = v;
+        uint16_t seg_len;
+
+        /* Trim the chained buffers to 'v' bytes in total, freeing any
+         * buffers that may be left floating. */
+        while (seg) {
+            seg_len = MIN(pkt_len, seg->data_len);
+            seg->data_len = seg_len;
+
+            pkt_len -= seg_len;
+            if (pkt_len == 0) {
+                /* Free the rest of the chained mbufs. */
+                rte_pktmbuf_free(seg->next);
+                seg->next = NULL;
+            }
+            seg = seg->next;
+        }
+    } else {
+        b->mbuf.data_len = v;
+    }
+
+    /* Total length of all segments linked to this segment. */
+    b->mbuf.pkt_len = v;
 }

 static inline uint16_t
@@ -420,6 +434,8 @@ static inline void
 __dp_packet_set_data(struct dp_packet *b, uint16_t v)
 {
     b->mbuf.data_off = v;
+    /* When dealing with DPDK mbufs, keep data_off and data_len in sync. */
+    b->mbuf.data_len = b->mbuf.buf_len - b->mbuf.data_off;
 }

 static inline void *
--
2.7.4
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
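The trimming loop in dp_packet_set_size() can likewise be exercised without DPDK. The sketch below is plain C; `free_chain()` is a hypothetical stand-in for rte_pktmbuf_free() on a chain, and a counter tracks how many segments a trim releases:

```c
#include <assert.h>
#include <stdlib.h>

struct seg {
    struct seg *next;
    unsigned data_len;
};

static unsigned freed_segs;   /* counts segments released by a trim */

/* Stand-in for rte_pktmbuf_free(): releases 's' and everything after it. */
static void
free_chain(struct seg *s)
{
    while (s) {
        struct seg *next = s->next;
        free(s);
        freed_segs++;
        s = next;
    }
}

/* Mirrors the dp_packet_set_size() loop from the patch: cap each segment's
 * data_len at the remaining length and, once the new total is consumed,
 * free the rest of the chain so no segments are left floating. */
static void
chain_set_size(struct seg *head, unsigned v)
{
    struct seg *s = head;

    while (s) {
        unsigned seg_len = v < s->data_len ? v : s->data_len;  /* MIN */

        s->data_len = seg_len;
        v -= seg_len;
        if (v == 0) {               /* unsigned, so test == 0, not <= 0 */
            free_chain(s->next);
            s->next = NULL;
        }
        s = s->next;
    }
}
```

For example, trimming a 3 x 100B chain to 150B leaves the first segment full, the second holding 50B, and frees the third.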
[ovs-dev] [RFC v5 2/8] dp-packet: init specific mbuf fields to 0
From: Mark Kavanagh

dp_packets are created using xmalloc(); in the case of OvS-DPDK, it's
possible that the resultant mbuf portion of the dp_packet contains random
data. For some mbuf fields, specifically those related to multi-segment
mbufs and/or offload features, random values may cause unexpected behaviour,
should the dp_packet's contents be later copied to a DPDK mbuf. It is
critical, therefore, that these fields are initialized to 0.

This patch ensures that the following mbuf fields are initialized to 0, on
creation of a new dp_packet:
- ol_flags
- nb_segs
- tx_offload
- packet_type

Adapted from an idea by Michael Qiu:
https://patchwork.ozlabs.org/patch/777570/

Co-authored-by: Tiago Lam
Signed-off-by: Mark Kavanagh
Signed-off-by: Tiago Lam
---
 lib/dp-packet.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 21c8ca5..9bfb7b7 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -626,13 +626,13 @@ dp_packet_mbuf_rss_flag_reset(struct dp_packet *p OVS_UNUSED)

 /* This initialization is needed for packets that do not come
  * from DPDK interfaces, when vswitchd is built with --with-dpdk.
- * The DPDK rte library will still otherwise manage the mbuf.
- * We only need to initialize the mbuf ol_flags. */
+ * The DPDK rte library will still otherwise manage the mbuf. */
 static inline void
 dp_packet_mbuf_init(struct dp_packet *p OVS_UNUSED)
 {
 #ifdef DPDK_NETDEV
-    p->mbuf.ol_flags = 0;
+    struct rte_mbuf *mbuf = &(p->mbuf);
+    mbuf->ol_flags = mbuf->nb_segs = mbuf->tx_offload = mbuf->packet_type = 0;
 #endif
 }
--
2.7.4
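The hazard this patch guards against is easy to demonstrate: memory from a plain allocator carries arbitrary bytes, so any mbuf field not explicitly cleared holds garbage. A small sketch, using a hypothetical struct that mirrors only the four fields the patch zeroes:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the rte_mbuf fields the patch zeroes; a dp_packet
 * obtained from xmalloc() would leave these holding whatever bytes the
 * allocator returned. */
struct fake_mbuf {
    uint64_t ol_flags;
    uint16_t nb_segs;
    uint64_t tx_offload;
    uint32_t packet_type;
};

/* Same chained assignment as dp_packet_mbuf_init() in the patch. */
static void
fake_mbuf_init(struct fake_mbuf *m)
{
    m->ol_flags = m->nb_segs = m->tx_offload = m->packet_type = 0;
}
```

The chained assignment works because each right-hand value is 0 and C implicitly converts it to each field's width.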
[ovs-dev] [RFC v5 1/8] netdev-dpdk: fix mbuf sizing
From: Mark Kavanagh

There are numerous factors that must be considered when calculating the size
of an mbuf:
- the data portion of the mbuf must be sized in accordance with Rx buffer
  alignment (typically 1024B). So, for example, in order to successfully
  receive and capture a 1500B packet, mbufs with a data portion of size
  2048B must be used.
- in OvS, the elements that comprise an mbuf are:
  * the dp_packet, which includes a struct rte_mbuf (704B)
  * RTE_PKTMBUF_HEADROOM (128B)
  * packet data (aligned to 1k, as previously described)
  * RTE_PKTMBUF_TAILROOM (typically 0)

Some PMDs require that the total mbuf size (i.e. the total sum of all of the
above-listed components' lengths) is cache-aligned. To satisfy this
requirement, it may be necessary to round up the total mbuf size with respect
to cacheline size. In doing so, it's possible that the dp_packet's data
portion is inadvertently increased in size, such that it no longer adheres to
Rx buffer alignment. Consequently, the following property of the mbuf no
longer holds true:

    mbuf.data_len == mbuf.buf_len - mbuf.data_off

This creates a problem in the case of multi-segment mbufs, where that
property is assumed to hold for all but the final segment in an mbuf chain.
Resolve this issue by adjusting the size of the mbuf's private data portion,
as opposed to the packet data portion, when aligning mbuf size to cachelines.
Fixes: 4be4d22 ("netdev-dpdk: clean up mbuf initialization")
Fixes: 31b88c9 ("netdev-dpdk: round up mbuf_size to cache_line_size")
CC: Santosh Shukla
Co-authored-by: Tiago Lam
Signed-off-by: Mark Kavanagh
Signed-off-by: Tiago Lam
---
 lib/netdev-dpdk.c | 46 ++
 1 file changed, 30 insertions(+), 16 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 3306b19..648a1de 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -82,12 +82,6 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
                                      + (2 * VLAN_HEADER_LEN))
 #define MTU_TO_FRAME_LEN(mtu)       ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN)
 #define MTU_TO_MAX_FRAME_LEN(mtu)   ((mtu) + ETHER_HDR_MAX_LEN)
-#define FRAME_LEN_TO_MTU(frame_len) ((frame_len)                    \
-                                     - ETHER_HDR_LEN - ETHER_CRC_LEN)
-#define MBUF_SIZE(mtu)              ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu) \
-                                             + sizeof(struct dp_packet) \
-                                             + RTE_PKTMBUF_HEADROOM),   \
-                                             RTE_CACHE_LINE_SIZE)
 #define NETDEV_DPDK_MBUF_ALIGN      1024
 #define NETDEV_DPDK_MAX_PKT_LEN     9728

@@ -486,7 +480,7 @@ is_dpdk_class(const struct netdev_class *class)
  * behaviour, which reduces performance. To prevent this, use a buffer size
  * that is closest to 'mtu', but which satisfies the aforementioned
  * criteria. */
-static uint32_t
+static uint16_t
 dpdk_buf_size(int mtu)
 {
     return ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu) + RTE_PKTMBUF_HEADROOM),
@@ -577,7 +571,7 @@ dpdk_mp_do_not_free(struct rte_mempool *mp) OVS_REQUIRES(dpdk_mp_mutex)
  * - a new mempool was just created;
  * - a matching mempool already exists. */
 static struct rte_mempool *
-dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
+dpdk_mp_create(struct netdev_dpdk *dev, uint16_t mbuf_pkt_data_len)
 {
     char mp_name[RTE_MEMPOOL_NAMESIZE];
     const char *netdev_name = netdev_get_name(&dev->up);
@@ -585,6 +579,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
     uint32_t n_mbufs;
     uint32_t hash = hash_string(netdev_name, 0);
     struct rte_mempool *mp = NULL;
+    uint16_t mbuf_size, aligned_mbuf_size, mbuf_priv_data_len;

     /*
      * XXX: rough estimation of number of mbufs required for this port:
@@ -604,12 +599,13 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
          * longer than RTE_MEMPOOL_NAMESIZE. */
         int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE,
                            "ovs%08x%02d%05d%07u",
-                           hash, socket_id, mtu, n_mbufs);
+                           hash, socket_id, mbuf_pkt_data_len, n_mbufs);
         if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {
             VLOG_DBG("snprintf returned %d. "
                      "Failed to generate a mempool name for \"%s\". "
-                     "Hash:0x%x, socket_id: %d, mtu:%d, mbufs:%u.",
-                     ret, netdev_name, hash, socket_id, mtu, n_mbufs);
+                     "Hash:0x%x, socket_id: %d, pkt data room:%d, mbufs:%u.",
+                     ret, netdev_name, hash, socket_id, mbuf_pkt_data_len,
+                     n_mbufs);
             break;
         }

@@ -618,13 +614,31 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
                   netdev_name, n_mbufs, socket_id,
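The sizing rules described in the commit message above reduce to two alignment steps: round the data room up to the Rx buffer alignment (1024B), then pad the mbuf's *private* area, not the data area, until the total is a multiple of the cacheline size. A sketch of the arithmetic (the 704B dp_packet size is the figure quoted in this thread and is platform-dependent in practice):

```c
#include <assert.h>

#define ROUND_UP(x, a)     ((((x) + (a) - 1) / (a)) * (a))

#define MBUF_ALIGN         1024  /* NETDEV_DPDK_MBUF_ALIGN */
#define CACHELINE          64    /* 128 on some ARM parts, per the thread */
#define PKTMBUF_HEADROOM   128   /* RTE_PKTMBUF_HEADROOM */
#define DP_PACKET_SIZE     704   /* sizeof(struct dp_packet), as quoted */

/* Data room: max frame length plus headroom, rounded up to the Rx buffer
 * alignment (mirrors dpdk_buf_size()). */
static unsigned
buf_size(unsigned max_frame_len)
{
    return ROUND_UP(max_frame_len + PKTMBUF_HEADROOM, MBUF_ALIGN);
}

/* Pad the *private* area so the whole mbuf is cacheline-aligned, leaving
 * the data room (and thus data_len == buf_len - data_off) untouched. */
static unsigned
priv_size(unsigned data_room)
{
    unsigned total = DP_PACKET_SIZE + data_room;

    return DP_PACKET_SIZE + (ROUND_UP(total, CACHELINE) - total);
}
```

Padding the private area keeps `data_len == buf_len - data_off` intact, which is exactly the property the commit message says must hold for all but the final segment of a chain.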
[ovs-dev] [RFC v5 0/8] Support multi-segment mbufs
Overview
========
This patchset introduces support for multi-segment mbufs to OvS-DPDK.
Multi-segment mbufs are typically used when the size of an mbuf is
insufficient to contain the entirety of a packet's data. Instead, the data
is split across numerous mbufs, each carrying a portion, or 'segment', of
the packet data. mbufs are chained via their 'next' attribute (an mbuf
pointer).

Use Cases
=========
i.  Handling oversized (guest-originated) frames, which are marked for
    hardware acceleration/offload (TSO, for example).

    Packets which originate from a non-DPDK source may be marked for
    offload; as such, they may be larger than the permitted ingress
    interface's MTU, and may be stored in an oversized dp-packet. In order
    to transmit such packets over a DPDK port, their contents must be copied
    to a DPDK mbuf (via dpdk_do_tx_copy). However, in its current
    implementation, that function only copies data into a single mbuf; if
    the space available in the mbuf is exhausted, but not all packet data
    has been copied, then it is lost.

    Similarly, when cloning a DPDK mbuf, it must be considered whether that
    mbuf contains multiple segments. Both issues are resolved within this
    patchset.

ii. Handling jumbo frames.

    While OvS already supports jumbo frames, it does so by increasing mbuf
    size, such that the entirety of a jumbo frame may be handled in a single
    mbuf. This is certainly the preferred, and most performant, approach
    (and remains the default). However, it places high demands on system
    memory; multi-segment mbufs may be preferable for systems which are
    memory-constrained.

Enabling multi-segment mbufs
============================
Multi-segment and single-segment mbufs are mutually exclusive, and the user
must decide on which approach to adopt on init. The introduction of a new
OVSDB field, 'dpdk-multi-seg-mbufs', facilitates this.

This is a global boolean value, which determines how jumbo frames are
represented across all DPDK ports. In the absence of a user-supplied value,
'dpdk-multi-seg-mbufs' defaults to false, i.e. multi-segment mbufs must be
explicitly enabled / single-segment mbufs remain the default.

Setting the field is identical to setting existing DPDK-specific OVSDB
fields:

    ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
    ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x10
    ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=4096,0
==> ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true

---
v5: - Rebased on master e5e22dc ("datapath-windows: Prevent ct-counters
      from getting redundantly incremented");
    - Sugesh's comments have been addressed:
      - Changed dp_packet_set_data() and dp_packet_set_size() logic to make
        them independent of each other;
      - Dropped patch 3 now that dp_packet_set_data() and
        dp_packet_set_size() are independent;
      - dp_packet_clone_with_headroom() now has split functions for handling
        DPDK sourced packets and non-DPDK packets;
      - Modified various functions in dp-packet.h to account for multi-seg
        mbufs - dp_packet_put_uninit(), dp_packet_tail() and dp_packet_at();
      - Added support for shifting packet data in multi-seg mbufs, using
        dp_packet_shift();
      - Fixed some minor inconsistencies.

    Note that some of the changes in v5 have been contributed by Mark
    Kavanagh as well.

v4: - restructure patchset
    - account for 128B ARM cacheline when sizing mbufs

Mark Kavanagh (5):
  netdev-dpdk: fix mbuf sizing
  dp-packet: init specific mbuf fields to 0
  dp-packet: Add support for multi-seg mbufs
  netdev-dpdk: copy large packet to multi-seg. mbufs
  netdev-dpdk: support multi-segment jumbo frames

Michael Qiu (2):
  dp-packet: copy mbuf info for packet copy
  dp-packet: copy data from multi-seg. DPDK mbuf

Tiago Lam (1):
  dp-packet: Fix data_len issue with multi-seg mbufs

 NEWS                 |   1 +
 lib/dp-packet.c      | 118
 lib/dp-packet.h      | 189 ---
 lib/dpdk.c           |   7 ++
 lib/netdev-dpdk.c    | 183 +++--
 lib/netdev-dpdk.h    |   1 +
 vswitchd/vswitch.xml |  20 ++
 7 files changed, 415 insertions(+), 104 deletions(-)

--
2.7.4
Re: [ovs-dev] [PATCH v2] odp-util: Remove unnecessary TOS ECN bits rewrite for tunnels
On Tue, May 01, 2018 at 12:36:06PM +, Jianbo Liu wrote:
> For tunnels, TOS ECN bits are never wildcarded for the reason that they
> are always inherited. OVS will create a rewrite action if we add a rule
> to modify other IP headers. But it also adds an extra ECN rewrite for
> the action because of this ECN un-wildcarding.
>
> It seems no error because the ECN bits to be changed are the same in this
> case. But as the rule can't be offloaded to hardware, the unnecessary ECN
> rewrite should be removed.
>
> Signed-off-by: Jianbo Liu
> Reviewed-by: Paul Blakey
> Reviewed-by: Roi Dayan

Thanks, applied to master, branch-2.9 and branch-2.8.
[ovs-dev] [PATCH v2] odp-util: Remove unnecessary TOS ECN bits rewrite for tunnels
For tunnels, TOS ECN bits are never wildcarded for the reason that they are
always inherited. OVS will create a rewrite action if we add a rule to modify
other IP headers. But it also adds an extra ECN rewrite for the action
because of this ECN un-wildcarding.

It seems no error because the ECN bits to be changed are the same in this
case. But as the rule can't be offloaded to hardware, the unnecessary ECN
rewrite should be removed.

Signed-off-by: Jianbo Liu
Reviewed-by: Paul Blakey
Reviewed-by: Roi Dayan
---
 lib/odp-util.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/lib/odp-util.c b/lib/odp-util.c
index 6db241a..95c584b 100644
--- a/lib/odp-util.c
+++ b/lib/odp-util.c
@@ -6962,6 +6962,11 @@ commit_set_ipv4_action(const struct flow *flow, struct flow *base_flow,
     mask.ipv4_proto = 0;        /* Not writeable. */
     mask.ipv4_frag = 0;         /* Not writable. */

+    if (flow_tnl_dst_is_set(&base_flow->tunnel) &&
+        ((base_flow->nw_tos ^ flow->nw_tos) & IP_ECN_MASK) == 0) {
+        mask.ipv4_tos &= ~IP_ECN_MASK;
+    }
+
     if (commit(OVS_KEY_ATTR_IPV4, use_masked, &key, &base, &mask,
                sizeof key, odp_actions)) {
         put_ipv4_key(&base, base_flow, false);
@@ -7012,6 +7017,11 @@ commit_set_ipv6_action(const struct flow *flow, struct flow *base_flow,
     mask.ipv6_proto = 0;        /* Not writeable. */
     mask.ipv6_frag = 0;         /* Not writable. */

+    if (flow_tnl_dst_is_set(&base_flow->tunnel) &&
+        ((base_flow->nw_tos ^ flow->nw_tos) & IP_ECN_MASK) == 0) {
+        mask.ipv6_tclass &= ~IP_ECN_MASK;
+    }
+
     if (commit(OVS_KEY_ATTR_IPV6, use_masked, &key, &base, &mask,
                sizeof key, odp_actions)) {
         put_ipv6_key(&base, base_flow, false);
--
2.9.5
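The check added in both hunks is the same: if the flow carries tunnel metadata and the ECN bits are unchanged between the base and desired TOS, mask them out so no rewrite is emitted for them. The decision can be modeled in isolation (the function below is a hypothetical distillation, not OVS code):

```c
#include <assert.h>
#include <stdint.h>

#define IP_ECN_MASK 0x03   /* low two TOS bits, as in OVS's lib/packets.h */

/* Mirrors the condition added by the patch: when the flow is tunneled and
 * the ECN bits are not actually changing, drop them from the rewrite mask
 * so no redundant ECN rewrite action is generated. */
static uint8_t
tos_rewrite_mask(uint8_t base_tos, uint8_t new_tos, int tunneled,
                 uint8_t mask)
{
    if (tunneled && ((base_tos ^ new_tos) & IP_ECN_MASK) == 0) {
        mask &= ~IP_ECN_MASK;
    }
    return mask;
}
```

The XOR isolates the bits that differ; ANDing with IP_ECN_MASK asks whether any of the differing bits are ECN bits, so the mask is narrowed only when the rewrite would be a no-op for ECN.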
Re: [ovs-dev] [PATCH v4 1/1] netdev-dpdk: don't enable scatter for jumbo RX support for nfp
> On 04/27/2018 05:40 PM, Pablo Cascón wrote:
> > Currently, RX of jumbo packets fails for NICs not supporting scatter.
> > Scatter is not strictly needed for jumbo RX support. This change fixes
> > the issue by not enabling scatter only for the PMD/NIC known not to
> > need it to support jumbo RX.
> >
> Acked-by: Kevin Traynor

Thanks all, I'll apply this to DPDK_MERGE and backport to the previous
releases; it will be part of this week's pull request.

Thanks
Ian

> > Note: this change is temporary and not needed for later releases of
> > OVS/DPDK.
> >
> > Reported-by: Louis Peens
> > Signed-off-by: Pablo Cascón
> > Reviewed-by: Simon Horman
> > ---
> >  lib/netdev-dpdk.c | 14 +++---
> >  1 file changed, 11 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > index ee39cbe..fdc8f66 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
> > @@ -694,11 +694,19 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
> >      int diag = 0;
> >      int i;
> >      struct rte_eth_conf conf = port_conf;
> > +    struct rte_eth_dev_info info;
> >
> > -    /* For some NICs (e.g. Niantic), scatter_rx mode needs to be
> > -     * explicitly enabled. */
> > +    /* As of DPDK 17.11.1 a few PMDs require to explicitly enable
> > +     * scatter to support jumbo RX. Checking the offload capabilities
> > +     * is not an option as PMDs are not required yet to report
> > +     * them. The only reliable info is the driver name and knowledge
> > +     * (testing or code review). Listing all such PMDs feels harder
> > +     * than highlighting the one known not to need scatter. */
> >      if (dev->mtu > ETHER_MTU) {
> > -        conf.rxmode.enable_scatter = 1;
> > +        rte_eth_dev_info_get(dev->port_id, &info);
> > +        if (strncmp(info.driver_name, "net_nfp", 6)) {
> > +            conf.rxmode.enable_scatter = 1;
> > +        }
> >      }
> >
> >      conf.rxmode.hw_ip_checksum = (dev->hw_ol_features &
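A side note on the quoted check: strncmp() with a count of 6 compares only the prefix "net_nf", since "net_nfp" itself is seven characters. The decision can be sketched in isolation (ETHER_MTU assumed to be 1500 here; the function below is a hypothetical distillation, not the OVS code):

```c
#include <assert.h>
#include <string.h>

#define ETHER_MTU 1500   /* assumed standard Ethernet MTU */

/* Mirrors the quoted logic: scatter is enabled for jumbo MTUs unless the
 * driver name matches the compared prefix. Note strncmp(..., 6) matches
 * the first six characters only, i.e. the "net_nf" prefix. */
static int
needs_scatter(const char *driver_name, int mtu)
{
    return mtu > ETHER_MTU && strncmp(driver_name, "net_nfp", 6) != 0;
}
```

Because strncmp() returns 0 on a match, the original `if (strncmp(...))` enables scatter for every driver *except* the matching one, which is the intended "skip only the NFP" behaviour.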