Re: [ovs-dev] [PATCH V3 0/4] netdev datapath flush offloaded flows
On 1/7/2021 9:10 PM, Sriharsha Basavapatna wrote:
> Hi Eli,
>
> On Mon, Dec 28, 2020 at 11:43 PM Eli Britstein wrote:
>> Netdev datapath offloads are done in a separate thread, using messaging
>> between the threads. With port removal there is a race between the
>> offload thread removing the offloaded rules and the actual port
>> removal, so some rules will not be removed.
>
> Can you provide details on this sequence, to get more clarity on how
> some rules will not be removed?

A deletion sequence from the dpif point of view starts with dpif_port_del.
It calls dpif->dpif_class->port_del (implemented by dpif_netdev_port_del)
and netdev_ports_remove. flow_mark_flush is called
(dpif_netdev_port_del->do_del_port->reconfigure_datapath->reload_affected_pmds).
This function only posts deletion requests to the queue
(queue_netdev_flow_del), to be later handled by the offload thread.

When the offload thread wakes up to handle the request
(dp_netdev_flow_offload_del->mark_to_flow_disassociate), it looks up the
netdev object with netdev_ports_get in the port_to_netdev map, racing with
the execution of netdev_ports_remove, which removes that mapping. If the
mapping is removed before the request is handled, the removal of the HW
rule won't take place, leaking it and the related allocated memory.

>> In OVS the offload objects are not freed (memory leak).
>
> Since dp_netdev_free_flow_offload() frees the offload object, not sure
> how the memory leak happens?

As netdev_offload_dpdk_flow_del is eventually not called, the objects in
ufid_to_rte_flow that should have been freed by
ufid_to_rte_flow_disassociate are leaked (as well as the rte_flows, which
are not destroyed). In HW, whether the rules remain depends on PMD
behavior.

> A few related questions:
>
> 1. In Patch-1, netdev_flow_flush() is called from do_del_port(). Doesn't
> it violate the layering rules, since port deletion shouldn't be directly
> involved with netdev/flow related operations (that would otherwise be
> processed as a part of datapath reconfig)?
netdev/flow operations are called from the dpif-netdev layer, which
includes do_del_port. We can call the offload functions under
dp->port_mutex, which is locked in that function. Which layering rules
are violated?

> 2. Since the rte_flow is deleted first, doesn't it leave the
> dpif_netdev_flow in an inconsistent state, since the flow is still
> active from the pmd thread's perspective (megaflow_to_mark and
> mark_to_flow cmap entries are still valid, indicating that it is
> offloaded)?

That flow is a sequence not related to this patch set; it occurs
normally. Currently there are two types of offloads (both are ingress):

- Partial. Packets are handled by SW with or without offload. Packets
  are fully processed correctly by the SW either if they don't have a
  mark or have a mark that is not found by mark_to_flow_find.
- Full. Packets will be handled correctly by the SW as they arrive with
  no mark.

The mark<->megaflow mapping disassociation is handled correctly even if
the rte_flow is not removed. However, this is indeed a good point to
take into consideration when implementing egress offloading.

> 3. Also, how does this work with partially offloaded flows (same
> megaflow mapped to multiple pmd threads)?

Also not related to this patch set. The flow is ref-counted. Upon the
first ref the rte_flow is created, and it is destroyed with the last
unref (see mark_to_flow_disassociate).

> Thanks,
> -Harsha

This patch-set resolves this issue using flush.

v2-v1:
- Fix for pending offload requests.

v3-v2:
- Rebase.
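The core idea of the fix — flushing the offloaded rules synchronously
(under dp->port_mutex) before the port-to-netdev mapping disappears — can
be illustrated with a toy model. All names below are illustrative stand-ins,
not actual OVS APIs:

```c
/* Toy model of the race: pending per-port offload deletions can only be
 * carried out while the port-to-netdev mapping still exists. */
#include <assert.h>
#include <stdbool.h>

#define MAX_PORTS 8

static bool port_mapped[MAX_PORTS];   /* stands in for port_to_netdev */
static int hw_rules[MAX_PORTS];       /* offloaded rule count per port */
static int leaked_rules;

static void port_add(int port, int nrules)
{
    port_mapped[port] = true;
    hw_rules[port] = nrules;
}

/* Offload-thread handler: the netdev lookup fails once the mapping is
 * gone, so the hardware rules can no longer be destroyed. */
static void offload_del_pending(int port)
{
    if (port_mapped[port]) {
        hw_rules[port] = 0;              /* rte_flow destroy would run here */
    } else {
        leaked_rules += hw_rules[port];  /* lookup failed: rules leak */
    }
}

/* Patched order: flush synchronously, then remove the mapping. */
static void port_del_with_flush(int port)
{
    offload_del_pending(port);           /* netdev_flow_flush() equivalent */
    port_mapped[port] = false;
}

/* Racy order: the mapping is removed before the offload thread runs. */
static void port_del_racy(int port)
{
    port_mapped[port] = false;
    offload_del_pending(port);
}
```

Running the two orderings against the same state shows the racy one
leaking every rule on the deleted port while the flush-first one reclaims
them all.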
Travis:
v1: https://travis-ci.org/github/elibritstein/OVS/builds/747022942
v2: https://travis-ci.org/github/elibritstein/OVS/builds/748788786
v3: https://travis-ci.org/github/elibritstein/OVS/builds/751787939

GitHub Actions:
v1: https://github.com/elibritstein/OVS/actions/runs/394296553
v2: https://github.com/elibritstein/OVS/actions/runs/413379295
v3: https://github.com/elibritstein/OVS/actions/runs/448541155

Eli Britstein (4):
  dpif-netdev: Flush offload rules upon port deletion
  netdev-offload-dpdk: Keep netdev in offload object
  netdev-offload-dpdk: Refactor disassociate and flow destroy
  netdev-offload-dpdk: Implement flow flush

 lib/dpif-netdev.c         |  2 +
 lib/netdev-offload-dpdk.c | 85 ++++++++++++++++++++----------------
 2 files changed, 56 insertions(+), 31 deletions(-)

-- 
2.28.0.546.g385c171

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
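The ref-counting behavior described in the answer to question 3 — the
rte_flow created on the first ref and destroyed only on the last unref —
can be sketched as follows. The types and function names here are
illustrative, not the actual OVS structures:

```c
/* Minimal sketch of per-flow hardware offload ref-counting. */
#include <assert.h>
#include <stdbool.h>

struct offload_flow {
    int refcnt;
    bool rte_flow_active;   /* stands in for the rte_flow handle */
};

static void flow_ref(struct offload_flow *f)
{
    if (f->refcnt++ == 0) {
        f->rte_flow_active = true;    /* rte_flow created on first ref */
    }
}

/* Mirrors the mark_to_flow_disassociate() idea: the hardware rule is
 * destroyed only when the last reference goes away. */
static void flow_unref(struct offload_flow *f)
{
    assert(f->refcnt > 0);
    if (--f->refcnt == 0) {
        f->rte_flow_active = false;   /* rte_flow destroyed on last unref */
    }
}
```

With two pmd threads referencing the same megaflow, the first unref leaves
the rule in place and only the second removes it.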
Re: [ovs-dev] [PATCH] Documentation: Simplify the website main page.
On Thu, Jan 07, 2021 at 12:15:10PM -0800, Ben Pfaff wrote:
> On Mon, Dec 28, 2020 at 07:02:55PM -0300, Flavio Leitner wrote:
> > The initial website page is difficult to read because of the large
> > amount of links from different parts of the whole documentation. Most
> > of those links come from their index page referenced in the section
> > 'Contents' on the side.
> >
> > Another issue is that because the page is static, new links might not
> > get included.
> >
> > This patch simplifies the main page by highlighting the project level
> > documentation. The static part is reduced to the main level index
> > pages.
> >
> > All the links are available by clicking on 'Full Table of Contents'
> > at the end of the Documentation section.
> >
> > Signed-off-by: Flavio Leitner
>
> Thanks for working on the documentation. I applied this to master.

Thanks Ben!

-- 
fbl
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v2] netdev-dpdk: Refactor the DPDK transmit path.
This patch splits out the common code between the vhost and dpdk transmit
paths into shared functions, to simplify the code and fix an issue.

The issue is that a packet coming from a non-DPDK device and egressing on
a DPDK device currently skips the hwol preparation.

This also has the side effect of leaving only the dpdk transmit code
under the txq lock.

Signed-off-by: Flavio Leitner
---
V2:
 - mentioned the tx lock change in the commit message.
 - fixed packet leak when copy fails.
 - moved pkt_cnt = cnt two lines up.

I tested the following scenarios with iperf and iperf -R and noticed no
performance regression:

  IPERF   VM-Bridge        1.02x
  IPERF   VM-VM            1.04x
  IPERF   VM-ExtHost       1.00x
  IPERF   VM-NS            1.01x
  IPERF   VM-VLAN-Bridge   1.03x
  IPERF   VM-VLAN-VM       1.03x
  IPERF   VM-VLAN-ExtHost  1.01x
  IPERF   VM-VLAN-NS       1.02x
  IPERF   VM-V6-VM         1.03x
  IPERF   VM-V6-ExtHost    1.00x
  IPERF   VM-V6-NS         1.01x
  IPERF-R VM-Bridge        1.01x
  IPERF-R VM-VM            1.04x
  IPERF-R VM-ExtHost       1.10x
  IPERF-R VM-NS            1.01x
  IPERF-R VM-VLAN-Bridge   1.03x
  IPERF-R VM-VLAN-VM       1.02x
  IPERF-R VM-VLAN-ExtHost  1.08x
  IPERF-R VM-VLAN-NS       1.02x
  IPERF-R VM-V6-VM         1.00x
  IPERF-R VM-V6-ExtHost    1.11x
  IPERF-R VM-V6-NS         1.00x

Now using trex, 64 byte packets, PVP:

Original: 3.6Mpps
  avg. packets per output batch: 32.00
  idle cycles: 0 (0.00%)
  avg cycles per packet: 304.92 (8331383020/27323150)
  avg processing cycles per packet: 304.92 (8331383020/27323150)

Patched: 3.6Mpps
  avg. packets per output batch: 32.00
  idle cycles: 0 (0.00%)
  avg cycles per packet: 304.08 (21875784116/71941516)
  avg processing cycles per packet: 304.08 (21875784116/71941516)

 lib/netdev-dpdk.c | 335 ++++++++++++++++++++--------------------------
 1 file changed, 139 insertions(+), 196 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 2640a421a..a1437db4d 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2585,90 +2585,6 @@ netdev_dpdk_vhost_update_tx_counters(struct netdev_dpdk *dev,
         }
     }
 
-static void
-__netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
-                         struct dp_packet **pkts, int cnt)
-{
-    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
-    struct rte_mbuf **cur_pkts = (struct rte_mbuf **) pkts;
-    struct netdev_dpdk_sw_stats sw_stats_add;
-    unsigned int n_packets_to_free = cnt;
-    unsigned int total_packets = cnt;
-    int i, retries = 0;
-    int max_retries = VHOST_ENQ_RETRY_MIN;
-    int vid = netdev_dpdk_get_vid(dev);
-
-    qid = dev->tx_q[qid % netdev->n_txq].map;
-
-    if (OVS_UNLIKELY(vid < 0 || !dev->vhost_reconfigured || qid < 0
-                     || !(dev->flags & NETDEV_UP))) {
-        rte_spinlock_lock(&dev->stats_lock);
-        dev->stats.tx_dropped += cnt;
-        rte_spinlock_unlock(&dev->stats_lock);
-        goto out;
-    }
-
-    if (OVS_UNLIKELY(!rte_spinlock_trylock(&dev->tx_q[qid].tx_lock))) {
-        COVERAGE_INC(vhost_tx_contention);
-        rte_spinlock_lock(&dev->tx_q[qid].tx_lock);
-    }
-
-    sw_stats_add.tx_invalid_hwol_drops = cnt;
-    if (userspace_tso_enabled()) {
-        cnt = netdev_dpdk_prep_hwol_batch(dev, cur_pkts, cnt);
-    }
-
-    sw_stats_add.tx_invalid_hwol_drops -= cnt;
-    sw_stats_add.tx_mtu_exceeded_drops = cnt;
-    cnt = netdev_dpdk_filter_packet_len(dev, cur_pkts, cnt);
-    sw_stats_add.tx_mtu_exceeded_drops -= cnt;
-
-    /* Check has QoS has been configured for the netdev */
-    sw_stats_add.tx_qos_drops = cnt;
-    cnt = netdev_dpdk_qos_run(dev, cur_pkts, cnt, true);
-    sw_stats_add.tx_qos_drops -= cnt;
-
-    n_packets_to_free = cnt;
-
-    do {
-        int vhost_qid = qid * VIRTIO_QNUM + VIRTIO_RXQ;
-        unsigned int tx_pkts;
-
-        tx_pkts = rte_vhost_enqueue_burst(vid, vhost_qid, cur_pkts, cnt);
-        if (OVS_LIKELY(tx_pkts)) {
-            /* Packets have been sent. */
-            cnt -= tx_pkts;
-            /* Prepare for possible retry. */
-            cur_pkts = &cur_pkts[tx_pkts];
-            if (OVS_UNLIKELY(cnt && !retries)) {
-                /*
-                 * Read max retries as there are packets not sent
-                 * and no retries have already occurred.
-                 */
-                atomic_read_relaxed(&dev->vhost_tx_retries_max, &max_retries);
-            }
-        } else {
-            /* No packets sent - do not retry. */
-            break;
-        }
-    } while (cnt && (retries++ < max_retries));
-
-    rte_spinlock_unlock(&dev->tx_q[qid].tx_lock);
-
-    sw_stats_add.tx_failure_drops = cnt;
-    sw_stats_add.tx_retries = MIN(retries, max_retries);
-
-    rte_spinlock_lock(&dev->stats_lock);
-    netdev_dpdk_vhost_update_tx_counters(dev, pkts, total_packets,
-                                         &sw_stats_add);
-
[ovs-dev] [PATCH] tunnel: fix vlan bug for tunnel forwarding
From: hongzhi guo

I found that a basic vxlan topology stopped working after upgrading to
OVS 2.12. Through debugging, I found that when the native tunnel forwards,
the inner vlan is mistakenly used for the forwarding, which results in
packet loss on the tunnel_bearing port.

I then abstracted the scene where the problem occurs.

Test topology (summarized from the config below):

  br1: port br1 (internal, tag=2), port br1-2 (patch, tag=2, peer=br2-1)
  br2: port br2-1 (patch, tag=2, peer=br1-2),
       port vxlan (remote_ip=10.10.10.22)
  br3: port p3 (internal, tag=3, in a namespace, ip=10.10.10.22),
       port tunnel_bearing (internal, tag=3) (10.10.10.21)

    Bridge br3
        datapath_type: netdev
        Port p3
            tag: 3
            Interface p3
                type: internal
        Port tunnel_bearing
            tag: 3
            Interface tunnel_bearing
                type: internal
        Port br3
            Interface br3
                type: internal
    Bridge br2
        datapath_type: netdev
        Port br2
            Interface br2
                type: internal
        Port br2-1
            tag: 2
            Interface br2-1
                type: patch
                options: {peer=br1-2}
        Port vxlan
            Interface vxlan
                type: vxlan
                options: {remote_ip="10.10.10.22"}
    Bridge br1
        datapath_type: netdev
        Port br1
            tag: 2
            Interface br1
                type: internal
        Port br1-2
            tag: 2
            Interface br1-2
                type: patch
                options: {peer=br2-1}

Trace:

  ovs-appctl ofproto/trace br1 in_port=br1,\
  dl_src=fa:16:3e:8c:eb:5b,dl_dst=fa:16:3e:a5:15:f3,\
  ip,nw_src=20.20.20.104,nw_dst=20.20.20.101,\
  nw_proto=6 -generate

Trace result:

  bridge("br1")
   0. priority 0
      NORMAL
       -> no learned MAC for destination, flooding

  bridge("br2")
   0. priority 0
      NORMAL
       -> no learned MAC for destination, flooding
       -> output to native tunnel
       -> tunneling to 10.10.10.22 via tunnel_bearing
       -> tunneling from 96:2d:94:57:6a:55 10.10.10.21
          to c6:bc:3e:69:83:cc 10.10.10.22

  bridge("br3")
   0. priority 0
      NORMAL
      dropping VLAN 2 tagged packet received on port tunnel_bearing
      configured as VLAN 3 access port
       >> disallowed VLAN VID for this input port, dropping

CC: Sugesh Chandran
Fixes: 7c12dfc527a5 ("tunneling: Avoid datapath-recirc by combining
recirc actions at xlate.")
Signed-off-by: hongzhi guo
---
 ofproto/ofproto-dpif-xlate.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 7108c8a30..d03ccf91e 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -3546,6 +3546,7 @@ propagate_tunnel_data_to_flow__(struct flow *dst_flow,
         }
     }
     dst_flow->nw_proto = nw_proto;
+    memset(dst_flow->vlans, 0, sizeof(dst_flow->vlans));
 }

 /*
-- 
2.18.0.windows.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
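The effect of the one-line fix can be sketched with simplified stand-in
types (not the real struct flow): when flow data is propagated to the
tunnel's underlay flow, the inner VLAN tags must be cleared so that the
tunnel_bearing port is matched against its own underlay VLAN rather than
the payload's.

```c
/* Sketch of clearing inner VLANs when building the underlay flow. */
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FLOW_MAX_VLAN_HEADERS 2

struct mini_flow {
    uint8_t nw_proto;
    uint16_t vlans[FLOW_MAX_VLAN_HEADERS];  /* payload VLAN TCIs */
};

static void propagate_tunnel_data_to_flow(struct mini_flow *dst,
                                          const struct mini_flow *src,
                                          uint8_t nw_proto)
{
    *dst = *src;
    dst->nw_proto = nw_proto;
    /* The one-line fix: the underlay flow must not inherit the inner
     * VLAN tags of the encapsulated packet. */
    memset(dst->vlans, 0, sizeof dst->vlans);
}
```

Without the memset, the underlay flow would carry VLAN 2 from the payload
and be dropped at the VLAN 3 access port, exactly as the trace shows.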
Re: [ovs-dev] [PATCH] netdev-dpdk: Refactor the DPDK transmit path.
On Wed, Jan 06, 2021 at 04:13:14PM +0100, David Marchand wrote:
> On Thu, Dec 3, 2020 at 6:50 PM Flavio Leitner wrote:
> >
> > This patch split out the common code between vhost and dpdk transmit
> > paths to shared functions to simplify the code and fix an issue.
>
> Nice cleanup.

Great!

> This also have the side effect of leaving only the dpdk transmit code
> under the txq lock.
> Maybe worth mentioning.

I will add to the commit message.

> Some comments below.

Thanks for reviewing it.

> >
> > The issue is that the packet coming from non-DPDK device and egressing
> > on a DPDK device currently skips the hwol preparation.
> >
> > Signed-off-by: Flavio Leitner
> > ---
> >  lib/netdev-dpdk.c | 334 ++++++++++++++++++++-------------------------
> >  1 file changed, 138 insertions(+), 196 deletions(-)
> >
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > index 75dffefb8..c74553e2e 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
>
> [snip]
>
> > @@ -2795,76 +2711,69 @@ dpdk_copy_dp_packet_to_mbuf(struct rte_mempool *mp, struct dp_packet *pkt_orig)
> >      return pkt_dest;
> >  }
> >
> > -/* Tx function. Transmit packets indefinitely */
> > -static void
> > -dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
> > +/* Replace packets in a 'batch' with their corresponding copies using
> > + * DPDK memory.
> > + *
> > + * Returns the number of good packets in the batch. */
> > +static size_t
> > +dpdk_copy_batch_to_mbuf(struct netdev *netdev, struct dp_packet_batch *batch)
> OVS_NO_THREAD_SAFETY_ANALYSIS
> > {
> > -    const size_t batch_cnt = dp_packet_batch_size(batch);
> > -#if !defined(__CHECKER__) && !defined(_WIN32)
> > -    const size_t PKT_ARRAY_SIZE = batch_cnt;
> > -#else
> > -    /* Sparse or MSVC doesn't like variable length array. */
> > -    enum { PKT_ARRAY_SIZE = NETDEV_MAX_BURST };
> > -#endif
> >      struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> > -    struct dp_packet *pkts[PKT_ARRAY_SIZE];
> > -    struct netdev_dpdk_sw_stats *sw_stats = dev->sw_stats;
> > -    uint32_t cnt = batch_cnt;
> > -    uint32_t dropped = 0;
> > -    uint32_t tx_failure = 0;
> > -    uint32_t mtu_drops = 0;
> > -    uint32_t qos_drops = 0;
> > -
> > -    if (dev->type != DPDK_DEV_VHOST) {
> > -        /* Check if QoS has been configured for this netdev. */
> > -        cnt = netdev_dpdk_qos_run(dev, (struct rte_mbuf **) batch->packets,
> > -                                  batch_cnt, false);
> > -        qos_drops = batch_cnt - cnt;
> > -    }
> > -
> > -    uint32_t txcnt = 0;
> > -
> > -    for (uint32_t i = 0; i < cnt; i++) {
> > -        struct dp_packet *packet = batch->packets[i];
> > -        uint32_t size = dp_packet_size(packet);
> > -
> > -        if (size > dev->max_packet_len
> > -            && !(packet->mbuf.ol_flags & PKT_TX_TCP_SEG)) {
> > -            VLOG_WARN_RL(&rl, "Too big size %u max_packet_len %d", size,
> > -                         dev->max_packet_len);
> > -            mtu_drops++;
> > -            continue;
> > -        }
> > +    size_t i, size = dp_packet_batch_size(batch);
> > +    struct dp_packet *packet;
> >
> > -        pkts[txcnt] = dpdk_copy_dp_packet_to_mbuf(dev->dpdk_mp->mp, packet);
> > -        if (OVS_UNLIKELY(!pkts[txcnt])) {
> > -            dropped = cnt - i;
> > -            break;
> > -        }
> > +    DP_PACKET_BATCH_REFILL_FOR_EACH (i, size, packet, batch) {
> > +        if (OVS_UNLIKELY(packet->source == DPBUF_DPDK)) {
> > +            dp_packet_batch_refill(batch, packet, i);
> > +        } else {
> > +            struct dp_packet *pktcopy;
> >
> > -        txcnt++;
> > +            pktcopy = dpdk_copy_dp_packet_to_mbuf(dev->dpdk_mp->mp, packet);
> > +            if (pktcopy) {
> > +                dp_packet_batch_refill(batch, pktcopy, i);
> > +                dp_packet_delete(packet);
> > +            }
> > +        }
>
> Won't it leak packet if mbuf allocation/copy fails?

True. The dp_packet_delete() should happen regardless of pktcopy.
> >      }
> >
> > -    if (OVS_LIKELY(txcnt)) {
> > -        if (dev->type == DPDK_DEV_VHOST) {
> > -            __netdev_dpdk_vhost_send(netdev, qid, pkts, txcnt);
> > -        } else {
> > -            tx_failure += netdev_dpdk_eth_tx_burst(dev, qid,
> > -                                                   (struct rte_mbuf **)pkts,
> > -                                                   txcnt);
> > -        }
> > +    return dp_packet_batch_size(batch);
> > +}
> > +
> > +static size_t
> > +netdev_dpdk_common_send(struct netdev *netdev, struct dp_packet_batch *batch,
> > +                        struct netdev_dpdk_sw_stats *stats)
> > +{
> > +    struct rte_mbuf **pkts = (struct rte_mbuf **) batch->packets;
> > +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> > +    size_t cnt, pkt_cnt = dp_packet_batch_size(batch);
> > +
> > +    memset(stats, 0, sizeof *stats);
> > +
> > +    /* Copy dp-packets to
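The leak David spotted — an original packet neither refilled into the
batch nor freed when its mbuf copy fails — reduces to a small model. The
packet and batch types below are toys, not the real dp_packet API:

```c
/* Toy model of the copy loop: the original packet must be freed
 * unconditionally once a copy has been attempted. */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static int live_packets;   /* counts allocations, to detect leaks */

struct pkt { bool copy_ok; };

static struct pkt *pkt_new(bool copy_ok)
{
    struct pkt *p = malloc(sizeof *p);
    p->copy_ok = copy_ok;
    live_packets++;
    return p;
}

static void pkt_delete(struct pkt *p)
{
    free(p);
    live_packets--;
}

static struct pkt *pkt_copy(const struct pkt *p)
{
    return p->copy_ok ? pkt_new(true) : NULL;  /* NULL models alloc failure */
}

/* Returns the number of good (copied) packets. The fix: every original
 * is deleted whether or not its copy succeeded. */
static int copy_batch(struct pkt **batch, int n, struct pkt **out)
{
    int good = 0;

    for (int i = 0; i < n; i++) {
        struct pkt *copy = pkt_copy(batch[i]);

        if (copy) {
            out[good++] = copy;
        }
        pkt_delete(batch[i]);   /* delete regardless of 'copy' */
    }
    return good;
}
```

With the delete inside the `if (copy)` branch instead, a failed copy
would leave `live_packets` nonzero after the loop — the leak the v2
changelog says was fixed.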
Re: [ovs-dev] Bug#974588: Bug#974588: openvswitch: DPDK 20.11 support and transition for bullseye
On Sat, 9 Jan 2021 at 15:50, Luca Boccassi wrote:
>
> On Thu, 7 Jan 2021 at 20:14, Luca Boccassi wrote:
> >
> > On Thu, 7 Jan 2021 at 19:53, Thomas Goirand wrote:
> > >
> > > On 1/7/21 4:54 PM, Christian Ehrhardt wrote:
> > > > Hi,
> > > > as an FYI Ubuntu moved to recent commit def6eb1ea and for us it
> > > > seems to work fine so far.
> > > > See the package at:
> > > > https://launchpad.net/ubuntu/+source/openvswitch/2.15.0~git20210104.def6eb1ea-0ubuntu3
> > > >
> > > > With that confirmed and the feature freeze coming soon. Would you
> > > > mind doing a similar upload to Debian-experimental soon'ish?
> > >
> > > So, basically, you wish that I package the tip of master in
> > > Experimental? I'll see if I find the time to do it. However, I very
> > > much would prefer if upstream OVS could cut a (alpha / beta) tag.
> > >
> > > Cheers,
> > >
> > > Thomas Goirand (zigo)
> >
> > Yes, given time is running out we need to get the transition starting
> > ASAP, unfortunately. The upstream OVS timeline for RCs and releases
> > unfortunately doesn't quite align and would be too late. I asked the
> > release team for an exception but I had no answer, so at this point
> > it seems to me the safest course of action is to assume we're not
> > getting one.
> >
> > There has been extensive testing of that commit id by Canonical, so
> > it should be safe to use in unstable/testing for a month, until the
> > proper release.
> >
> > Thank you!
> >
> > Kind regards,
> > Luca Boccassi
>
> Hello Thomas,
>
> I've prepared an MR:
> https://salsa.debian.org/openstack-team/third-party/openvswitch/-/merge_requests/6
>
> The changes look quite simple and they are based on the Ubuntu package.
> Exact same upstream snapshot is used, so that we know it's been tested.
> I have build-tested it in a chroot with libdpdk-dev from experimental.
>
> Please, can we get this sorted soon? Would you be OK with an NMU to
> experimental?
>
> Thank you!
>
> Kind regards,
> Luca Boccassi

Christian,

We've got tons of test failures with a merge of the same commit. Any
idea? 586 tests are failing out of 2266...

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] Bug#974588: Bug#974588: openvswitch: DPDK 20.11 support and transition for bullseye
On Thu, 7 Jan 2021 at 20:14, Luca Boccassi wrote:
>
> On Thu, 7 Jan 2021 at 19:53, Thomas Goirand wrote:
> >
> > On 1/7/21 4:54 PM, Christian Ehrhardt wrote:
> > > Hi,
> > > as an FYI Ubuntu moved to recent commit def6eb1ea and for us it
> > > seems to work fine so far.
> > > See the package at:
> > > https://launchpad.net/ubuntu/+source/openvswitch/2.15.0~git20210104.def6eb1ea-0ubuntu3
> > >
> > > With that confirmed and the feature freeze coming soon. Would you
> > > mind doing a similar upload to Debian-experimental soon'ish?
> >
> > So, basically, you wish that I package the tip of master in
> > Experimental? I'll see if I find the time to do it. However, I very
> > much would prefer if upstream OVS could cut a (alpha / beta) tag.
> >
> > Cheers,
> >
> > Thomas Goirand (zigo)
>
> Yes, given time is running out we need to get the transition starting
> ASAP, unfortunately. The upstream OVS timeline for RCs and releases
> unfortunately doesn't quite align and would be too late. I asked the
> release team for an exception but I had no answer, so at this point it
> seems to me the safest course of action is to assume we're not getting
> one.
>
> There has been extensive testing of that commit id by Canonical, so it
> should be safe to use in unstable/testing for a month, until the
> proper release.
>
> Thank you!
>
> Kind regards,
> Luca Boccassi

Hello Thomas,

I've prepared an MR:
https://salsa.debian.org/openstack-team/third-party/openvswitch/-/merge_requests/6

The changes look quite simple and they are based on the Ubuntu package.
Exact same upstream snapshot is used, so that we know it's been tested.
I have build-tested it in a chroot with libdpdk-dev from experimental.

Please, can we get this sorted soon? Would you be OK with an NMU to
experimental?

Thank you!

Kind regards,
Luca Boccassi
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev