David Marchand <[email protected]> writes:

> Add a coverage counter to help diagnose contention on the vhost txqs.
> This is seen as dropped packets on the physical ports for rates that
> are usually handled fine by OVS.
> Document how to further debug this contention with perf.
>
> Signed-off-by: David Marchand <[email protected]>
> ---
> Changelog since v1:
> - added documentation as a bonus: not sure this is the right place, or if
>   it really makes sense to enter into such details. But I still find it
>   useful. Comments?
It's useful, and I think it makes sense here.

> ---
>  Documentation/topics/dpdk/vhost-user.rst | 61 ++++++++++++++++++++++++++++++++
>  lib/netdev-dpdk.c                        |  8 ++++-
>  2 files changed, 68 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst
> index fab87bd..c7e605e 100644
> --- a/Documentation/topics/dpdk/vhost-user.rst
> +++ b/Documentation/topics/dpdk/vhost-user.rst
> @@ -623,3 +623,64 @@ Because of this limitation, this feature is considered 'experimental'.
>  Further information can be found in the
>  `DPDK documentation
>  <https://doc.dpdk.org/guides-18.11/prog_guide/vhost_lib.html>`__
> +
> +Troubleshooting vhost-user tx contention
> +----------------------------------------
> +
> +Depending on the number of Rx queues enabled by a guest on a virtio port
> +and on the number of PMDs used on the OVS side, OVS can end up with
> +contention occurring on the lock protecting the vhost Tx queue.

Maybe make the wording specific to a vhostuser port?  I think someone might
make a wrong conclusion if they use the virtio PMD as a dpdk port instead
of using the vhostuser ports.  Not sure *why* someone might do that, but
it's a possibility and this counter won't tick for those cases.

> +This problem can be hard to catch since it is noticeable as an increased
> +CPU cost for handling the received packets and, usually, as drops in the
> +statistics of the physical port receiving the packets.
> +
> +To identify such a situation, a coverage statistic is available::
> +
> +    $ ovs-appctl coverage/read-counter vhost_tx_contention
> +    59530681
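Not an objection, just an aside that might be worth a sentence in the doc:
the raw total from coverage/read-counter is hard to judge in isolation.
"ovs-appctl coverage/show" prints per-interval rates for the same counters,
which makes a contention burst easier to spot.  Illustrative output only,
the numbers below are made up:

    $ ovs-appctl coverage/show | grep vhost_tx_contention
    vhost_tx_contention      1012.4/sec    970.333/sec      485.1667/sec   total: 59530681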
> +
> +If you want to further debug this contention, perf can be used if your OVS
> +daemon has been compiled with debug symbols.
> +
> +First, identify the point in the binary sources where the contention
> +occurs::
> +
> +    $ perf probe -x $(which ovs-vswitchd) -L __netdev_dpdk_vhost_send \
> +        |grep -B 3 -A 3 'COVERAGE_INC(vhost_tx_contention)'
> +              }
> +
> +     21         if (unlikely(!rte_spinlock_trylock(&dev->tx_q[qid].tx_lock))) {
> +     22                 COVERAGE_INC(vhost_tx_contention);
> +     23                 rte_spinlock_lock(&dev->tx_q[qid].tx_lock);
> +              }
> +
> +Then, place a probe at the line where the lock is taken.
> +You can add additional context to catch which port and queue are concerned::
> +
> +    $ perf probe -x $(which ovs-vswitchd) \
> +        'vhost_tx_contention=__netdev_dpdk_vhost_send:23 netdev->name:string qid'
> +
> +Finally, gather data and generate a report::
> +
> +    $ perf record -e probe_ovs:vhost_tx_contention -aR sleep 10
> +    [ perf record: Woken up 120 times to write data ]
> +    [ perf record: Captured and wrote 30.151 MB perf.data (356278 samples) ]
> +
> +    $ perf report -F +pid --stdio
> +    # To display the perf.data header info, please use --header/--header-only options.
> +    #
> +    #
> +    # Total Lost Samples: 0
> +    #
> +    # Samples: 356K of event 'probe_ovs:vhost_tx_contention'
> +    # Event count (approx.): 356278
> +    #
> +    # Overhead  Pid:Command            Trace output
> +    # ........  .....................  ............................
> +    #
> +        55.57%  83332:pmd-c01/id:33    (9e9775) name="vhost0" qid=0
> +        44.43%  83333:pmd-c15/id:34    (9e9775) name="vhost0" qid=0
> +
> +
> +    #
> +    # (Tip: Treat branches as callchains: perf report --branch-history)
> +    #
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 7f709ff..3525870 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -41,6 +41,7 @@
>  #include <rte_vhost.h>
>
>  #include "cmap.h"
> +#include "coverage.h"
>  #include "dirs.h"
>  #include "dp-packet.h"
>  #include "dpdk.h"
> @@ -72,6 +73,8 @@ enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
>  VLOG_DEFINE_THIS_MODULE(netdev_dpdk);
>  static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
>
> +COVERAGE_DEFINE(vhost_tx_contention);
> +
>  #define DPDK_PORT_WATCHDOG_INTERVAL 5
>
>  #define OVS_CACHE_LINE_SIZE CACHE_LINE_SIZE
> @@ -2353,7 +2356,10 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
>          goto out;
>      }
>
> -    rte_spinlock_lock(&dev->tx_q[qid].tx_lock);
> +    if (unlikely(!rte_spinlock_trylock(&dev->tx_q[qid].tx_lock))) {
> +        COVERAGE_INC(vhost_tx_contention);
> +        rte_spinlock_lock(&dev->tx_q[qid].tx_lock);
> +    }
>
>      cnt = netdev_dpdk_filter_packet_len(dev, cur_pkts, cnt);
>      /* Check has QoS has been configured for the netdev */
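One more aside on the counter itself: since it only ticks when the trylock
fails, the uncontended fast path stays as cheap as before.  If it helps
future readers of this thread, here is a minimal standalone sketch of the
same trylock-then-count pattern; everything in it is hypothetical, with
plain pthreads standing in for rte_spinlock and a relaxed atomic standing
in for the coverage machinery:

    /* trylock_demo.c: illustration only, not OVS code.
     * Build with: cc -O2 -pthread trylock_demo.c */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    static pthread_spinlock_t tx_lock;
    static atomic_ulong tx_contention;

    static void
    send_burst(void)
    {
        /* Fast path: when nobody holds the lock, trylock succeeds and the
         * counter is never touched, so uncontended sends pay nothing. */
        if (pthread_spin_trylock(&tx_lock) != 0) {
            atomic_fetch_add_explicit(&tx_contention, 1,
                                      memory_order_relaxed);
            pthread_spin_lock(&tx_lock);
        }
        /* ... transmit packets ... */
        pthread_spin_unlock(&tx_lock);
    }

    static void *
    worker(void *arg)
    {
        (void) arg;
        for (int i = 0; i < 1000000; i++) {
            send_burst();
        }
        return NULL;
    }

    int
    main(void)
    {
        pthread_t t1, t2;

        pthread_spin_init(&tx_lock, PTHREAD_PROCESS_PRIVATE);
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        /* Two threads hammering one lock: a sizeable share of the two
         * million acquisitions should show up as contended. */
        printf("contended acquisitions: %lu\n",
               atomic_load(&tx_contention));
        return 0;
    }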
