On 28.10.2019 11:14, Eelco Chaudron wrote:
On 25 Oct 2019, at 17:06, Ilya Maximets wrote:
On 25.10.2019 15:51, Eelco Chaudron wrote:
<SNIP>
+static
+void vhost_guest_notified(int vid)
+{
+ struct netdev_dpdk *dev;
+
+ ovs_mutex_lock(&dpdk_mutex);
+ LIST_FOR_EACH (dev, list_node, &dpdk_list) {
+ if (netdev_dpdk_get_vid(dev) == vid) {
+ ovs_mutex_lock(&dev->mutex);
+ rte_spinlock_lock(&dev->stats_lock);
+ dev->vhost_irqs++;
+ rte_spinlock_unlock(&dev->stats_lock);
+ ovs_mutex_unlock(&dev->mutex);
+ break;
+ }
+ }
+ ovs_mutex_unlock(&dpdk_mutex);
So, for every single eventfd_write() we're taking the global mutex,
traversing list of all the devices, taking one more mutex and finally
taking the spinlock just to increment a single counter.
I think, it's too much.
Yes you are right this might be a bit of overkill…
Wouldn't it significantly affect interrupt based virtio drivers in
terms of performance? How frequently 2 vhost-user ports will lock
each other on the global device mutex? How frequently 2 PMD threads
will lock each other on rx/tx to different vrings of the same vhost
port?
I used iperf3 and did not show any negative effects (maybe I should add more
than 4 physical queues, 1 had one VM queue), but also the results show large
deviations.
From my experience, I'd say that iperf via kernel virtio driver is not able
to saturate a PMD thread. They are likely relaxed and have time to wait
on the mutex. Also, You, probably, have only couple of ports, so the critical
section is not large enough to produce frequent interlocking.
I'm just pointing that to count number of eventfd syscalls we're probably
making 2 other syscalls and probably sleeping in them.
Yes I agree and send out a v2 using a simple coverage counter
For testing purposes, I'd suggest to add 10-20(100?) dummy pmd ports (e.g.
vhost without VMs) and assign them to different thread. Probably, more threads
in iperf for traffic generation, switching to small udp packets.
You could also replace lock() with trylock() for dpdk_mutex and count
contentions in a same way as it done for tx_lock in David's patch.
The try lock would have been a good thing to see if I hit this lock contention.
Was already testing with small UDP packets, but used iPerf as I needed a kernel
like tool not a poll mode one based on DPDK.
I see that 'guest_notified' patch is already in dpdk master, but
wouldn't it be better to accumulate this statistics inside DPDK
and return it with some new API like rte_vhost_get_vring_xstats()
if required? I don't see any usecase for this callback beside
counting some stats.
I agree, however, the vhost library has no internal counters (only the PMD
implementation which we do not use), so hence they liked the callback.
One more possible issue is that this not-so-useful callback hijacked
the reserved space in the structure and we could suffer in the
future while adding new callbacks due to ABI breakage.
(In context of DPDK API/ABI stability discussions)
For the current patch I could only suggest to convert it to simple
coverage counter like it done in 'vhost_tx_contention' patch here:
https://patchwork.ozlabs.org/patch/1175849/
This is what I’ll do (and will send a patch soon). Although from a user
perspective a per vhost user would make more sense as it could easily indicate
which VM is configured wrongly…
Sure. I think that we need to re-imagine concept and implementation of
pmd-perf-stats in a "coverage"-like way, as I suggested previously while
discussing initial patches for pmd-perf, to count stats like vhost qlen,
tx contentions and this one. At least, this should be not so hard to do
on a per-PMD basis. per-netdev might be more tricky, but I believe that
it is possible.
This might be something that should be added. For per-netdev we could use
atomics assuming the counters are for exception use cases (or per CPU) and get
ride of any locking.
I have lockless netdev stats in my ToDo list for a few months already since
all these patches about packet drops started to appear.
However, for vhost we could also need some lockless (e.g. rcu protected)
structure to obtain netdev by the vid.
Key point for stats like tx_contentions or random syscall counters is that
I don't want them to be part of a regular (even custom) network statistics,
because they are not network statistics. So, this needs to be accounted
in a different way, e.g. in pmd-perf-stats or some similar netdev-perf-stats
implemented on the same re-worked pmd-perf infrastructure.
Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev