On 1/16/23 14:41, David Marchand wrote:
> On Fri, Jan 13, 2023 at 4:57 PM Ilya Maximets <[email protected]> wrote:
>>>>> @@ -5766,6 +5804,7 @@ static const struct netdev_class dpdk_vhost_class = {
>>>>>  static const struct netdev_class dpdk_vhost_client_class = {
>>>>>      .type = "dpdkvhostuserclient",
>>>>>      NETDEV_DPDK_CLASS_COMMON,
>>>>> +    .run = netdev_dpdk_vhost_run,
>>>>
>>>> This part is also needed for the 'server' type.
>>>>
>>>> However, there is no guarantee that run() will be called in any
>>>> reasonable time, because we don't have a wait() method that would
>>>> schedule a wake-up for the main thread.  The main thread wakes up for
>>>> other reasons, like stats updates, but that happens not that frequently
>>>> and we can't rely on it anyway.  For example, the default stats update
>>>> interval is 5 seconds.
>>>>
>>>> The problem is that we can't really iterate over devices and take
>>>> dev->mutex in the context of a callback.  Meaning we can't really
>>>> implement a good wait() method that would trigger some seqno change
>>>> that would wake up the main thread.  We could trigger a global
>>>> connectivity seqno, I guess.  Not sure how safe that is, since
>>>> sequence numbers use an internal mutex.
>>>>
>>>> OTOH, maybe we can employ the dpdk-watchdog thread to perform vhost
>>>> housekeeping?  Currently it does a very similar thing for physical
>>>> ports, i.e. it updates the link state periodically.  Its sleep
>>>> interval is a bit large, though; maybe we could tweak that, or update
>>>> phy link states with the same interval as now but check the vhost
>>>> event queue a bit more frequently.  Maybe even have something like
>>>> the backoff in dp_netdev_flow_offload_main().
>>>
>>> Having a dedicated thread seems safer, for the same reason Maxime
>>> mentioned for the OVS main thread: vhost events may be triggered by a
>>> malicious guest, and OVS would be blind to some phy link updates.
>>>
>>> Another thought on my patch... is it safe to have an unbounded queue
>>> wrt memory consumption?  Again, a malicious guest could make OVS
>>> consume a lot of memory (in my tests, OVS complained that the heap
>>> increased to 500M).
>
> *Note* the heap was at ~400M before starting the test, so it was an
> increase of around 100M.
>
> The reason was likely that .run() was not called often enough and the
> queue was growing with pending updates.
> Now that I went with the dedicated thread approach, all I see is an
> increase of 1-2M, maximum.
>
> I think we can avoid going with your suggestion.
> WDYT?
OK.  Sounds OK then.
Maybe add a rate-limited warning message if the queue exceeds 1000 entries
or so?

>
>> We're entering gross over-engineering territory, but the solution
>> might be the following:
>>
>> An RCU list with nodes containing the following elements:
>>  - ifname
>>  - qid
>>  - atomic_bool enabled
>>  - atomic_bool seen
>>  - timestamp
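A minimal sketch of what such a node could look like with stock OVS types
(rculist.h, ovs-atomic.h).  The field names follow the list above; the
struct name, member types, and comments are assumptions for illustration,
not code from the patch:

    /* Per-queue state update, chained in an RCU-protected list.
     * Sketch only -- names and types are assumed, not taken from the patch. */
    #include "ovs-atomic.h"
    #include "rculist.h"

    struct vhost_state_update {
        struct rculist node;      /* Linked into the RCU-protected list. */
        char *ifname;             /* vhost-user interface the event refers to. */
        int qid;                  /* Queue id reported by the vhost library. */
        atomic_bool enabled;      /* Last enable/disable state for the queue. */
        atomic_bool seen;         /* Set once the main thread has consumed it. */
        long long int timestamp;  /* Time of the last change, in msec. */
    };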

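And a rough illustration of the rate-limited warning suggested above, using
OVS's stock VLOG_WARN_RL() mechanism.  The function name, the 'n_events'
counter, and the threshold macro are made up for the example; only the vlog
helpers are real:

    #include "openvswitch/vlog.h"

    VLOG_DEFINE_THIS_MODULE(netdev_dpdk);

    #define VHOST_EVENTS_WARN_THRESHOLD 1000

    /* Emit a rate-limited warning when the pending vhost-event queue grows
     * past the threshold; 'n_events' is the current queue depth. */
    static void
    vhost_events_warn_if_backlogged(unsigned int n_events)
    {
        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);

        if (n_events > VHOST_EVENTS_WARN_THRESHOLD) {
            VLOG_WARN_RL(&rl, "%u vhost-user events pending; the housekeeping"
                         " thread may be falling behind.", n_events);
        }
    }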