On 9/22/21 08:11, Hu, Jiayu wrote:
> Hi Ilya,
>
>> -----Original Message-----
>> From: Ilya Maximets <[email protected]>
>> Sent: Tuesday, September 21, 2021 5:36 AM
>> To: Hu, Jiayu <[email protected]>; Ilya Maximets <[email protected]>;
>> Pai G, Sunil <[email protected]>; [email protected]
>> Cc: Richardson, Bruce <[email protected]>; Mcnamara, John
>> <[email protected]>; [email protected]
>> Subject: Re: [ovs-dev] [PATCH RFC dpdk-latest V2 0/1] Enable vhost async
>> API's in OvS.
>>
>> On 9/16/21 17:12, Hu, Jiayu wrote:
>>> Hi Ilya,
>>>
>>>> -----Original Message-----
>>>> From: Ilya Maximets <[email protected]>
>>>> Sent: Thursday, September 16, 2021 1:27 AM
>>>> To: Pai G, Sunil <[email protected]>; [email protected]
>>>> Cc: Richardson, Bruce <[email protected]>; Mcnamara, John
>>>> <[email protected]>; [email protected]; Hu, Jiayu
>>>> <[email protected]>; [email protected]
>>>> Subject: Re: [ovs-dev] [PATCH RFC dpdk-latest V2 0/1] Enable vhost
>>>> async API's in OvS.
>>>>
>>>> On 9/7/21 14:00, Sunil Pai G wrote:
>>>>> This series brings the new asynchronous vHost APIs in DPDK to OVS.
>>>>> With the asynchronous framework, vHost-user can offload the memory
>>>>> copy operation to DPDK DMA-dev based devices like Intel®
>>>>> QuickData Technology without blocking the CPU.
>>>>
>>>> Copying here what I accidentally sent in reply to v1.
>>>> ---
>>>>
>>>> As said in reply to the 'deferral of work' patch-set, OVS is
>>>> synchronous and that is fine, because network devices are
>>>> asynchronous by their nature. OVS is not blocked by memory copies,
>>>> because these are handled by DMA configured and handled by device
>>>> drivers. This patch adds DMA handling to
>>>
>>> As you said, network devices are asynchronous by nature and OVS is
>>> unaware of it. For vhost, do you suggest that OVS shouldn't be aware
>>> of whether vhost is asynchronous? In other words, OVS shouldn't
>>> change code to an asynchronous style. Is that correct?
>>
>> Yes. That's correct.
>
> The way that OVS uses vhost is more like implementing a VirtIO backend
> driver in OVS. It makes the vhost port not the same as physical net
> devices. For a driver, it may be fine to handle async logic, like
> clearing in-flight packets by DMA, IMO. The reason why there is no
> asynchronous handling in OVS vhost is not clear to me.
OVS uses vhost the way it does only because DPDK doesn't provide a good
abstraction on top of it. Previously, vhost_pmd didn't exist or didn't
provide essential functionality that OVS needs, so we had to implement
the port directly on top of the vhost library. The current vhost_pmd
still doesn't provide some features, IIRC, and migration to it doesn't
provide any benefit in terms of the lines of code required to set it
up, as we would still need to treat it largely differently from other
port types. These are reasons not to migrate to vhost_pmd. At the same
time, the vhost library provides enough abstraction to not perceive it
as a backend driver. Introducing all of this DMA handling completely
breaks that illusion, turning dpdkvhostuser* netdevs in OVS into actual
drivers, and that is the wrong direction. IMO, OVS needs a better
abstraction from low-level device management, and bringing device
drivers into it is certainly a step in the opposite direction. In the
end, low-level device management is not a function of a network switch.

>
> From my understanding, OVS mainly handles three things when it
> integrates with asynchronous vhost. The first is DMA-related
> operations (e.g., submitting copies), the second is DMA assignment
> (the current implementation assigns DMA engines to each PMD thread),
> and the last is handling async packets, like clearing in-flight
> packets when a vring is disabled. I am wondering which one blocks OVS
> from leveraging asynchronous vhost? Or none of the three?

Not sure what you mean here, but I hope I answered the question above
and a bit below.

---

As I will be out for the next 2 weeks, I was asked to share some ideas
on how everything can be hidden inside the vhost library, so here it is
(I didn't look closely at any DMA-related parts nor review the current
implementation, so this is based on pure ideas):

Since it's a poll-mode interface, OVS will periodically call
rte_vhost_dequeue_burst() in any case, and that might be the trigger
for anything needed (we may even postpone marking the queue as disabled
until the next call to dequeue_burst(), so the vhost library will be
able to clean up in-flight DMA work on that last call).

HW drivers typically do most of the housekeeping on Rx, so the vhost
library can do the same. From a latency standpoint, I don't see any
difference between calling the Tx path one more time and doing the work
while calling the Rx path, while the latter doesn't require re-working
the OVS architecture.

In short, rte_vhost_enqueue_burst() may only be responsible for queuing
requests to the DMA engine. rte_vhost_dequeue_burst() will handle
receiving packets and will also complete outstanding transmissions.

rte_vhost_enqueue_burst:
- Get some descriptors for packets that need to be sent to the VM.
- Create and submit tasks to the DMA engine for these descriptors.

rte_vhost_dequeue_burst:
- Check completed DMA tasks created by rte_vhost_enqueue_burst.
- Add completed descriptors to the virtqueue.
- Check the virtqueue for incoming descriptors from the VM.
- Create and submit tasks to the DMA engine for these descriptors.
- Check completed DMA tasks created on a previous call to
  rte_vhost_dequeue_burst.
- Return mbufs for which DMA is completed as received packets.
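To make this more concrete, below is a rough sketch of what the two
calls could look like inside the library. Only the rte_vhost_*_burst()
prototypes are the real public API; every vq_*() helper is a
hypothetical placeholder for the library's internal virtqueue and DMA
plumbing, not an existing function.

#include <stdint.h>

struct rte_mbuf;
struct rte_mempool;
struct vq;  /* internal per-virtqueue state, incl. DMA in-flight rings */

/* Hypothetical internal helpers. */
struct vq *vq_get(int vid, uint16_t queue_id);
uint16_t vq_queue_tx_copies(struct vq *vq, struct rte_mbuf **pkts,
                            uint16_t count);
void vq_flush_completed_tx(struct vq *vq);
void vq_queue_rx_copies(struct vq *vq, struct rte_mempool *pool);
uint16_t vq_collect_completed_rx(struct vq *vq, struct rte_mbuf **pkts,
                                 uint16_t count);

uint16_t
rte_vhost_enqueue_burst(int vid, uint16_t queue_id,
                        struct rte_mbuf **pkts, uint16_t count)
{
    struct vq *vq = vq_get(vid, queue_id);

    /* Only queue the work: reserve descriptors for packets headed to
     * the VM and submit the copies to the DMA engine, don't wait. */
    return vq_queue_tx_copies(vq, pkts, count);
}

uint16_t
rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
                        struct rte_mempool *pool,
                        struct rte_mbuf **pkts, uint16_t count)
{
    struct vq *vq = vq_get(vid, queue_id);

    /* 1. Finish Tx: descriptors whose DMA copies (queued by
     *    rte_vhost_enqueue_burst) have completed go to the used ring. */
    vq_flush_completed_tx(vq);

    /* 2. Start Rx: submit DMA copies for new descriptors coming from
     *    the VM into mbufs allocated from 'pool'. */
    vq_queue_rx_copies(vq, pool);

    /* 3. Return mbufs whose copies have finished (typically ones
     *    submitted on a previous call) as received packets. */
    return vq_collect_completed_rx(vq, pkts, count);
}

With something like this, OVS keeps calling the two functions exactly
as it does today and all of the DMA bookkeeping stays behind the
existing API.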
Moving completed descriptors from the DMA completion queue to the
virtqueue should be relatively cheap, so it should not be a problem to
do it inside rte_vhost_dequeue_burst. (As Maxime pointed out, this may
not be very cache-efficient, but that should not be a big problem and
needs testing in any case. If the performance impact is not dramatic,
that should be fine, taking into account how much simpler it makes the
feature to adopt.)

And it should be possible to work on a DMA queue locklessly by using
some kind of ring buffer. (No idea how it's currently implemented, but
it should be possible.) The virtqueue will not need locking, since only
one thread will access it. DPDK knows the core ids of running threads,
and that information can be used to assign different DMA channels to
different threads. I guess better control could also be achieved with
the new rte_thread_register API.

One remaining problem is when to increase the 'tx_packets' counter. The
solution for this is to count them inside the vhost library. IMO, the
vhost library has needed a statistics API for a long time already.
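A minimal sketch of the per-thread DMA channel assignment mentioned
above, assuming the library keeps a table indexed by lcore id (only
rte_lcore_id(), RTE_MAX_LCORE and rte_thread_register() are real DPDK
symbols; the table and the fallback policy are assumptions):

#include <stdint.h>
#include <rte_lcore.h>  /* rte_lcore_id(), RTE_MAX_LCORE */

/* Hypothetical table filled at configuration time: one DMA channel per
 * lcore that runs a PMD thread, -1 if none is assigned. */
static int16_t dma_chan_by_lcore[RTE_MAX_LCORE];

/* Each PMD thread runs on its own lcore (non-EAL threads could be
 * covered via rte_thread_register()), so the channel returned here is
 * never shared between threads and needs no locking. */
static inline int16_t
vhost_dma_chan(void)
{
    unsigned int lcore = rte_lcore_id();

    if (lcore >= RTE_MAX_LCORE)
        return -1;  /* Unregistered thread: fall back to CPU copies. */

    return dma_chan_by_lcore[lcore];
}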
Best regards, Ilya Maximets.

>
> Thanks,
> Jiayu
>
>>>
>>> Thanks,
>>> Jiayu
>>>
>>>> vhost, making it essentially a physical device to some extent, but
>>>> for some reason the driver for that is implemented inside OVS. A
>>>> high-level application should not care about memory copies inside
>>>> the physical device and DMA configuration, but the code in this
>>>> patch looks very much like parts of a specific device driver.
>>>>
>>>> Implementation of this feature belongs in the vhost library, which
>>>> is a driver for this (now) physical device. This way it can be
>>>> consumed by OVS or any other DPDK application without major code
>>>> changes.
>>>>
>>>> Best regards, Ilya Maximets.

> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev