This patch adds infrastructure to the userspace datapath to defer work.
At a high level, each PMD thread places work items into its own
per-thread work ring to be done later. The work ring is a FIFO queue of
pointers to work items. Each work item has a "work_func()" function
pointer, which abstracts away what work is actually being done. More
details about the infrastructure can be found in the patch and its
commit message.

The ability to defer work is necessary when considering asynchronous
use-cases. The use-case this patch targets is DMA offload of TX using
VHOST ports. In this use-case, packets are passed to a copy engine
rather than being copied in software. Once the copy has completed, the
packets have to be freed and the VHOST port statistics have to be
updated in software. This completion work needs to be deferred.

An effective defer infrastructure has a number of requirements. They are
listed below, together with how each one is met:

1. Allow the thread which kicked off the DMA transfer to keep doing
   useful work, rather than waiting or polling for work to be
   completed.

   This is accomplished by deferring the completion work for the DMA
   transfer rather than waiting for the DMA transfer to complete before
   moving on to process more packets. The completion work is added to
   the work ring to be done after some time, and more useful work can
   be done in the meantime.

2. Allow some time to pass between kicking off a DMA transfer for a
   VHOST port and checking for completion of the DMA transfer.

   This is accomplished by doing deferred work after processing all
   RXQs assigned to a PMD thread.

3. Upon checking for completion of the DMA transfer, allow re-deferral
   of work in the case where the DMA transfer has not completed.

   This is accomplished by adding checks in the "do_work()" function to
   defer the work again when DMA has not completed. This re-deferring
   of work helps with requirements 1 and 2.

A ring buffer is used to queue the pointers to work items, since its
FIFO property means the DMA transfers which have been in progress the
longest are checked first and have the highest chance of being
completed.

For this RFC, DPDK's rte_ring is used as the ring buffer implementation.
This was the quickest way to get working code. A better solution will
need to be found, since rte_ring should not be used in generic OVS
datapath code. This TODO is mentioned in the code.

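To make the re-deferral flow concrete, here is a minimal standalone
sketch. It is not the code from the patch. The fixed-size array ring
stands in for rte_ring, and every name in it (struct work_ring,
work_ring_enqueue(), do_deferred_work(), the polls_until_done field,
the EAGAIN return convention) is hypothetical and chosen only for
illustration. A countdown simulates "the DMA transfer has not completed
yet".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define WORK_RING_SIZE 8    /* Hypothetical capacity; must be a power of 2. */

struct work_item;

/* Assumed convention: 0 means the work is finished, EAGAIN means retry. */
typedef int (*work_func_t)(struct work_item *);

struct work_item {
    work_func_t work_func;  /* Abstracts what work is actually done. */
    int polls_until_done;   /* Stand-in for "has the DMA copy completed?" */
    int pkts_to_free;       /* Packets still owned by this work item. */
};

/* Single-threaded FIFO of work item pointers, one per PMD thread. */
struct work_ring {
    struct work_item *slots[WORK_RING_SIZE];
    unsigned int head;      /* Index of the next item to dequeue. */
    unsigned int tail;      /* Index of the next free slot. */
};

static unsigned int
work_ring_count(const struct work_ring *r)
{
    return r->tail - r->head;
}

static bool
work_ring_enqueue(struct work_ring *r, struct work_item *item)
{
    assert(item != NULL);
    if (work_ring_count(r) == WORK_RING_SIZE) {
        return false;       /* Ring is full. */
    }
    r->slots[r->tail++ % WORK_RING_SIZE] = item;
    return true;
}

static struct work_item *
work_ring_dequeue(struct work_ring *r)
{
    if (r->head == r->tail) {
        return NULL;        /* Ring is empty. */
    }
    return r->slots[r->head++ % WORK_RING_SIZE];
}

/* Example completion work: succeeds only once the simulated DMA is done. */
static int
dma_completion_work(struct work_item *item)
{
    if (item->polls_until_done > 0) {
        item->polls_until_done--;
        return EAGAIN;      /* DMA still in flight: ask to be re-deferred. */
    }
    item->pkts_to_free = 0; /* Real code would free packets, update stats. */
    return 0;
}

/* Runs every item that was queued when the call started exactly once,
 * oldest first. Unfinished items go back on the tail of the ring.
 * Returns the number of items that completed. */
static unsigned int
do_deferred_work(struct work_ring *r)
{
    unsigned int pending = work_ring_count(r);
    unsigned int completed = 0;

    while (pending--) {
        struct work_item *item = work_ring_dequeue(r);

        if (item->work_func(item) == EAGAIN) {
            /* Cannot fail: the dequeue above just freed a slot. */
            work_ring_enqueue(r, item);
        } else {
            completed++;
        }
    }
    return completed;
}
```

In a PMD-style loop, the TX path would enqueue a work item after
kicking off a transfer, and do_deferred_work() would be called once
after all RXQs have been polled. Bounding the pass by the count taken at
entry keeps a re-deferred item from being polled twice in the same
iteration, which preserves the "let some time pass" property from
requirement 2.
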
Cian Ferriter (1):
  dpif-netdev: Add a per thread work ring

 lib/dpif-netdev-perf.c |  13 ++++-
 lib/dpif-netdev-perf.h |   7 +++
 lib/dpif-netdev.c      | 125 ++++++++++++++++++++++++++++++++++++++++-
 lib/netdev-dpdk.c      |  22 +++++--
 lib/netdev-provider.h  |  15 ++++-
 lib/netdev.c           |   3 +-
 6 files changed, 172 insertions(+), 13 deletions(-)

-- 
2.17.1
