On 2/3/22 11:48, Pai G, Sunil wrote:
> Hi Maxime, 
> 
>>> This version of the patch seems to have a negative impact on
>>> performance for the burst traffic profile [1].
>>> Benefits seen with the previous version (v2) were up to ~1.6x for
>>> 1568-byte packets, compared to ~1.2x with the current design (v3),
>>> as measured on new Intel hardware that supports DSA [2], CPU @ 1.8 GHz.
>>> The drop seems to be caused by excessive vhost txq contention
>>> across the PMD threads.
>>
>> So that means the Tx/Rx queue pairs aren't consumed by the same PMD
>> thread.  Can you confirm?
> 
> Yes, the completion polls for a given txq happen on a single PMD thread
> (the same thread where its corresponding rxq is being polled), but other
> threads can submit (enqueue) packets on the same txq, which leads to
> contention.
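
(For context: the contention comes from the per-txq spinlock that OVS
takes around the vhost enqueue.  A condensed sketch of that pattern, not
the exact netdev-dpdk code:)

#include <rte_spinlock.h>
#include <rte_mbuf.h>
#include <rte_vhost.h>

/* Condensed from netdev-dpdk: each vhost txq is guarded by a spinlock
 * because rte_vhost_enqueue_burst() is not thread-safe per queue. */
struct dpdk_tx_queue {
    rte_spinlock_t tx_lock;
};

static uint16_t
vhost_txq_send(struct dpdk_tx_queue *txq, int vid, uint16_t qid,
               struct rte_mbuf **pkts, uint16_t cnt)
{
    uint16_t nb_tx;

    /* Every PMD thread mapped to this txq serializes here, and the
     * rxq-polling thread doing async completion polls takes the same
     * lock, hence the contention. */
    rte_spinlock_lock(&txq->tx_lock);
    nb_tx = rte_vhost_enqueue_burst(vid, qid, pkts, cnt);
    rte_spinlock_unlock(&txq->tx_lock);

    return nb_tx;
}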

Why can't this process be lockless?
If we have to lock the device, maybe we can do both submission
and completion from the thread that polls the corresponding Rx queue?
Tx threads may enqueue mbufs to some lockless ring inside
rte_vhost_enqueue_burst.  The Rx thread may dequeue them and submit
jobs to the DMA device and check completions.  No locks required.
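
A minimal sketch of that scheme, assuming a per-virtqueue
multi-producer/single-consumer rte_ring and the DPDK dmadev API; the
struct and function names below are illustrative, not existing vhost
internals:

#include <stdbool.h>
#include <rte_ring.h>
#include <rte_mbuf.h>
#include <rte_dmadev.h>

struct vhost_txq {
    struct rte_ring *submit_ring;  /* Created with RING_F_SC_DEQ:
                                    * multi-producer, single-consumer. */
    int16_t dma_id;
    uint16_t dma_vchan;
};

/* Any PMD thread, from inside rte_vhost_enqueue_burst(): just publish
 * the mbufs on the lockless ring; no txq lock needed. */
static inline unsigned int
txq_publish(struct vhost_txq *txq, struct rte_mbuf **pkts, unsigned int n)
{
    return rte_ring_enqueue_burst(txq->submit_ring,
                                  (void * const *) pkts, n, NULL);
}

/* Only the PMD thread that polls the corresponding rxq: drain the
 * ring, issue the DMA copies, then reap completions. */
static void
txq_drain_and_submit(struct vhost_txq *txq)
{
    struct rte_mbuf *pkts[32];
    rte_iova_t dst_iova = 0;  /* Would come from the guest descriptor;
                               * just a placeholder here. */
    uint16_t last_idx;
    bool error;
    unsigned int n, i;

    n = rte_ring_dequeue_burst(txq->submit_ring, (void **) pkts, 32, NULL);
    for (i = 0; i < n; i++) {
        /* The returned job index would be used to map completions back
         * to mbufs; bookkeeping elided. */
        rte_dma_copy(txq->dma_id, txq->dma_vchan,
                     rte_mbuf_data_iova(pkts[i]), dst_iova,
                     rte_pktmbuf_data_len(pkts[i]), 0);
    }
    if (n) {
        rte_dma_submit(txq->dma_id, txq->dma_vchan);
    }
    /* Reap finished copies; this is where the used ring would be
     * updated and the mbufs freed. */
    rte_dma_completed(txq->dma_id, txq->dma_vchan, 32, &last_idx, &error);
}

Submission stays lock-free for the Tx threads, and since there is a
single consumer, the DMA submission and completion state needs no
synchronization at all.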

> 
>>
>>> [1]:
>>> https://builders.intel.com/docs/networkbuilders/open-vswitch-optimized-deployment-benchmark-technology-guide.pdf
>>> [2]:
>>> https://01.org/blogs/2019/introducing-intel-data-streaming-accelerator
>>>
>>> Thanks and regards
>>> Sunil
> 
