Hi All,


I tested the cases below and collected some performance data. The data shows 
little impact from cross-NUMA communication, which differs from my 
expectation. (Previously I mentioned that crossing NUMA would add 60% more 
cycles, but I can no longer reproduce that.)



@Jan,

You mentioned that cross-NUMA communication costs a lot more cycles. Can you 
share your data? I am not sure whether I made a mistake somewhere.



@All,

Your data for similar test cases would be welcome. Thanks.



Case1: VM0 -> PMD0 -> NIC0
Case2: VM1 -> PMD1 -> NIC0
Case3: VM1 -> PMD0 -> NIC0
Case4: NIC0 -> PMD0 -> VM0
Case5: NIC0 -> PMD1 -> VM1
Case6: NIC0 -> PMD0 -> VM1
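
In these cases NIC0, PMD0 and VM0 sit on NUMA node 0 and PMD1 and VM1 on NUMA 
node 1, so Case1/Case4 are fully local, Case2/Case5 cross NUMA between NIC and 
PMD, and Case3/Case6 cross NUMA between PMD and VM. The placement was forced 
with the standard OVS-DPDK knobs; the CPU mask and port name below are only 
examples, not my exact cores/interfaces:

  # one PMD core on each NUMA node (bits 1 and 25 are just example core numbers)
  ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x2000002

  # pin a port's rx queue 0 to a particular PMD core to force the wanted
  # PMD/NUMA combination (works for dpdk and vhostuser ports alike)
  ovs-vsctl set Interface dpdk0 other_config:pmd-rxq-affinity="0:1"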



         VM Tx Mpps   Host Tx Mpps   avg cycles per packet   avg processing cycles per packet
Case1    1.4          1.4            512                     415
Case2    1.3          1.3            537                     436
Case3    1.35         1.35           514                     390

         VM Rx Mpps   Host Rx Mpps   avg cycles per packet   avg processing cycles per packet
Case4    1.3          1.3            549                     533
Case5    1.3          1.3            559                     540
Case6    1.28         1.28           568                     551
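
The two cycle columns correspond to the per-PMD "avg cycles per packet" and 
"avg processing cycles per packet" counters. To reproduce, clear the counters, 
let traffic run for a while, then dump them:

  ovs-appctl dpif-netdev/pmd-stats-clear
  ovs-appctl dpif-netdev/pmd-stats-show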



Br,

Wang Zhike



-----Original Message-----
From: Jan Scheurich [mailto:jan.scheur...@ericsson.com]
Sent: Wednesday, September 06, 2017 9:33 PM
To: O Mahony, Billy; 王志克; Darrell Ball; ovs-disc...@openvswitch.org; 
ovs-dev@openvswitch.org; Kevin Traynor
Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port



Hi Billy,



> You are going to have to take the hit crossing the NUMA boundary at some
> point if your NIC and VM are on different NUMAs.
>
> So are you saying that it is more expensive to cross the NUMA boundary from
> the PMD to the VM than to cross it from the NIC to the PMD?



Indeed, that is the case: if the NIC crosses the QPI bus when storing packets 
in the remote NUMA node, there is no cost involved for the PMD (the QPI 
bandwidth is typically not a bottleneck); the PMD only performs local memory 
accesses.



On the other hand, if the PMD crosses the QPI when copying packets into a 
remote VM, there is a huge latency penalty involved, consuming lots of PMD 
cycles that cannot be spent on processing packets. We at Ericsson have observed 
exactly this behavior.



This latency penalty becomes even worse when the LLC hit rate is degraded due 
to LLC contention with real VNFs and/or the unfavorable packet-buffer re-use 
patterns that real VNFs exhibit compared to typical synthetic benchmark apps 
like DPDK testpmd.



> If so then in that case you'd like to have two (for example) PMDs polling 2
> queues on the same NIC, with the PMDs on each of the NUMA nodes forwarding
> to the VMs local to that NUMA?
>
> Of course your NIC would then also need to be able to know which VM (or at
> least which NUMA the VM is on) in order to send the frame to the correct rxq.



That would indeed be optimal, but it is hard to realize in the general case 
(e.g. with VXLAN encapsulation), as the actual destination is only known after 
tunnel pop. Perhaps some probabilistic steering of RSS hash values, based on 
the measured distribution of final destinations, might help here in the future.



But even without that in place, we need PMDs on both NUMAs anyhow (for 
NUMA-aware polling of vhostuser ports), so why not also use them to poll remote 
eth ports? We can achieve better average performance with fewer PMDs than with 
the current limitation to NUMA-local polling.
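
For reference, which rx queues each PMD polls and which NUMA node each PMD 
runs on can be checked at any time with:

  ovs-appctl dpif-netdev/pmd-rxq-show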



BR, Jan

