Hi Kevin,

> > We have done extensive benchmarking and found that we get better overall
> > PMD load balance and resulting OVS performance when we do not statically
> > pin any rx queues and instead let the auto-load-balancing find the optimal
> > distribution of phy rx queues over both NUMA nodes to balance an
> > asymmetric load of vhu rx queues (polled only on the local NUMA node).
> >
> > Cross-NUMA polling of vhu rx queues comes with a very high latency cost
> > due to cross-NUMA access to volatile virtio ring pointers in every
> > iteration (not only when actually copying packets). Cross-NUMA polling
> > of phy rx queues doesn't have a similar issue.
> >
> 
> I agree that for vhost rxq polling, it always causes a performance penalty
> when there is cross-numa polling.
> 
> For polling phy rxq, when phy and vhost are in different numas, I don't see
> any additional penalty for cross-numa polling the phy rxq.
> 
> For the case where phy and vhost are both in the same numa, if I change to
> poll the phy rxq cross-numa, then I see about a >20% tput drop for traffic
> from phy -> vhost. Are you seeing that too?

Yes, but the performance drop is mostly due to the extra cost of copying the 
packets across the UPI bus to the virtio buffers on the other NUMA node, not 
because the phy rxq is polled from the other NUMA node.

> 
> Also, the fact that a different numa can poll the phy rxq after every
> rebalance means that the ability of the auto-load-balancer to estimate and
> trigger a rebalance is impacted.

Agreed, there is some inaccuracy in the estimation of the load a phy rx queue 
creates when it is moved to another NUMA node. So far we have not found this 
to be a practical problem.

> 
> It seems like simply pinning some phy rxqs cross-numa would avoid all the
> issues above and give most of the benefit of cross-numa polling for phy
> rxqs.

That is what we have done in the past (for lack of alternatives). But any 
static pinning reduces the ability of the auto-load-balancer to do its job. 
Consider the following scenarios:

1. The phy ingress traffic is not evenly distributed by RSS due to a lack of 
entropy (examples are IP-in-IP encapsulated traffic, e.g. Calico, or MPLSoGRE 
encapsulated traffic).

2. VM traffic is very asymmetric, e.g. due to a large dual-NUMA VM whose vhu 
ports are all on NUMA 0.

In all such scenarios, static pinning of phy rxqs may lead to unnecessarily 
uneven PMD load and loss of overall capacity.
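
As an illustration, such a static cross-NUMA pinning is configured per 
interface with pmd-rxq-affinity (the port name dpdk0 and the core IDs 3 and 
27, one core per NUMA node, are made up for the example):

  ovs-vsctl set Interface dpdk0 other_config:pmd-rxq-affinity="0:3,1:27"

Whatever the traffic mix does, rxq 0 then stays on core 3 and rxq 1 on core 
27; the auto-load-balancer has no way to move them to even out the PMD load.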

> 
> With the pmd-rxq-assign=group and pmd-rxq-isolate=false options, OVS could
> still assign other rxqs to those cores which have pinned phy rxqs and
> properly adjust the assignments based on the load from the pinned rxqs.

Yes, sometimes the vhu rxq load is distributed such that it can be used to 
balance the PMDs, but not always. Sometimes the balance is just better when 
the phy rxqs are not pinned.
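
For reference, the combination you describe would be configured roughly as 
follows (a minimal sketch):

  ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-assign=group
  ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-isolate=false

With pmd-rxq-isolate=false the cores carrying pinned phy rxqs stay available 
to the group algorithm for the remaining, unpinned rxqs.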

> 
> New assignments or auto-load-balance would not change the numa polling
> those rxqs, so it would have no impact on ALB or the ability to assign
> based on load.

In our practical experience, the new "group" algorithm for load-based rxq 
distribution balances the PMD load best when none of the rxqs are pinned and 
cross-NUMA polling of phy rxqs is enabled. So the effect of the prediction 
error during auto-lb dry runs cannot be significant.

In our experience we consistently get the best PMD balance and OVS throughput 
when we give the auto-lb a free hand (excluding cross-NUMA polling of vhu 
rxqs, though).
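
To make that concrete, the setup we benchmark with looks roughly like this 
(a sketch: the per-interface cross-numa-polling option is the one proposed 
in this patch, and the 5-minute rebalance interval is just an example, not 
a recommendation):

  ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-assign=group
  ovs-vsctl set Open_vSwitch . other_config:pmd-auto-lb=true
  ovs-vsctl set Open_vSwitch . other_config:pmd-auto-lb-rebal-interval=5
  ovs-vsctl set Interface dpdk0 other_config:cross-numa-polling=true

No pmd-rxq-affinity is set on any port, and we verify the resulting rxq 
distribution with "ovs-appctl dpif-netdev/pmd-rxq-show".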

BR, Jan