On 4/4/22 14:36, Michael Santana wrote:
> The handler and CPU mapping in upcalls are incorrect, and this is
> especially noticeable on systems with CPU isolation enabled.
>
> Say we have a 12-core system where only every even-numbered CPU is enabled:
> C0, C2, C4, C6, C8, C10
>
> This means we will create an array of size 6 that will be sent to the
> kernel, populated with sockets [S0, S1, S2, S3, S4, S5].
>
> The problem is that when the kernel does an upcall it looks up the socket
> array by the index of the CPU, effectively adding extra load on some
> CPUs while leaving no work on other CPUs.
>
> e.g.
>
> C0 indexes to S0
> C2 indexes to S2 (should be S1)
> C4 indexes to S4 (should be S2)
>
> Modulo of 6 (the size of the socket array) is applied, so we wrap back to S0:
> C6 indexes to S0 (should be S3)
> C8 indexes to S2 (should be S4)
> C10 indexes to S4 (should be S5)
>
> Effectively, sockets S0, S2, S4 get overloaded while sockets S1, S3, S5
> get no work assigned to them.
>
> This leads the kernel to throw the following message:
> "openvswitch: cpu_id mismatch with handler threads"
>
> To fix this we send the kernel a corrected array of sockets, sized to the
> number of CPUs in the system. In the above example we would create a
> corrected array as follows:
> [S0, S1, S1, S2, S2, S3, S3, S4, S4, S5, S5, S0]
>
> This guarantees that regardless of which CPU a packet comes in on, the
> kernel will map it to the correct socket.
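For reference, here is a minimal sketch of one way a corrected per-CPU array like the one described above could be filled so that it reproduces the example mapping. The names (fill_cpu_to_socket_map, is_handler, and so on) are made up for illustration; this is not the actual patch code.

#include <stdbool.h>
#include <stdio.h>

static void
fill_cpu_to_socket_map(int *map, int n_cpus,
                       const bool *is_handler, int n_handlers)
{
    int next = 0;   /* Index of the next handler socket to hand out. */

    for (int cpu = 0; cpu < n_cpus; cpu++) {
        if (is_handler[cpu]) {
            /* CPUs with their own handler take the next socket in order. */
            map[cpu] = next;
            next = (next + 1) % n_handlers;
        } else {
            /* CPUs without a handler borrow the next handler's socket. */
            map[cpu] = next;
        }
    }
}

int
main(void)
{
    enum { N_CPUS = 12, N_HANDLERS = 6 };
    bool is_handler[N_CPUS];
    int map[N_CPUS];

    /* Only even-numbered CPUs are enabled, as in the example above. */
    for (int cpu = 0; cpu < N_CPUS; cpu++) {
        is_handler[cpu] = !(cpu % 2);
    }

    fill_cpu_to_socket_map(map, N_CPUS, is_handler, N_HANDLERS);

    /* Prints C0 -> S0, C1 -> S1, C2 -> S1, ..., C10 -> S5, C11 -> S0,
     * i.e. the corrected array [S0, S1, S1, S2, S2, S3, S3, S4, S4, S5, S5, S0]. */
    for (int cpu = 0; cpu < N_CPUS; cpu++) {
        printf("C%d -> S%d\n", cpu, map[cpu]);
    }
    return 0;
}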
Hey, Michael. Thanks for working on this issue!

I agree that the array is too short and ovs-vswitchd has to supply a full array with values for each physical core. This could even be a separate patch that just fills in the array for each physical core in a round-robin manner.

However, I'm not sure it is possible to predict what a good distribution of sockets among cores should look like. Taking your current implementation as an example: I'm assuming that the suggested distribution (socket duplication for neighboring CPU cores) can be good for a system with 2 NUMA nodes and cores enumerated as odd - NUMA0, even - NUMA1. But the enumeration scheme of CPU cores is not set in stone; most multi-NUMA systems have a BIOS configuration knob to change the enumeration method. One of the most popular ones is: lower cores - NUMA0, higher cores - NUMA1. In this case, since interrupts from a NIC are likely to arrive on cores from the same NUMA node, sockets S1-S3 will carry all the traffic while sockets S4-S5 remain unused.

Another point is that we don't really know how many IRQ vectors are registered for the NICs or what their IRQ affinity is. IOW, we don't know on which cores the actual interrupts will be handled and on which sockets they will end up. IRQ affinity and the number of queues are also dynamic and can change over time.

So, AFAICT, regardless of the positioning of the sockets in the array, there will be very good and very bad cases for it with more or less the same probability.

It might be that the only option to guarantee a more or less even load distribution is to always create N handlers, where N is the number of actual physical cores, and let the scheduler and in-kernel IRQ affinity deal with load balancing. Yes, we're adding extra overhead in the case of limited CPU affinity for ovs-vswitchd, but OTOH, if we have a high volume of traffic received on a large number of CPU cores while OVS is limited to a single core, that sounds like a misconfiguration. Some performance testing is required, for sure.

Thoughts?

Best regards,
Ilya Maximets.
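For what it's worth, here is a rough, standalone Linux/glibc sketch (not OVS code, and only one assumed way of obtaining the counts) of the distinction drawn above between the CPUs ovs-vswitchd is allowed to run on and the CPUs that exist in the system, which is what an "always create N handlers" approach would key off.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
    /* All CPUs configured in the system, including CPUs this particular
     * process may never run on (e.g. because of CPU affinity/isolation). */
    long n_system_cpus = sysconf(_SC_NPROCESSORS_CONF);

    /* CPUs this process is actually allowed to run on. */
    cpu_set_t set;
    int n_usable_cpus = -1;

    if (!sched_getaffinity(0, sizeof set, &set)) {
        n_usable_cpus = CPU_COUNT(&set);
    }

    printf("system CPUs: %ld, CPUs usable by this process: %d\n",
           n_system_cpus, n_usable_cpus);
    return 0;
}

In the 12-core example above with the process pinned to the 6 even cores, the two counts would differ (12 vs. 6), and under that scheme it would be the first number that determines how many handlers to create.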
