Ilya Maximets <[email protected]> writes:

> On 7/19/22 08:59, Michael Santana wrote:
>> The handler and CPU mapping in upcalls are incorrect, and this is
>> specially noticeable systems with cpu isolation enabled.
>> 
>> Say we have a 12 core system where only every even number CPU is enabled
>> C0, C2, C4, C6, C8, C10
>> 
>> This means we will create an array of size 6 that will be sent to
>> kernel that is populated with sockets [S0, S1, S2, S3, S4, S5]
>> 
>> The problem is when the kernel does an upcall it checks the socket array
>> via the index of the CPU, effectively adding additional load on some
>> CPUs while leaving no work on other CPUs.
>> 
>> e.g.
>> 
>> C0  indexes to S0
>> C2  indexes to S2 (should be S1)
>> C4  indexes to S4 (should be S2)
>> 
>> Modulo of 6 (size of socket array) is applied, so we wrap back to S0
>> C6  indexes to S0 (should be S3)
>> C8  indexes to S2 (should be S4)
>> C10 indexes to S4 (should be S5)
>> 
>> Effectively sockets S0, S2, S4 get overloaded while sockets S1, S3, S5
>> get no work assigned to them
>> 
>> This leads to the kernel to throw the following message:
>> "openvswitch: cpu_id mismatch with handler threads"
>> 
>> Instead we will send the kernel a corrected array of sockets the size
>> of all CPUs in the system. In the above example we would create a
>> corrected array in a round-robin(assuming prime bias) fashion as follows:
>> [S0, S1, S2, S3, S4, S5, S6, S0, S1, S2, S3, S4]
>> 
>> Fixes: b1e517bd2f81 ("dpif-netlink: Introduce per-cpu upcall dispatch.")
>> 
>> Co-authored-by: Aaron Conole <[email protected]>
>> signed-off-by: Aaron Conole <[email protected]>
>> Signed-off-by: Michael Santana <[email protected]>
>> ---
>>  lib/dpif-netlink.c | 18 ++++++++++++------
>>  1 file changed, 12 insertions(+), 6 deletions(-)
>> 
>> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
>> index 21f149867..9efd6e6b5 100644
>> --- a/lib/dpif-netlink.c
>> +++ b/lib/dpif-netlink.c
>> @@ -803,11 +803,11 @@ dpif_netlink_set_handler_pids(struct dpif *dpif_, 
>> const uint32_t *upcall_pids,
>>      struct dpif_netlink *dpif = dpif_netlink_cast(dpif_);
>>      struct dpif_netlink_dp request, reply;
>>      struct ofpbuf *bufp;
>> -    int error;
>> -    int n_cores;
>>  
>> -    n_cores = count_cpu_cores();
>> -    ovs_assert(n_cores == n_upcall_pids);
>> +    uint32_t *corrected;
>> +    int error, i, n_cores;
>> +
>> +    n_cores = count_total_cores();
>>      VLOG_DBG("Dispatch mode(per-cpu): Number of CPUs is %d", n_cores);
>
> I think my comment from v5 still stands here.  The fact that we have 5
> cores doesn't mean that the largest core id is 5.  Cores are not necessarily
> numbered without holes.  For example, on Power systems cores and numas are
> not sequential in most cases.
>
> I mean, we probably could accept this change as-is, because it is definitely
> an improvement from what we have now, but it is not a full solution.
>
> What I suggested last time is to ask ovs-numa module to give us the largest
> detected core id, get the MAX between that value and the number returned by
> count_total_cores(), and use that as an array size.
>
> Does that make sense?

I think it makes sense, and I think there is another change that we
should have here because as it stands the log message doesn't make much
sense any longer.

Log message used to indicate the number of CPUs we were running handlers
against.  This is useful information.  Simply spitting out the number of
total cores isn't as useful, so we probably want to change it to print
n_upcall_pids rather than n_cores.

>>  
>>      dpif_netlink_dp_init(&request);
>> @@ -817,7 +817,12 @@ dpif_netlink_set_handler_pids(struct dpif *dpif_, const 
>> uint32_t *upcall_pids,
>>      request.user_features = dpif->user_features |
>>                              OVS_DP_F_DISPATCH_UPCALL_PER_CPU;
>>  
>> -    request.upcall_pids = upcall_pids;
>> +    corrected = xcalloc(n_cores, sizeof *corrected);
>> +
>> +    for (i = 0; i < n_cores; i++) {
>> +        corrected[i] = upcall_pids[i % n_upcall_pids];
>> +    }
>> +    request.upcall_pids = corrected;
>>      request.n_upcall_pids = n_cores;
>>  
>>      error = dpif_netlink_dp_transact(&request, &reply, &bufp);
>> @@ -825,9 +830,10 @@ dpif_netlink_set_handler_pids(struct dpif *dpif_, const 
>> uint32_t *upcall_pids,
>>          dpif->user_features = reply.user_features;
>>          ofpbuf_delete(bufp);
>>          if (!dpif_netlink_upcall_per_cpu(dpif)) {
>> -            return -EOPNOTSUPP;
>> +            error = -EOPNOTSUPP;
>>          }
>>      }
>> +    free(corrected);
>>      return error;
>>  }
>>  

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to