On 8 Feb 2024, at 15:00, Ilya Maximets wrote:

> On 2/8/24 13:44, Eelco Chaudron wrote:
>>
>>
>> On 6 Feb 2024, at 16:01, Ilya Maximets wrote:
>>
>>> On 2/6/24 15:07, Eelco Chaudron wrote:
>>>>
>>>>
>>>> On 6 Feb 2024, at 14:46, Ilya Maximets wrote:
>>>>
>>>>> On 2/6/24 14:30, Eelco Chaudron wrote:
>>>>>> Previously, the ability to override the system default for the number
>>>>>> of handler threads has been broken since the introduction of the
>>>>>> per-CPU upcall handlers.
>>>>>
>>>>> It wasn't broken, it was intentionally disabled.  I don't think we should
>>>>> introduce the ability to configure the number of handlers for per-CPU
>>>>> dispatch mode.  It only allows users to shoot themselves in the foot if
>>>>> they do not understand what they are doing.
>>>>
>>>> I think the design was not thought through enough and resulted in a lot of
>>>> corner-case issues.  From the start, we should have kept allowing this
>>>> option to be tuned, as it was before.
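
For reference, the knob being discussed here is the n-handler-threads entry in
the other_config column of the Open_vSwitch table, set along the lines of:

    ovs-vsctl set Open_vSwitch . other_config:n-handler-threads=4

(Illustrative value; with per-vport dispatch this setting was honored, while
with per-CPU dispatch it is currently ignored, as discussed below.)
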
>>>
>>> The tuning is much more complicated and has very different consequences
>>> with per-CPU dispatch compared to per-vport.  So, the results of adjusting
>>> this option can be very different.
>>
>>>>> Many integration scripts in the wild are still blindly setting these
>>>>> options without even knowing that the underlying implementation changed
>>>>> completely.  And those setups may suffer if we suddenly decide to respect
>>>>> a configuration that has been ignored on modern kernels for all currently
>>>>> supported versions of OVS.
>>>>
>>>> I feel like this is not OVS's problem.  If people do not know what they
>>>> are doing, they should not touch it…  We have a nice Dutch saying for
>>>> this: "Wie zijn billen brandt, moet op de blaren zitten" ("He who burns
>>>> his behind must sit on the blisters") :)
>>>
>>> It is OVS's problem for the same reason as above.  We changed the logic
>>> under the hood so much that users who expected a certain behavior in the
>>> past may get a very different behavior now.  We can't just change the
>>> meaning of the knob that easily.  See below.
>>
>> Ack, so I would suggest adding a new knob.
>>
>>>>
>>>> However, I do feel like we should have an option to configure this, as it
>>>> makes no sense to create 256 handler threads on a 256-core system.
>>>
>>> It actually makes perfect sense.  Since traffic can appear on any of the
>>> 256 cores, we should be able to handle it on each one of them in order to
>>> keep the load balanced.
>>>
>>> With per-vport dispatch, on a system with 10 cores and 5 handler threads,
>>> all 5 handler threads will share the load, because each one of them listens
>>> on all the ports, hence each one of them can process packets whenever it
>>> has time to do so.
>>
>> Yes, but this had other problems :)
>
> Sure, that's why we went away from that logic. :)
>
>>
>>> With per-CPU dispatch, on the same system these 5 threads are mapped to
>>> specific cores, 2 cores per thread.  If all the traffic comes in on 4
>>> cores, we may have 2 threads overloaded and 3 other threads doing nothing.
>>> In the same scenario with per-vport dispatch we would have all 5 threads
>>> sharing the load.
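
To make the mapping concrete, here is a minimal sketch of the kind of static
cpu-to-handler assignment described above (purely illustrative; the actual
OVS/kernel mapping may differ):

    /* Illustrative only: with per-CPU dispatch each CPU is statically
     * assigned to one handler thread, so the load balance depends on
     * which CPUs the upcalls actually arrive on. */
    #include <stdio.h>

    int main(void)
    {
        const int n_cpus = 10;      /* Cores in the example above. */
        const int n_handlers = 5;   /* Handler threads in the example. */

        for (int cpu = 0; cpu < n_cpus; cpu++) {
            /* Hypothetical round-robin assignment, 2 cores per thread. */
            int handler = cpu % n_handlers;
            printf("CPU %2d -> handler %d\n", cpu, handler);
        }
        return 0;
    }

If all the traffic arrives on CPUs that happen to map to only a couple of
handlers (e.g. CPUs 0, 5, 1 and 6 above), those handlers are overloaded while
the rest stay idle, which is the imbalance described above.
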
>>
>> Yes, but with 256 threads we might have 250 doing nothing.  Looking at some
>> data from a system with decent OpenFlow tables (1M+ entries, but only a
>> couple of tables), a single handler thread can handle about 232k upcalls per
>> second for 64-byte packets before we run out of socket buffer.  For 512-byte
>> packets it is around 212k, and for 1500-byte packets around 189k.
>
> Having idle threads in a system is generally not a problem.  As long as we
> don't have hundreds of thousands of them, they will not impact scheduling
> performance.
>
> Also, the actual performance of handlers is hard to quantify, since OVS can
> be running on a large variety of different hardware.  Yours is likely on the
> powerful side of the spectrum.
>
>>
>> If we have more than 200k upcalls per second, we have other problems,
>> especially as the default datapath flow limit is 200k.
>>
>> So I guess the new knob becomes a tunable for trading resource consumption
>> against upcall handling capacity, which I think is fine, and might be
>> desirable in certain configurations.  We have other knobs in the system that
>> require a decent amount of knowledge.
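
(The 200k figure here is the default datapath flow limit, which can be changed
via other_config:flow-limit; "ovs-appctl upcall/show" should report the limit
actually in effect.)
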
>
> Having more knobs just for the sake of having something configurable
> doesn't sound good to me.  We should try to not introduce ways to
> configure something that doesn't need to be configured.
>
> Sure, creating spare threads consumes memory.  However, there are a few
> things to consider here:
>
> 1. A system with 256 cores likely has 512 MB of RAM to spare
>    (by default we limit the stack to 2 MB per thread, so 256 x 2 MB).
>    I haven't seen any complaints from users about memory consumption
>    on large systems.
>
> 2. By reducing the number of threads we're working around the problem
>    of large stack use during flow translation.  If we can reduce the
>    stack usage, we can reduce the overall memory consumption.
>    So, maybe that should be addressed instead of worked around with
>    a knob that is hard to use correctly.

I guess the problem is that various people have tried, and the stack size did
get reduced, but not enough.  The code is very complex and, due to all the
corner cases introduced over the years, it's (almost) impossible to remove the
recursion without a complete rewrite (while trying not to break all those
corner cases).

And in the past we did receive complaints about the memory size.  In some
cases we still need 8 MB of stack per thread, which is 2 GB on a 256-core
system.

Unless someone finds a solution to this, I still feel we should have this 
option in case we need it.

//Eelco

> 3. We can probably limit the number of revalidators instead (without
>    a knob).  We may estimate the number of revalidators needed from the
>    datapath flow hard limit and not create more than that even if we
>    have more cores.  That should help with memory consumption as well.
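
A rough sketch of the kind of estimate point 3 suggests (the helper name and
the per-thread capacity constant are hypothetical, not existing OVS code):

    /* Hypothetical heuristic: derive the revalidator count from the
     * datapath flow hard limit instead of from the number of cores. */
    #define FLOWS_PER_REVALIDATOR 25000   /* Assumed per-thread capacity. */

    unsigned int
    estimate_n_revalidators(unsigned int flow_hard_limit, unsigned int n_cores)
    {
        unsigned int needed = (flow_hard_limit + FLOWS_PER_REVALIDATOR - 1)
                              / FLOWS_PER_REVALIDATOR;

        /* At least one thread, and never more threads than cores. */
        if (needed < 1) {
            needed = 1;
        }
        return needed < n_cores ? needed : n_cores;
    }
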
>
>>
>>> So, in order for the user to make a decision on the number of threads, they
>>> have to know on which CPUs they are most likely to get interrupts and how
>>> exactly OVS will map handlers to CPU numbers.  That is too much for a user
>>> to know, IMO.  Some of that is actually impossible to know.  And when the
>>> traffic pattern changes, the number of threads may need to be changed to
>>> game the mapping and avoid overload.
>>>
>>> Also, multiply that complexity by the CPU affinity of ovs-vswitchd, which
>>> changes the mapping.
>>
>> Well, using an affinity mask will mess up the above anyway, as we will get
>> fewer threads sharing the available CPU resources, etc.  So making n_threads
>> configurable will have the same problem as changing the affinity mask.  But
>> we might have other benefits (see below).
>>
>>> AFAIU, the only way to achieve an even load without dynamic reconfiguration
>>> is to have a thread per CPU core.
>>
>> Yes, only if we have no affinity mask, but at the cost of additional 
>> resources
>> (which we might not want).
>>
>>>> Maybe we should add a new option, n-forced-handler-threads.  This would
>>>> avoid the above issue with existing scripts.
>>>
>>> Users can limit the number of threads by setting CPU affinity for 
>>> ovs-vswitchd.
>>> If these users are knowledgeable enough to predict traffic patterns and CPU
>>> mappings, they can set appropriate CPU affinity masks as well.
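
In practice that would mean starting the daemon under an affinity mask, e.g.
with "taskset -c 0-7 ovs-vswitchd ..." (illustrative core list) or with
CPUAffinity= in a systemd unit, since threads created by the daemon inherit
its affinity mask.
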
>>
>> Well, you cannot set the exact number of handler threads, as it will be
>> rounded up to a prime number (which should not be a big problem).  But the
>> main problem I see is that you now limit all of OVS to that set of CPUs.
>> So a burst of upcalls could mess with your revalidator handling (or other
>> threads running on that set).
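
For context on the rounding mentioned above, a naive "round up to the next
prime" helper illustrates why the configured value may not be used verbatim
(purely illustrative, not OVS code):

    #include <stdbool.h>

    static bool
    is_prime(unsigned int n)
    {
        if (n < 2) {
            return false;
        }
        for (unsigned int d = 2; d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    unsigned int
    round_up_to_prime(unsigned int n)
    {
        while (!is_prime(n)) {
            n++;
        }
        return n;   /* e.g. 8 -> 11, 9 -> 11, 16 -> 17. */
    }
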
>
> Should not be a problem.  We always run handlers and revalidators on the same
> cores by default.  And revalidators do not need millisecond precision.  Also
> revalidators can float between cores.  So, the use of the same cores should
> not be a problem as long as not all handlers are at 100% load, which will be
> a problem on its own regardless.
>
>>
>>> For others, changing the meaning of a knob underneath them would be kind
>>> of irresponsible from our side.
>>
>> Ack, I will introduce a new knob (along with some changes to dpif-netlink,
>> as the current patch has a bug).
>>
>
> I'm still not convinced we need a knob for the reasons listed above.
>
> Best regards, Ilya Maximets.
