On 10/20/22 00:13, Mike Pattrick wrote:
> On Wed, Oct 19, 2022 at 12:27 PM Ilya Maximets <[email protected]> wrote:
>>
>> On 10/19/22 15:48, Mike Pattrick wrote:
>>> On Wed, Oct 19, 2022 at 9:30 AM Flavio Leitner <[email protected]> wrote:
>>>>
>>>>
>>>> Hi Mike,
>>>>
>>>> Thanks for the patch.
>>>>
>>>> Does this patch need to change this line too?
>>>> https://github.com/openvswitch/ovs/blob/31db0e043119cf597d720d94f70ec19cf5b8b7d4/rhel/usr_lib_systemd_system_ovs-vswitchd.service.in#L18
>>>>
>>>>
>>>> Wouldn't it be better to have a config option that we
>>>> can change at runtime? Or perhaps leave it to use the
>>>> system's default unless configured to cap the amount?
>>>
>>> What I'm worried about here is if OVN is used and a few other things
>>> like dnat/snat, the resulting openflow rules can cause a segfault with
>>> the system default 2MB stack. The stack conditions aren't really
>>> detectable during runtime so increasing the default seemed reasonable
>>> to me. It also doesn't seem trivial to me to determine if any given
>>> set of OpenFlow rules will or won't cause an explosion in stack usage.
>>
>> Hi, Mike.
>>
>> I was thinking in the past about a bit different approach to the problem.
>> I have a guess that most of the stack usage is coming from 'struct flow'
>> and some of the stab[] memory allocations on the stack and these are huge.
>> Can easily take a few KB of space each.
>> So, I wonder, if you have a coredump or a reproducer, maybe you could
>> confirm or disprove that theory?  That would be very helpful
> 
> I do have a coredump handy, the functions with struct flow and stub
> like clone_xlate_actions() and patch_port_output() only account for
> around 20% of the used stack space. do_xlate_actions() accounts for
> over half of the stack used. I'll try reducing the stack used in these
> three functions as an alternative to increasing the stack allocated.

OK.  That makes sense.  Thanks!

do_xlate_actions() might be a bit tricky as the main stack usage is not
really from that particular function, but from all the small functions
that compiler inlines into it.

gcc -O2 -fstack-usage:
do_xlate_actions  720  dynamic,bounded

gcc -O2 -fstack-usage -fno-inline:
do_xlate_actions  288 dynamic,bounded


Luckily, there are at least few low hanging fruits like a monster
'struct flow_tnl flow_tnl' in the xlate_sample_action().

Best regards, Ilya Maximets.

> 
> 
> Cheers,
> M
> 
>>
>> If the theory will turn out to be true, the solution might be to move
>> all that to dynamically allocated memory instead of trying to guess the
>> sufficient upper limit for a stack size.
>> (Moving these to a heap might be a good idea even if they are not a
>> root cause of the current problem, I think.)
>>
>> What do you think?
>>
>> Best regards, Ilya Maximets.
>>
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to