On 10/20/22 00:13, Mike Pattrick wrote: > On Wed, Oct 19, 2022 at 12:27 PM Ilya Maximets <[email protected]> wrote: >> >> On 10/19/22 15:48, Mike Pattrick wrote: >>> On Wed, Oct 19, 2022 at 9:30 AM Flavio Leitner <[email protected]> wrote: >>>> >>>> >>>> Hi Mike, >>>> >>>> Thanks for the patch. >>>> >>>> Does this patch need to change this line too? >>>> https://github.com/openvswitch/ovs/blob/31db0e043119cf597d720d94f70ec19cf5b8b7d4/rhel/usr_lib_systemd_system_ovs-vswitchd.service.in#L18 >>>> >>>> >>>> Wouldn't it be better to have a config option that we >>>> can change at runtime? Or perhaps leave it to use the >>>> system's default unless configured to cap the amount? >>> >>> What I'm worried about here is if OVN is used and a few other things >>> like dnat/snat, the resulting openflow rules can cause a segfault with >>> the system default 2MB stack. The stack conditions aren't really >>> detectable during runtime so increasing the default seemed reasonable >>> to me. It also doesn't seem trivial to me to determine if any given >>> set of OpenFlow rules will or won't cause an explosion in stack usage. >> >> Hi, Mike. >> >> I was thinking in the past about a bit different approach to the problem. >> I have a guess that most of the stack usage is coming from 'struct flow' >> and some of the stab[] memory allocations on the stack and these are huge. >> Can easily take a few KB of space each. >> So, I wonder, if you have a coredump or a reproducer, maybe you could >> confirm or disprove that theory? That would be very helpful > > I do have a coredump handy, the functions with struct flow and stub > like clone_xlate_actions() and patch_port_output() only account for > around 20% of the used stack space. do_xlate_actions() accounts for > over half of the stack used. I'll try reducing the stack used in these > three functions as an alternative to increasing the stack allocated.
OK. That makes sense. Thanks! do_xlate_actions() might be a bit tricky as the main stack usage is not really from that particular function, but from all the small functions that compiler inlines into it. gcc -O2 -fstack-usage: do_xlate_actions 720 dynamic,bounded gcc -O2 -fstack-usage -fno-inline: do_xlate_actions 288 dynamic,bounded Luckily, there are at least few low hanging fruits like a monster 'struct flow_tnl flow_tnl' in the xlate_sample_action(). Best regards, Ilya Maximets. > > > Cheers, > M > >> >> If the theory will turn out to be true, the solution might be to move >> all that to dynamically allocated memory instead of trying to guess the >> sufficient upper limit for a stack size. >> (Moving these to a heap might be a good idea even if they are not a >> root cause of the current problem, I think.) >> >> What do you think? >> >> Best regards, Ilya Maximets. >> > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
