On Fri, Jun 30, 2023 at 12:33 PM Ilya Maximets <[email protected]> wrote: > > On 6/30/23 18:13, Ilya Maximets wrote: > > On 6/30/23 00:45, Mike Pattrick wrote: > >> On Thu, Jun 29, 2023 at 3:10 PM Ilya Maximets <[email protected]> wrote: > >>> > >>> On 6/9/23 17:05, Mike Pattrick wrote: > >>>> Several xlate actions used in recursive translation currently store a > >>>> large amount of information on the stack. This can result in handler > >>>> threads quickly running out of stack space despite before > >>>> xlate_resubmit_resource_check() is able to terminate translation. This > >>>> patch reduces stack usage by over 3kb from several translation actions. > >>> > >>> I'm assuming that this number is mostly about patch_port_output function, > >>> right? I see only ~30% stack usage reduction for the main recursive > >>> do_xlate_actions function. It went from 720 to 512 bytes. > >>> > >>> Or am I calculating it wrong? I'm using -fstack-size compiler option. > >> > >> I was using GDB for this, finding the difference in the stack pointer > >> between calls to the same function during recursion. The intention of > >> this patch is to get to xlate_resubmit_resource_check() without > >> smashing the stack, no individual functions stack usage is as > >> important as the holistic stack usage while recursing. > >> > >> For example, without patch: > >> > >> (gdb) fram 1 > >> #1 0x000000000044dee3 in xlate_table_action (...) at > >> ofproto/ofproto-dpif-xlate.c:4539 > >> 4539 if (xlate_resubmit_resource_check(ctx)) { > >> (gdb) p $rsp > >> $4 = (void *) 0x7f888e7aa700 > >> (gdb) frame 8 > >> #8 xlate_table_action (...) at ofproto/ofproto-dpif-xlate.c:4584 > >> 4584 xlate_recursively(ctx, rule, table_id <= old_table_id, > >> (gdb) p $rsp > >> $5 = (void *) 0x7f888e7aba70 > >> (gdb) p 0x7f888e7aba70-0x7f888e7aa700 > >> $6 = 4976 > >> > >> With patch: > >> > >> #1 0x000000000044e3bc in xlate_table_action (...) at > >> ofproto/ofproto-dpif-xlate.c:4552 > >> 4552 if (xlate_resubmit_resource_check(ctx)) { > >> (gdb) p $rsp > >> $4 = (void *) 0x7f78e69a9860 > >> (gdb) frame 9 > >> #9 0x000000000044ec00 in xlate_table_action (...) at > >> ofproto/ofproto-dpif-xlate.c:7212 > >> 7212 xlate_table_action(ctx, > >> ctx->xin->flow.in_port.ofp_port, > >> (gdb) p $rsp > >> $5 = (void *) 0x7f78e69a9c60 > >> (gdb) p 0x7f78e69a9c60-0x7f78e69a9860 > >> $6 = 1024 > >> > >> > >>> > >>>> This patch also moves some trace function from do_xlate_actions into its > >>>> own function to reduce stack usage. > >>> > >>> This doesn't seem to work. With GCC and -O2, i.e. the standard > >>> optimization > >>> level, this new function is always inlined for me, and there is no real > >>> difference in the produced code and the stack usage. > >> > >> When I compile with O2, I get the following code generation: > >> > >> Without patch: > >> 000000000044cda0 <do_xlate_actions>: > >> [...] > >> 44cdad: 48 81 ec 78 02 00 00 sub rsp,0x278 > >> > >> With patch: > >> 000000000044cff0 <do_xlate_actions>: > >> [...] > >> 44cffd: 48 81 ec 98 01 00 00 sub rsp,0x198 > >> > >> Do you get different results? > > > > If we're just looking at the first sub in the function, I get the same > > number (0x278) on current master and 0x148 with your patch applied. > > > > Moving the port name map to heap reduces the value to 0x128. > > The xlate_trace function is fully inlined according to objdump.> > >> > >>> > >>> In order to achieve actual reduction of stack usage there we need to move > >>> to dynamic allocation of the port name map.q > >>> > >>> There are few more functions that are fully or partially inlined into > >>> do_xlate_actions with default compiler options: > >>> > >>> - xlate_output_reg_action: Allocates a large mf_subvalue on stack. > >>> - xlate_sample_action: Allocates huge struct flow_tnl(!) on stack. > >>> - compose_conntrack_action: Allocates mf_subvalue. > >>> - xlate_check_pkt_larger: Allocates mf_subvalue as well. > >>> > >>> With these moved to heap, it should be possible to reduce the stack usage > >>> by about 60% for do_xlate_actions function. > >> > >> These don't inline for me on O2. What GCC version are you using? > > > > If I do changes in the functions above, the number is dropped to 0xd8. > > > > I'm building with gcc (GCC) 13.1.1 20230511 (Red Hat 13.1.1-2) on F38. > > CFLAGS="-g -O2". > > Did a round of tests with clang version 16.0.4 (Fedora 16.0.4-1.fc38) > and the same CFLAGS. It's not so aggressive with inlining, so it doesn't > inline the xlate_trace. However, that doesn't affect the initial stack > allocation. > > Master : 0x258 > With patch: 0x258 - No difference with master
That isn't fantastic! I've confirmed the same behaviour with clang. I'll send in a new version. -M > > With extra changes in functions listed above: 0x128 > > > > > Best regards, Ilya Maximets. > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
