On 6/30/23 18:13, Ilya Maximets wrote: > On 6/30/23 00:45, Mike Pattrick wrote: >> On Thu, Jun 29, 2023 at 3:10 PM Ilya Maximets <[email protected]> wrote: >>> >>> On 6/9/23 17:05, Mike Pattrick wrote: >>>> Several xlate actions used in recursive translation currently store a >>>> large amount of information on the stack. This can result in handler >>>> threads quickly running out of stack space despite before >>>> xlate_resubmit_resource_check() is able to terminate translation. This >>>> patch reduces stack usage by over 3kb from several translation actions. >>> >>> I'm assuming that this number is mostly about patch_port_output function, >>> right? I see only ~30% stack usage reduction for the main recursive >>> do_xlate_actions function. It went from 720 to 512 bytes. >>> >>> Or am I calculating it wrong? I'm using -fstack-size compiler option. >> >> I was using GDB for this, finding the difference in the stack pointer >> between calls to the same function during recursion. The intention of >> this patch is to get to xlate_resubmit_resource_check() without >> smashing the stack, no individual functions stack usage is as >> important as the holistic stack usage while recursing. >> >> For example, without patch: >> >> (gdb) fram 1 >> #1 0x000000000044dee3 in xlate_table_action (...) at >> ofproto/ofproto-dpif-xlate.c:4539 >> 4539 if (xlate_resubmit_resource_check(ctx)) { >> (gdb) p $rsp >> $4 = (void *) 0x7f888e7aa700 >> (gdb) frame 8 >> #8 xlate_table_action (...) at ofproto/ofproto-dpif-xlate.c:4584 >> 4584 xlate_recursively(ctx, rule, table_id <= old_table_id, >> (gdb) p $rsp >> $5 = (void *) 0x7f888e7aba70 >> (gdb) p 0x7f888e7aba70-0x7f888e7aa700 >> $6 = 4976 >> >> With patch: >> >> #1 0x000000000044e3bc in xlate_table_action (...) at >> ofproto/ofproto-dpif-xlate.c:4552 >> 4552 if (xlate_resubmit_resource_check(ctx)) { >> (gdb) p $rsp >> $4 = (void *) 0x7f78e69a9860 >> (gdb) frame 9 >> #9 0x000000000044ec00 in xlate_table_action (...) at >> ofproto/ofproto-dpif-xlate.c:7212 >> 7212 xlate_table_action(ctx, ctx->xin->flow.in_port.ofp_port, >> (gdb) p $rsp >> $5 = (void *) 0x7f78e69a9c60 >> (gdb) p 0x7f78e69a9c60-0x7f78e69a9860 >> $6 = 1024 >> >> >>> >>>> This patch also moves some trace function from do_xlate_actions into its >>>> own function to reduce stack usage. >>> >>> This doesn't seem to work. With GCC and -O2, i.e. the standard optimization >>> level, this new function is always inlined for me, and there is no real >>> difference in the produced code and the stack usage. >> >> When I compile with O2, I get the following code generation: >> >> Without patch: >> 000000000044cda0 <do_xlate_actions>: >> [...] >> 44cdad: 48 81 ec 78 02 00 00 sub rsp,0x278 >> >> With patch: >> 000000000044cff0 <do_xlate_actions>: >> [...] >> 44cffd: 48 81 ec 98 01 00 00 sub rsp,0x198 >> >> Do you get different results? > > If we're just looking at the first sub in the function, I get the same > number (0x278) on current master and 0x148 with your patch applied. > > Moving the port name map to heap reduces the value to 0x128. > The xlate_trace function is fully inlined according to objdump.> >> >>> >>> In order to achieve actual reduction of stack usage there we need to move >>> to dynamic allocation of the port name map.q >>> >>> There are few more functions that are fully or partially inlined into >>> do_xlate_actions with default compiler options: >>> >>> - xlate_output_reg_action: Allocates a large mf_subvalue on stack. >>> - xlate_sample_action: Allocates huge struct flow_tnl(!) on stack. >>> - compose_conntrack_action: Allocates mf_subvalue. >>> - xlate_check_pkt_larger: Allocates mf_subvalue as well. >>> >>> With these moved to heap, it should be possible to reduce the stack usage >>> by about 60% for do_xlate_actions function. >> >> These don't inline for me on O2. What GCC version are you using? > > If I do changes in the functions above, the number is dropped to 0xd8. > > I'm building with gcc (GCC) 13.1.1 20230511 (Red Hat 13.1.1-2) on F38. > CFLAGS="-g -O2".
Did a round of tests with clang version 16.0.4 (Fedora 16.0.4-1.fc38) and the same CFLAGS. It's not so aggressive with inlining, so it doesn't inline the xlate_trace. However, that doesn't affect the initial stack allocation. Master : 0x258 With patch: 0x258 - No difference with master With extra changes in functions listed above: 0x128 > > Best regards, Ilya Maximets. _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
