On 6/30/23 18:13, Ilya Maximets wrote:
> On 6/30/23 00:45, Mike Pattrick wrote:
>> On Thu, Jun 29, 2023 at 3:10 PM Ilya Maximets <[email protected]> wrote:
>>>
>>> On 6/9/23 17:05, Mike Pattrick wrote:
>>>> Several xlate actions used in recursive translation currently store a
>>>> large amount of information on the stack. This can result in handler
>>>> threads quickly running out of stack space despite before
>>>> xlate_resubmit_resource_check() is able to terminate translation. This
>>>> patch reduces stack usage by over 3kb from several translation actions.
>>>
>>> I'm assuming that this number is mostly about patch_port_output function,
>>> right?  I see only ~30% stack usage reduction for the main recursive
>>> do_xlate_actions function.  It went from 720 to 512 bytes.
>>>
>>> Or am I calculating it wrong?  I'm using -fstack-size compiler option.
>>
>> I was using GDB for this, finding the difference in the stack pointer
>> between calls to the same function during recursion. The intention of
>> this patch is to get to xlate_resubmit_resource_check() without
>> smashing the stack, no individual functions stack usage is as
>> important as the holistic stack usage while recursing.
>>
>> For example, without patch:
>>
>> (gdb) fram 1
>> #1  0x000000000044dee3 in xlate_table_action (...) at
>> ofproto/ofproto-dpif-xlate.c:4539
>> 4539        if (xlate_resubmit_resource_check(ctx)) {
>> (gdb) p $rsp
>> $4 = (void *) 0x7f888e7aa700
>> (gdb) frame 8
>> #8  xlate_table_action (...) at ofproto/ofproto-dpif-xlate.c:4584
>> 4584                xlate_recursively(ctx, rule, table_id <= old_table_id,
>> (gdb) p $rsp
>> $5 = (void *) 0x7f888e7aba70
>> (gdb) p 0x7f888e7aba70-0x7f888e7aa700
>> $6 = 4976
>>
>> With patch:
>>
>> #1  0x000000000044e3bc in xlate_table_action (...) at
>> ofproto/ofproto-dpif-xlate.c:4552
>> 4552        if (xlate_resubmit_resource_check(ctx)) {
>> (gdb) p $rsp
>> $4 = (void *) 0x7f78e69a9860
>> (gdb) frame 9
>> #9  0x000000000044ec00 in xlate_table_action (...) at
>> ofproto/ofproto-dpif-xlate.c:7212
>> 7212                xlate_table_action(ctx, ctx->xin->flow.in_port.ofp_port,
>> (gdb) p $rsp
>> $5 = (void *) 0x7f78e69a9c60
>> (gdb) p 0x7f78e69a9c60-0x7f78e69a9860
>> $6 = 1024
>>
>>
>>>
>>>> This patch also moves some trace function from do_xlate_actions into its
>>>> own function to reduce stack usage.
>>>
>>> This doesn't seem to work.  With GCC and -O2, i.e. the standard optimization
>>> level, this new function is always inlined for me, and there is no real
>>> difference in the produced code and the stack usage.
>>
>> When I compile with O2, I get the following code generation:
>>
>> Without patch:
>> 000000000044cda0 <do_xlate_actions>:
>> [...]
>>   44cdad:       48 81 ec 78 02 00 00    sub    rsp,0x278
>>
>> With patch:
>> 000000000044cff0 <do_xlate_actions>:
>> [...]
>>   44cffd:       48 81 ec 98 01 00 00    sub    rsp,0x198
>>
>> Do you get different results?
> 
> If we're just looking at the first sub in the function, I get the same
> number (0x278) on current master and 0x148 with your patch applied.
> 
> Moving the port name map to heap reduces the value to 0x128.
> The xlate_trace function is fully inlined according to objdump.> 
>>
>>>
>>> In order to achieve actual reduction of stack usage there we need to move
>>> to dynamic allocation of the port name map.q
>>>
>>> There are few more functions that are fully or partially inlined into
>>> do_xlate_actions with default compiler options:
>>>
>>>  - xlate_output_reg_action: Allocates a large mf_subvalue on stack.
>>>  - xlate_sample_action: Allocates huge struct flow_tnl(!) on stack.
>>>  - compose_conntrack_action: Allocates mf_subvalue.
>>>  - xlate_check_pkt_larger: Allocates mf_subvalue as well.
>>>
>>> With these moved to heap, it should be possible to reduce the stack usage
>>> by about 60% for do_xlate_actions function.
>>
>> These don't inline for me on O2. What GCC version are you using?
> 
> If I do changes in the functions above, the number is dropped to 0xd8.
> 
> I'm building with gcc (GCC) 13.1.1 20230511 (Red Hat 13.1.1-2) on F38.
> CFLAGS="-g -O2".

Did a round of tests with clang version 16.0.4 (Fedora 16.0.4-1.fc38)
and the same CFLAGS.  It's not so aggressive with inlining, so it doesn't
inline the xlate_trace.  However, that doesn't affect the initial stack
allocation.

Master    : 0x258
With patch: 0x258  - No difference with master

With extra changes in functions listed above: 0x128

> 
> Best regards, Ilya Maximets.

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to