Re: [PATCH v4] aarch64: Add split-stack initial support
Ping.

On 15/02/2017 21:23, Adhemerval Zanella wrote:
[PATCH v4] aarch64: Add split-stack initial support
This is an updated patch from my previous version (v3), based mainly on
glibc comments on the exported loader symbol [1]. The changes from the
previous version are:

- Remove the __tcb_private_ss call in libgcc and emit a data reference to
  the glibc symbol when split-stack is used. This removes the runtime
  errors when running on an older glibc and instead makes the loader
  fail with a missing symbol.

- Add a glibc version check to the split-stack support check.

- Some comment fixes in morestack.S.

- Remove some compile warnings in morestack-c.c.

--

This patch adds split-stack support on aarch64 (PR #67877). As for
other ports, this patch should be used along with glibc and gold support.

The support is done similarly to other architectures: a split-stack field
is allocated before the TCB by glibc, a target-specific __morestack
implementation and helper functions are added in libgcc, and compiler
support is adjusted (split-stack prologue, va_start for argument handling).
I also plan to send the gold support to adjust stack allocation across
split-stack and default code calls.

The current approach is to set the final stack adjustment using at most
2 instructions (mov/movk), which limits stack allocation to an upper limit
of 4GB. The morestack call is non-standard, with x10 holding the requested
stack pointer, x11 the argument pointer (if required), and x12 the
continuation address to return to. Unwinding is handled by a personality
routine that knows how to find stack segments.

The split-stack prologue on function entry is as follows (this goes before
the usual function prologue):

function:
	mrs    x9, tpidr_el0
	mov    x10, <required stack allocation>
	movk   x10, #0x0, lsl #16
	sub    x10, sp, x10
	mov    x11, sp		# if function has stacked arguments
	adrp   x12, main_fn_entry
	add    x12, x12, :lo12:.L2
	cmp    x9, x10
	b.lt   <main_fn_entry>
	b      __morestack
main_fn_entry:
	[function prologue]

Notes:

1. Even if a function does not allocate a stack frame, a split-stack
   prologue is created. This avoids issues with tail calls to external
   symbols, which might require linker adjustment
   (libgo/runtime/go-varargs.c).

2. Basic-block reordering (enabled with -O2) will move the split-stack
   TCB ldr to after the required stack calculation.

3. Similar to powerpc, when the linker detects a call from split-stack to
   non-split-stack code, it adds 16k (or more) to the value found in the
   "allocate" instructions (so non-split-stack code gets a larger stack).
   The amount is tunable by a linker option. The edit means aarch64 does
   not need to implement __morestack_non_split, which is necessary on x86
   because insufficient space is available there to edit the stack
   comparison code. This feature is only implemented in the GNU gold
   linker.

4. AArch64 does not handle >4G stacks initially. Although it is possible
   to implement this, limiting to 4G allows the allocation to be
   materialized with only 2 instructions (mov + movk), which simplifies
   the required linker adjustments. Supporting multiple threads each
   requiring more than 4G of stack is probably not that important, and
   is likely to OOM at run time.

5. The TCB support in glibc is meant to be included in version 2.26.

6. The continuation address in x12 is materialized using 'adrp' plus
   'add' and a static relocation. The current code uses the
   aarch64_expand_mov_immediate function; since 'adr' would be a better
   alternative, it could be a future optimization (not implemented in
   this patch).

[1] https://sourceware.org/ml/libc-alpha/2017-02/msg00272.html

libgcc/ChangeLog:

	* libgcc/config.host: Use t-stack and t-stack-aarch64 for
	aarch64*-*-linux.
	* libgcc/config/aarch64/morestack-c.c: New file.
	* libgcc/config/aarch64/morestack.S: Likewise.
	* libgcc/config/aarch64/t-stack-aarch64: Likewise.
	* libgcc/generic-morestack.c (__splitstack_find): Add
	aarch64-specific code.

gcc/ChangeLog:

	* common/config/aarch64/aarch64-common.c
	(aarch64_supports_split_stack): New function.
	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
	* gcc/config/aarch64/aarch64-linux.h (TARGET_ASM_FILE_END): Remove
	macro.
	* gcc/config/aarch64/aarch64-protos.h: Add
	aarch64_expand_split_stack_prologue and
	aarch64_split_stack_space_check.
	* gcc/config/aarch64/aarch64.c (aarch64_gen_far_branch): Add support
	to emit 'b' instruction to rtx different than LABEL_REF.
	(aarch64_expand_builtin_va_start): Use internal argument pointer
	instead of virtual_incoming_args_rtx.
	(morestack_ref): New symbol.
	(aarch64_load_split_stack_value): New function.
	(aarch64_expand_split_stack_prologue): Likewise.
	(aarch64_internal_arg_pointer): Likewise.
	(aarch64_split_stack_space_check): Likewise.
	(aarch64_file_end): Emit the split-stack note sections.
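
For readers who want to see the arithmetic behind the mov/movk limit and the prologue comparison described above, here is a minimal C sketch. It is not code from the patch: split_frame_size and needs_morestack are hypothetical names used only for illustration, and the model assumes plain user-space addresses, so the signed b.lt comparison and the unsigned comparison below agree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: split a requested stack
   allocation into the two 16-bit immediates that a mov/movk pair can
   carry.  Returns 0 on success, or -1 if the size needs more than
   32 bits, which is the 4GB limit discussed in the notes.  */
static int
split_frame_size (uint64_t size, uint16_t *lo, uint16_t *hi)
{
  assert (lo != NULL && hi != NULL);
  if (size > UINT32_MAX)
    return -1;
  *lo = (uint16_t) (size & 0xffff);	/* mov  x10, #lo           */
  *hi = (uint16_t) (size >> 16);	/* movk x10, #hi, lsl #16  */
  return 0;
}

/* Hypothetical model of the prologue check: 'limit' stands for the value
   held in x9 and 'sp - size' for the value computed into x10.  The
   prologue branches to the normal function body when limit < sp - size
   and otherwise falls through to 'b __morestack'.  Returns 1 when
   __morestack would be called.  */
static int
needs_morestack (uint64_t sp, uint64_t limit, uint64_t size)
{
  return !(limit < sp - size);
}
```

Because both immediates live in a fixed instruction pair, a linker that wants to enlarge the allocation only has to rewrite those two 16-bit fields, which is the simplification that note 4 refers to.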