Re: [PATCH v4] aarch64: Add split-stack initial support

2017-03-06 Thread Adhemerval Zanella
Ping.

[PATCH v4] aarch64: Add split-stack initial support

2017-02-15 Thread Adhemerval Zanella
This is an updated patch from my previous version (v3), based mainly on
glibc comments on the exported loader symbol [1].  The changes from the
previous version are:

   - Remove the __tcb_private_ss call from libgcc and emit a data reference
     to the glibc symbol when split-stack is used.  This removes the runtime
     errors when running on older glibc and instead makes the loader fail
     with a missing symbol.

   - Add a glibc version check to the split-stack support check.

   - Some comment fixes in morestack.S.

   - Remove some compile warnings in morestack-c.c.

--

This patch adds split-stack support for aarch64 (PR #67877).  As for
other ports, this patch should be used along with glibc and gold support.

The support is done similarly to other architectures: a split-stack field
is allocated before the TCB by glibc, a target-specific __morestack
implementation and helper functions are added in libgcc, and compiler support
is adjusted (split-stack prologue, va_start for argument handling).  I also
plan to send the gold support to adjust stack allocation across split-stack
and default code calls.

The current approach sets the final stack adjustment using at most two
instructions (mov/movk), which limits stack allocation to an upper limit of
4GB.  The morestack call is non-standard, with x10 holding the requested stack
pointer, x11 the argument pointer (if required), and x12 the return
continuation address.  Unwinding is handled by a personality routine that
knows how to find stack segments.
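
As a rough illustration of why the mov/movk pair caps the allocation at 4GB,
the following C sketch splits a frame size into the two 16-bit immediates.
The names alloc_imm, alloc_fits, split_alloc and join_alloc are hypothetical
and are not part of the patch; this only models the arithmetic, not the code
GCC emits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the two 16-bit immediates carried by the
   mov/movk pair in the split-stack prologue.  */
struct alloc_imm
{
  uint16_t lo16;  /* immediate for: mov  x10, #lo16 */
  uint16_t hi16;  /* immediate for: movk x10, #hi16, lsl #16 */
};

/* Return nonzero if SIZE can be materialized with mov + movk only,
   i.e. if it fits in 32 bits.  */
int
alloc_fits (uint64_t size)
{
  return size <= UINT32_MAX;
}

/* Split SIZE into the two immediates.  The caller must have checked
   alloc_fits first.  */
struct alloc_imm
split_alloc (uint64_t size)
{
  struct alloc_imm imm;
  assert (alloc_fits (size));
  imm.lo16 = (uint16_t) (size & 0xffff);
  imm.hi16 = (uint16_t) ((size >> 16) & 0xffff);
  return imm;
}

/* Recombine the immediates into the value x10 holds after mov + movk.  */
uint64_t
join_alloc (struct alloc_imm imm)
{
  return ((uint64_t) imm.hi16 << 16) | imm.lo16;
}
```

Two 16-bit immediates give 32 bits in total, so any size of 4GB or more
would need further movk instructions.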

The split-stack prologue on function entry is as follows (this goes before
the usual function prologue):

function:
        mrs    x9, tpidr_el0
        mov    x10, 
        movk   x10, #0x0, lsl #16
        sub    x10, sp, x10
        mov    x11, sp          # if function has stacked arguments
        adrp   x12, main_fn_entry
        add    x12, x12, :lo12:.L2
        cmp    x9, x10
        b.lt   
        b      __morestack
main_fn_entry:
        [function prologue]
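
The decision made by this sequence can be modelled in C roughly as below.
This is a sketch under assumptions, not the generated code: needs_morestack
is a hypothetical name, LIMIT is assumed to be the split-stack limit read
from the field glibc keeps before the TCB, and the comparison is treated as
unsigned for simplicity.

```c
#include <stdint.h>

/* Model of the prologue's check.  SP is the incoming stack pointer,
   LIMIT the assumed split-stack limit for the current segment, and
   ALLOC the frame size materialized by mov/movk.  Returns nonzero
   when the prologue would branch to __morestack.  */
int
needs_morestack (uint64_t sp, uint64_t limit, uint32_t alloc)
{
  uint64_t requested_sp;

  /* Guard the subtraction: a request larger than SP can never fit.  */
  if (alloc > sp)
    return 1;

  requested_sp = sp - alloc;    /* sub  x10, sp, x10 */

  /* cmp x9, x10 ; b.lt: continue to the normal prologue only when
     the limit is strictly below the requested stack pointer.  */
  return !(limit < requested_sp);
}
```

For example, with sp = 0x10000 and limit = 0x8000, a 0x1000-byte frame
proceeds directly to the function prologue, while a 0x9000-byte frame goes
through __morestack.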

Notes:

1. Even if a function does not allocate a stack frame, a split-stack prologue
   is created.  This avoids issues with tail calls to external symbols,
   which might require linker adjustment (libgo/runtime/go-varargs.c).

2. Basic-block reordering (enabled with -O2) will move split-stack TCB ldr
   to after the required stack calculation.

3. Similar to powerpc, when the linker detects a call from split-stack to
   non-split-stack code, it adds 16k (or more) to the value found in "allocate"
   instructions (so non-split-stack code gets a larger stack).  The amount is
   tunable by a linker option.  The edit means aarch64 does not need to
   implement __morestack_non_split, necessary on x86 because insufficient
   space is available there to edit the stack comparison code.  This feature
   is only implemented in the GNU gold linker.

4. AArch64 does not handle >4G stacks initially.  Although it is possible
   to implement this, limiting to 4G allows the allocation to be
   materialized with only 2 instructions (mov + movk), which simplifies the
   linker adjustments required.  Supporting multiple threads each requiring
   more than 4G of stack is probably not that important, and likely to OOM
   at run time.

5. The TCB support on GLIBC is meant to be included in version 2.26.

6. The continuation address in x12 is materialized using 'adrp' plus 'add'
   and a static relocation.  The current code uses the
   aarch64_expand_mov_immediate function; since a better alternative
   would be 'adr', it could be a future optimization (not implemented
   in this patch).
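
To make notes 3 and 4 concrete, here is a small C model of the linker edit:
the value encoded in the "allocate" instructions is increased by an extra
amount, and the result must still fit in the 32 bits that mov + movk can
encode.  The function names and the NON_SPLIT_EXTRA constant are hypothetical;
16k is simply the minimum figure quoted in note 3, and the actual gold
implementation may differ.

```c
#include <stdint.h>

/* Assumed extra stack for calls into non-split-stack code (16k).  */
#define NON_SPLIT_EXTRA 0x4000u

/* Return nonzero if SIZE + EXTRA is still encodable by mov + movk.  */
int
adjusted_alloc_fits (uint32_t size, uint32_t extra)
{
  return (uint64_t) size + extra <= UINT32_MAX;
}

/* Return the adjusted allocation.  Only meaningful when
   adjusted_alloc_fits returns nonzero for the same arguments.  */
uint32_t
adjusted_alloc (uint32_t size, uint32_t extra)
{
  return size + extra;
}

/* Low and high 16-bit immediates of the adjusted value, as they would
   be rewritten into the mov and movk instructions.  */
uint16_t
adjusted_lo16 (uint32_t size, uint32_t extra)
{
  return (uint16_t) (adjusted_alloc (size, extra) & 0xffff);
}

uint16_t
adjusted_hi16 (uint32_t size, uint32_t extra)
{
  return (uint16_t) (adjusted_alloc (size, extra) >> 16);
}
```

Because the prologue always reserves both instructions, the adjustment can
carry from the low half into the high half without changing code size.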

[1] https://sourceware.org/ml/libc-alpha/2017-02/msg00272.html

libgcc/ChangeLog:

* libgcc/config.host: Use t-stack and t-stack-aarch64 for
aarch64*-*-linux.
* libgcc/config/aarch64/morestack-c.c: New file.
* libgcc/config/aarch64/morestack.S: Likewise.
* libgcc/config/aarch64/t-stack-aarch64: Likewise.
* libgcc/generic-morestack.c (__splitstack_find): Add aarch64-specific
code.

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.c
(aarch64_supports_split_stack): New function.
(TARGET_SUPPORTS_SPLIT_STACK): New macro.
* gcc/config/aarch64/aarch64-linux.h (TARGET_ASM_FILE_END): Remove
macro.
* gcc/config/aarch64/aarch64-protos.h: Add
aarch64_expand_split_stack_prologue and
aarch64_split_stack_space_check.
* gcc/config/aarch64/aarch64.c (aarch64_gen_far_branch): Add support
to emit 'b' instruction to rtx different than LABEL_REF.
(aarch64_expand_builtin_va_start): Use internal argument pointer
instead of virtual_incoming_args_rtx.
(morestack_ref): New symbol.
(aarch64_load_split_stack_value): New function.
(aarch64_expand_split_stack_prologue): Likewise.
(aarch64_internal_arg_pointer): Likewise.
(aarch64_split_stack_space_check): Likewise.
(aarch64_file_end): Emit the split-stack note sections.