Re: [PATCH] RISC-V: VECT: Remember to assert any_known_not_updated_vssa

2023-11-09 Thread Maxim Blinov
Yes, those tests that triggered the ICE now pass.

Maxim


On Thu, 9 Nov 2023 at 16:26, Jeff Law  wrote:

>
>
> On 11/6/23 06:01, Maxim Blinov wrote:
> > From: Maxim Blinov 
> >
> > This patch is based on and intended for the
> vendors/riscv/gcc-13-with-riscv-opts branch - please apply if looks OK.
> >
> > Fixes the following ICEs that I'm seeing:
> >
> > FAIL: gcc.dg/vect/O3-pr49087.c (internal compiler error: in
> vect_transform_loops, at tree-vectorizer.cc:1032)
> > FAIL: gcc.dg/vect/no-scevccp-pr86725-1.c (internal compiler error: in
> vect_transform_loops, at tree-vectorizer.cc:1032)
> > FAIL: gcc.dg/vect/no-scevccp-pr86725-2.c (internal compiler error: in
> vect_transform_loops, at tree-vectorizer.cc:1032)
> > FAIL: gcc.dg/vect/no-scevccp-pr86725-3.c (internal compiler error: in
> vect_transform_loops, at tree-vectorizer.cc:1032)
> > FAIL: gcc.dg/vect/no-scevccp-pr86725-4.c (internal compiler error: in
> vect_transform_loops, at tree-vectorizer.cc:1032)
> > FAIL: gcc.dg/vect/pr94443.c (internal compiler error: in
> vect_transform_loops, at tree-vectorizer.cc:1032)
> > FAIL: gcc.dg/vect/pr94443.c -flto -ffat-lto-objects (internal compiler
> error: in vect_transform_loops, at tree-vectorizer.cc:1032)
> > FAIL: gcc.dg/vect/slp-50.c (internal compiler error: in
> vect_transform_loops, at tree-vectorizer.cc:1032)
> > FAIL: gcc.dg/vect/slp-50.c -flto -ffat-lto-objects (internal compiler
> error: in vect_transform_loops, at tree-vectorizer.cc:1032)
> > FAIL: gcc.dg/vect/vect-cond-13.c (internal compiler error: in
> vect_transform_loops, at tree-vectorizer.cc:1032)
> > FAIL: gcc.dg/vect/vect-cond-13.c -flto -ffat-lto-objects (internal
> compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)
> > FAIL: gcc.dg/vect/vect-live-6.c (internal compiler error: in
> vect_transform_loops, at tree-vectorizer.cc:1032)
> > FAIL: gcc.dg/vect/vect-live-6.c -flto -ffat-lto-objects (internal
> compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)
> > FAIL: gcc.target/riscv/rvv/autovec/partial/live-1.c (internal compiler
> error: in vect_transform_loops, at tree-vectorizer.cc:1032)
> > FAIL: gcc.target/riscv/rvv/autovec/partial/live-2.c (internal compiler
> error: in vect_transform_loops, at tree-vectorizer.cc:1032)
> >
> > -- >8 --
> >
> > When we create a VEC_EXTRACT gimple stmt:
> >
> >   /* SCALAR_RES = VEC_EXTRACT .  */
> >   tree scalar_res
> >     = gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
> >                     vec_lhs_phi, last_index);
> >
> > Under the hood we are really just creating a GIMPLE_CALL stmt. Later
> > on, when we `gsi_insert_seq_before` our stmts:
> >
> >   if (stmts)
> >     {
> >       gimple_stmt_iterator exit_gsi = gsi_after_labels (exit_bb);
> >       gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
> >
> > We eventually run into tree-ssa-operands.cc:1147:
> >
> >operands_scanner (fn, stmt).build_ssa_operands ();
> >
> > Since VEC_EXTRACT is *not* marked with ECF_NOVOPS, ECF_CONST, or
> > ECF_PURE flags in internal-fn.def, when
> > `operands_scanner::parse_ssa_operands` comes across our
> > VEC_EXTRACT-type GIMPLE_CALL, it generates a `gimple_vop()` artificial
> > variable.
> >
> > `operands_scanner::finalize_ssa_defs` then picks this up, so our final
> > stmt goes from
> >
> > _73 = .VEC_EXTRACT (vect_last_9.56_71, _72);
> >
> > to
> >
> > # .MEM = VDEF <.MEM>
> > _73 = .VEC_EXTRACT (vect_last_9.56_71, _72);
> >
> > But more importantly it marks us as `ssa_renaming_needed`, in
> > tree-ssa-operands.cc:420:
> >
> >   /* If we have a non-SSA_NAME VDEF, mark it for renaming.  */
> >   if (gimple_vdef (stmt)
> >       && TREE_CODE (gimple_vdef (stmt)) != SSA_NAME)
> >     {
> >       fn->gimple_df->rename_vops = 1;
> >       fn->gimple_df->ssa_renaming_needed = 1;
> >     }
> >
> > This then proceeds to crash the compiler when we are about to leave
> > `vect_transform_loops`:
> >
> >   if (need_ssa_update_p (cfun))
> >     {
> >       gcc_assert (loop_vinfo->any_known_not_updated_vssa);
> >       fun->gimple_df->ssa_renaming_needed = false;
> >       todo |= TODO_update_ssa_only_virtuals;
> >     }
> >
> > Since,
> >
> > - `need_ssa_update_p (cfun)` is true (it was set when we generated a
> >memory vdef)
> > - `loop_vinfo->any_known_not_updated_vssa` is false
> >
> > As the code currently stands, creating a gimple stmt containing a
> > VEC_EXTRACT should always generate a memory vdef, therefore we should
> > remember to mark `loop_vinfo->any_known_not_updated_vssa` afterwards.
> >
> > gcc/ChangeLog:
> >
> >   * tree-vect-loop.cc (vectorizable_live_operation): Remember to
> >   assert loop_vinfo->any_known_not_updated_vssa if we are inserting
> >   a call to VEC_EXTRACT.
> Just to avoid any doubt -- with the internal-fn.def patch I cherry
> picked earlier this week to the branch, this is no longer needed, right?
>
> jeff
>


Re: [PATCH] RISC-V: VECT: Remember to assert any_known_not_updated_vssa

2023-11-06 Thread Maxim Blinov
On Mon, 6 Nov 2023 at 13:07, Richard Biener  wrote:
> I see
>
> DEF_INTERNAL_OPTAB_FN (VEC_EXTRACT, ECF_CONST | ECF_NOTHROW,
>vec_extract, vec_extract)
>
> ?

Oh, you're right! I should have checked the master branch first... and
I was even wondering why it wasn't marked as such. Should perhaps
cherry pick this for gcc-13-with-riscv-opts?


[PATCH] RISC-V: VECT: Remember to assert any_known_not_updated_vssa

2023-11-06 Thread Maxim Blinov
From: Maxim Blinov 

This patch is based on and intended for the 
vendors/riscv/gcc-13-with-riscv-opts branch - please apply if looks OK.

Fixes the following ICEs that I'm seeing:

FAIL: gcc.dg/vect/O3-pr49087.c (internal compiler error: in 
vect_transform_loops, at tree-vectorizer.cc:1032)
FAIL: gcc.dg/vect/no-scevccp-pr86725-1.c (internal compiler error: in 
vect_transform_loops, at tree-vectorizer.cc:1032)
FAIL: gcc.dg/vect/no-scevccp-pr86725-2.c (internal compiler error: in 
vect_transform_loops, at tree-vectorizer.cc:1032)
FAIL: gcc.dg/vect/no-scevccp-pr86725-3.c (internal compiler error: in 
vect_transform_loops, at tree-vectorizer.cc:1032)
FAIL: gcc.dg/vect/no-scevccp-pr86725-4.c (internal compiler error: in 
vect_transform_loops, at tree-vectorizer.cc:1032)
FAIL: gcc.dg/vect/pr94443.c (internal compiler error: in vect_transform_loops, 
at tree-vectorizer.cc:1032)
FAIL: gcc.dg/vect/pr94443.c -flto -ffat-lto-objects (internal compiler error: 
in vect_transform_loops, at tree-vectorizer.cc:1032)
FAIL: gcc.dg/vect/slp-50.c (internal compiler error: in vect_transform_loops, 
at tree-vectorizer.cc:1032)
FAIL: gcc.dg/vect/slp-50.c -flto -ffat-lto-objects (internal compiler error: in 
vect_transform_loops, at tree-vectorizer.cc:1032)
FAIL: gcc.dg/vect/vect-cond-13.c (internal compiler error: in 
vect_transform_loops, at tree-vectorizer.cc:1032)
FAIL: gcc.dg/vect/vect-cond-13.c -flto -ffat-lto-objects (internal compiler 
error: in vect_transform_loops, at tree-vectorizer.cc:1032)
FAIL: gcc.dg/vect/vect-live-6.c (internal compiler error: in 
vect_transform_loops, at tree-vectorizer.cc:1032)
FAIL: gcc.dg/vect/vect-live-6.c -flto -ffat-lto-objects (internal compiler 
error: in vect_transform_loops, at tree-vectorizer.cc:1032)
FAIL: gcc.target/riscv/rvv/autovec/partial/live-1.c (internal compiler error: 
in vect_transform_loops, at tree-vectorizer.cc:1032)
FAIL: gcc.target/riscv/rvv/autovec/partial/live-2.c (internal compiler error: 
in vect_transform_loops, at tree-vectorizer.cc:1032)

-- >8 --

When we create a VEC_EXTRACT gimple stmt:

  /* SCALAR_RES = VEC_EXTRACT .  */
  tree scalar_res
    = gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
                    vec_lhs_phi, last_index);

Under the hood we are really just creating a GIMPLE_CALL stmt. Later
on, when we `gsi_insert_seq_before` our stmts:

  if (stmts)
    {
      gimple_stmt_iterator exit_gsi = gsi_after_labels (exit_bb);
      gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);

We eventually run into tree-ssa-operands.cc:1147:

  operands_scanner (fn, stmt).build_ssa_operands ();

Since VEC_EXTRACT is *not* marked with ECF_NOVOPS, ECF_CONST, or
ECF_PURE flags in internal-fn.def, when
`operands_scanner::parse_ssa_operands` comes across our
VEC_EXTRACT-type GIMPLE_CALL, it generates a `gimple_vop()` artificial
variable.

`operands_scanner::finalize_ssa_defs` then picks this up, so our final
stmt goes from

_73 = .VEC_EXTRACT (vect_last_9.56_71, _72);

to

# .MEM = VDEF <.MEM>
_73 = .VEC_EXTRACT (vect_last_9.56_71, _72);

But more importantly it marks us as `ssa_renaming_needed`, in
tree-ssa-operands.cc:420:

  /* If we have a non-SSA_NAME VDEF, mark it for renaming.  */
  if (gimple_vdef (stmt)
      && TREE_CODE (gimple_vdef (stmt)) != SSA_NAME)
    {
      fn->gimple_df->rename_vops = 1;
      fn->gimple_df->ssa_renaming_needed = 1;
    }

This then proceeds to crash the compiler when we are about to leave
`vect_transform_loops`:

  if (need_ssa_update_p (cfun))
    {
      gcc_assert (loop_vinfo->any_known_not_updated_vssa);
      fun->gimple_df->ssa_renaming_needed = false;
      todo |= TODO_update_ssa_only_virtuals;
    }

Since,

- `need_ssa_update_p (cfun)` is true (it was set when we generated a
  memory vdef)
- `loop_vinfo->any_known_not_updated_vssa` is false

As the code currently stands, creating a gimple stmt containing a
VEC_EXTRACT should always generate a memory vdef, therefore we should
remember to mark `loop_vinfo->any_known_not_updated_vssa` afterwards.
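
To make the chain of events above easier to follow, here is a toy model in C. It is only a sketch under stated assumptions: none of the `toy_` names exist in GCC, the flag values are arbitrary, and the two booleans merely stand in for `fn->gimple_df->ssa_renaming_needed` and `loop_vinfo->any_known_not_updated_vssa`. It models the behaviour as described in this message, not the real operand scanner.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the call flags named above.  The values are arbitrary
   and are NOT GCC's real definitions.  */
enum { TOY_ECF_CONST = 1, TOY_ECF_PURE = 2, TOY_ECF_NOVOPS = 4 };

struct toy_function
{
  bool ssa_renaming_needed;        /* models fn->gimple_df->ssa_renaming_needed */
  bool any_known_not_updated_vssa; /* models the loop_vinfo flag */
};

/* Model of operand scanning for a call: a call carrying none of the three
   flags gets a virtual definition, which requests renaming of virtual
   operands.  Returns true if a VDEF was added.  */
bool
toy_scan_call (struct toy_function *fn, int ecf_flags)
{
  if (ecf_flags & (TOY_ECF_CONST | TOY_ECF_PURE | TOY_ECF_NOVOPS))
    return false;
  fn->ssa_renaming_needed = true;
  return true;
}

/* Model of the check made when leaving vect_transform_loops.  Returns
   false in the situation where the real code would trip gcc_assert.  */
bool
toy_exit_check_ok (const struct toy_function *fn)
{
  if (fn->ssa_renaming_needed)
    return fn->any_known_not_updated_vssa;
  return true;
}
```

In this model, scanning a call with no flags and then running the exit check fails unless `any_known_not_updated_vssa` was set in between, which is what the patch does. Scanning with `TOY_ECF_CONST` never requests renaming, so the exit check passes without any extra bookkeeping.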

gcc/ChangeLog:

* tree-vect-loop.cc (vectorizable_live_operation): Remember to
assert loop_vinfo->any_known_not_updated_vssa if we are inserting
a call to VEC_EXTRACT.
---
 gcc/tree-vect-loop.cc | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index c8df2c88575..53c3a31d2a8 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -10155,6 +10155,11 @@ vectorizable_live_operation (vec_info *vinfo,
= gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
vec_lhs_phi, last_index);
 
+ /* We've expanded SSA at this point, and since VEC_EXTRACT
+will generate vops, make sure to tell GCC that we need to
+update SSA.  */
+ loop_vinfo->any_known_not_updated_vssa = true;
+
  /* Convert the extracted vector 

Re: [PATCH 0/4] Add aarch64-darwin support for off-stack trampolines

2021-11-22 Thread Maxim Blinov
Hi all, apologies for forgetting to add the cover letter.

The motivation of this work is to provide (limited) support for GCC
nested function trampolines on targets that do not have an executable
stack. This code has been (roughly) tested by creating several
thousand nested functions (i.e. enough to force allocation of a new
page), checking that all the nested functions execute correctly, and
then returning back up and checking that the pages are freed once
there are no more active trampolines in them.

I was provided the initial design and prototype implementation by
Andrew Burgess, and have since refactored the code to support
allocation/deallocation of trampoline pages, and added AArch64
Linux/Darwin support.

One of the limitations of the implementation in its current state is
the inability to track longjmps. There has been some discussion about
instrumenting calls to setjmp/longjmp so that the state of trampolines
is correctly tracked and freed when necessary, however that hasn't
been worked on yet.
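
To illustrate the bookkeeping and the longjmp limitation, here is a toy model in C. It is a sketch, not the libgcc code: every `toy_` name is made up for illustration, pages are plain `malloc` blocks rather than executable mappings, the per-page capacity is a small constant instead of being derived from the page size, and the policy of freeing a page as soon as it is empty is an assumption.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

/* Hypothetical, tiny capacity so that page turnover is easy to observe.  */
#define TOY_SLOTS_PER_PAGE 4

struct toy_page
{
  struct toy_page *prev;  /* previously filled page, or NULL */
  int free_slots;         /* slots not yet handed out */
};

static struct toy_page *toy_curr;  /* top of the page stack */
static int toy_live_pages;         /* pages currently allocated */

/* Hand out one slot, pushing a fresh page when the current one is full.  */
void
toy_created (void)
{
  if (toy_curr == NULL || toy_curr->free_slots == 0)
    {
      struct toy_page *p = malloc (sizeof *p);
      if (p == NULL)
        abort ();
      p->prev = toy_curr;
      p->free_slots = TOY_SLOTS_PER_PAGE;
      toy_curr = p;
      toy_live_pages++;
    }
  toy_curr->free_slots--;
}

/* Return the most recent slot, popping the page once it is empty.  */
void
toy_deleted (void)
{
  if (toy_curr == NULL)
    return;
  toy_curr->free_slots++;
  if (toy_curr->free_slots == TOY_SLOTS_PER_PAGE)
    {
      struct toy_page *empty = toy_curr;
      toy_curr = empty->prev;
      free (empty);
      toy_live_pages--;
    }
}

int
toy_live_page_count (void)
{
  return toy_live_pages;
}

/* Takes a slot, then unwinds with longjmp and so never reaches
   toy_deleted.  This is the untracked case described above.  */
void
toy_leaky_scope (jmp_buf env)
{
  toy_created ();
  longjmp (env, 1);
}
```

With balanced calls the page count returns to zero. If a caller does `if (setjmp (env) == 0) toy_leaky_scope (env);`, one page stays allocated afterwards, because nothing ran the matching delete.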

On Sat, 13 Nov 2021 at 09:45, Maxim Blinov  wrote:
>
> Note: This patch is not yet ready for trunk as it depends on some
> patches that are not yet upstream; however, it serves as motivation for
> the previous patch(es), which are independent.
>
> 
>
> Implement the __builtin_nested_func_ptr_{created,deleted} functions
> for the aarch64-darwin platform. For this platform
> --enable-off-stack-trampolines is enabled by default, and
> -foff-stack-trampolines is enabled by default if the host macOS
> operating system version is 11.x or greater.
>
> Co-authored-by: Andrew Burgess 
>
> libgcc/ChangeLog:
>
> * config/aarch64/heap-trampoline.c (allocate_trampoline_page):
> Request for MAP_JIT in the case of __APPLE__.
> Provide __APPLE__ variant of aarch64_trampoline_insns that uses
> x16 as the chain pointer.
> (__builtin_nested_func_ptr_created): Call
> pthread_jit_write_protect_np() to toggle read/write permission on
> page.
> * config.host (aarch64*-*darwin* | arm64*-*darwin*): Handle
> --enable-off-stack-trampolines.
> * configure.ac (--enable-off-stack-trampolines): Permit setting
> for target aarch64*-*darwin* | arm64*-*darwin*, and set default to
> enabled.
> * configure: Regenerate.
> ---
>  gcc/config.gcc  |  7 +
>  libgcc/config.host  |  4 +++
>  libgcc/config/aarch64/heap-trampoline.c | 36 +
>  libgcc/configure|  6 +
>  libgcc/configure.ac |  6 +
>  5 files changed, 59 insertions(+)
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 031be563c5d..c13f7629d44 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -1072,6 +1072,13 @@ esac
>
>  # Figure out if we need to enable -foff-stack-trampolines by default.
>  case ${target} in
> +aarch64*-*darwin* | arm64*-*darwin*)
> +  if test ${macos_maj} = 11 || test ${macos_maj} = 12; then
> +tm_defines="$tm_defines OFF_STACK_TRAMPOLINES_INIT=1"
> +  else
> +tm_defines="$tm_defines OFF_STACK_TRAMPOLINES_INIT=0"
> +  fi
> +  ;;
>  *)
>tm_defines="$tm_defines OFF_STACK_TRAMPOLINES_INIT=0"
>;;
> diff --git a/libgcc/config.host b/libgcc/config.host
> index d1a491d27e7..3c536b0928a 100644
> --- a/libgcc/config.host
> +++ b/libgcc/config.host
> @@ -414,6 +414,10 @@ aarch64*-*darwin* | arm64*-*darwin* )
> tmake_file="${tmake_file} t-crtfm"
> # No soft float for now because our long double is DF not TF.
> md_unwind_header=aarch64/aarch64-unwind.h
> +   if test x$off_stack_trampolines = xyes; then
> +   extra_parts="$extra_parts heap-trampoline.o"
> +   tmake_file="${tmake_file} ${cpu_type}/t-heap-trampoline"
> +   fi
> ;;
>  aarch64*-*-freebsd*)
> extra_parts="$extra_parts crtfastmath.o"
> diff --git a/libgcc/config/aarch64/heap-trampoline.c 
> b/libgcc/config/aarch64/heap-trampoline.c
> index 721a2bed400..6994602beaf 100644
> --- a/libgcc/config/aarch64/heap-trampoline.c
> +++ b/libgcc/config/aarch64/heap-trampoline.c
> @@ -5,6 +5,9 @@
>  #include 
>  #include 
>
> +/* For pthread_jit_write_protect_np */
> +#include <pthread.h>
> +
>  void *allocate_trampoline_page (void);
>  int get_trampolines_per_page (void);
>  struct tramp_ctrl_data *allocate_tramp_ctrl (struct tramp_ctrl_data *parent);
> @@ -43,8 +46,15 @@ allocate_trampoline_page (void)
>  {
>void *page;
>
> +#if defined(__gnu_linux__)
>page = mmap (0, getpagesize (), PROT_WRITE | PROT_EXEC,
>MAP_ANON | MAP_PRIVATE, 0, 0);
> +

Re: [PATCH 1/2] Add cumulative_args_t variants of TARGET_FUNCTION_ROUND_BOUNDARY and friends

2021-11-22 Thread Maxim Blinov
Hi Richard,

The purpose of this patch is to give more of the target code access to
the cumulative_args_t structure in order to enable certain (otherwise
currently impossible) stack layout constraints, specifically for
parameters that are passed over the stack. For example, there are some
targets (not yet in GCC trunk) which explicitly require the
distinction between named and variadic parameters in order to enforce
different alignment rules (when passing over the stack.) Such a
constraint would be accommodated by this patch.

The patch itself is very straightforward and simply adds the parameter
to the two functions which we'd like to extend. The motivation of
defining new target hooks was to minimize the patch size.

A concrete user of this change that we have in mind is the AArch64
Darwin parameter passing ABI, which mandates that, when passing on the
stack, named parameters are naturally aligned whereas variadic
ones are word-aligned. That said, this patch is completely generic
in nature and should enable all targets to have more control over
their parameter layout process.
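
As a rough illustration of why the hook needs to know more than the type and mode of a single argument, here is a toy layout calculation in C. It is a sketch under assumptions: the `toy_` names are hypothetical, the word size is taken to be 8 bytes, and the "named" flag stands in for the kind of context a `cumulative_args_t` would carry. It is not the AArch64 Darwin ABI implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_WORD 8  /* assumed stack word size in bytes */

struct toy_arg
{
  size_t size;   /* size of the argument in bytes */
  size_t align;  /* natural alignment in bytes (a power of two) */
};

/* Round OFFSET up to a multiple of ALIGN (a power of two).  */
size_t
toy_round_up (size_t offset, size_t align)
{
  return (offset + align - 1) & ~(align - 1);
}

/* Place ARG at or after *OFFSET and advance *OFFSET past it.  Named
   arguments use their natural alignment; variadic ones are aligned to,
   and padded out to, a whole word.  Returns the argument's offset.  */
size_t
toy_place_arg (size_t *offset, struct toy_arg arg, bool named)
{
  size_t align = named ? arg.align : TOY_WORD;
  size_t start = toy_round_up (*offset, align);
  size_t slot = named ? arg.size : toy_round_up (arg.size, TOY_WORD);
  *offset = start + slot;
  return start;
}
```

In this model a named `char`, `short` and `int` pack into offsets 0, 2 and 4, while two variadic `int`s land at offsets 0 and 8. The same type therefore gets a different boundary depending on whether it is named, which a hook that only sees the tree and machine_mode cannot express.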

Best Regards,
Maxim

On Mon, 15 Nov 2021 at 07:11, Richard Biener  wrote:
>
> On Sat, Nov 13, 2021 at 10:43 AM Maxim Blinov  
> wrote:
> >
> > The two target hooks responsible for informing GCC about stack
> > parameter alignment are `TARGET_FUNCTION_ARG_BOUNDARY` and
> > `TARGET_FUNCTION_ARG_ROUND_BOUNDARY`, which currently only consider
> > the tree and machine_mode of a specific given argument.
> >
> > Create two new target hooks suffixed with '_CA', and pass in a third
> > `cumulative_args_t` parameter. This enables the backend to make
> > alignment decisions based on the context of the whole function rather
> > than individual parameters.
> >
> > The original machine_mode/tree type macros are not removed - they are
> > called by the default implementations of `TARGET_...BOUNDARY_CA` and
> > `TARGET_...ROUND_BOUNDARY_CA`. This is done with the intention of
> > avoiding large mechanical modifications of nearly every backend in
> > GCC. There is also a new flag, -fstack-use-cumulative-args, which
> > provides a way to completely bypass the new `..._CA` macros. This
> > feature is intended for debugging GCC itself.
>
> Just two quick comments without looking at the patch.
>
> Please do not introduce options in the user namespace -f... which are
> for debugging only.  I think you should go without this part instead.
>
> Second, you fail to motivate the change.  I cannot make sense of
> "This enables the backend to make alignment decisions based on the
> context of the whole function rather than individual parameters."
>
> Richard.
>
> >
> > gcc/ChangeLog:
> >
> > * calls.c (initialize_argument_information): Pass `args_so_far`.
> > * common.opt: New flag `-fstack-use-cumulative-args`.
> > * config.gcc: No platforms currently use ..._CA-hooks: Set
> > -fstack-use-cumulative-args to be off by default.
> > * target.h (cumulative_args_t): Move declaration from here, to...
> > * cumulative-args.h (cumulative_args_t): ...this new file. This is
> > to permit backends to include the declaration of cumulative_args_t
> > without dragging in circular dependencies.
> > * function.c (assign_parm_find_entry_rtl): Provide
> > cumulative_args_t to locate_and_pad_parm.
> > (gimplify_parameters): Ditto.
> > (locate_and_pad_parm): Conditionally call new hooks if we're
> > invoked with -fstack-use-cumulative-args.
> > * function.h: Include cumulative-args.h.
> > (locate_and_pad_parm): Add cumulative_args_t parameter.
> > * target.def (function_arg_boundary_ca): Add.
> > (function_arg_round_boundary_ca): Ditto.
> > * targhooks.c (default_function_arg_boundary_ca): Implement.
> > (default_function_arg_round_boundary_ca): Ditto.
> > * targhooks.h (default_function_arg_boundary_ca): Declare.
> > (default_function_arg_round_boundary_ca): Ditto.
> > * doc/invoke.texi (-fstack-use-cumulative-args): Document.
> > * doc/tm.texi: Regenerate.
> > * doc/tm.texi.in: Ditto.
> > ---
> >  gcc/calls.c   |  3 +++
> >  gcc/common.opt|  4 
> >  gcc/config.gcc|  7 +++
> >  gcc/cumulative-args.h | 20 
> >  gcc/doc/invoke.texi   | 12 
> >  gcc/doc/tm.texi   | 20 
> >  gcc/doc/tm.texi.in|  4 
> >  gcc/function.c| 25 +
> >  gcc/function.h|  2 ++
> >  gcc/t

[PATCH 4/4] Add aarch64-darwin support for off-stack trampolines

2021-11-13 Thread Maxim Blinov
Note: This patch is not yet ready for trunk as it depends on some
patches that are not yet upstream; however, it serves as motivation for
the previous patch(es), which are independent.



Implement the __builtin_nested_func_ptr_{created,deleted} functions
for the aarch64-darwin platform. For this platform
--enable-off-stack-trampolines is enabled by default, and
-foff-stack-trampolines is enabled by default if the host macOS
operating system version is 11.x or greater.

Co-authored-by: Andrew Burgess 

libgcc/ChangeLog:

* config/aarch64/heap-trampoline.c (allocate_trampoline_page):
Request for MAP_JIT in the case of __APPLE__.
Provide __APPLE__ variant of aarch64_trampoline_insns that uses
x16 as the chain pointer.
(__builtin_nested_func_ptr_created): Call
pthread_jit_write_protect_np() to toggle read/write permission on
page.
* config.host (aarch64*-*darwin* | arm64*-*darwin*): Handle
--enable-off-stack-trampolines.
* configure.ac (--enable-off-stack-trampolines): Permit setting
for target aarch64*-*darwin* | arm64*-*darwin*, and set default to
enabled.
* configure: Regenerate.
---
 gcc/config.gcc  |  7 +
 libgcc/config.host  |  4 +++
 libgcc/config/aarch64/heap-trampoline.c | 36 +
 libgcc/configure|  6 +
 libgcc/configure.ac |  6 +
 5 files changed, 59 insertions(+)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 031be563c5d..c13f7629d44 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1072,6 +1072,13 @@ esac
 
 # Figure out if we need to enable -foff-stack-trampolines by default.
 case ${target} in
+aarch64*-*darwin* | arm64*-*darwin*)
+  if test ${macos_maj} = 11 || test ${macos_maj} = 12; then
+tm_defines="$tm_defines OFF_STACK_TRAMPOLINES_INIT=1"
+  else
+tm_defines="$tm_defines OFF_STACK_TRAMPOLINES_INIT=0"
+  fi
+  ;;
 *)
   tm_defines="$tm_defines OFF_STACK_TRAMPOLINES_INIT=0"
   ;;
diff --git a/libgcc/config.host b/libgcc/config.host
index d1a491d27e7..3c536b0928a 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -414,6 +414,10 @@ aarch64*-*darwin* | arm64*-*darwin* )
tmake_file="${tmake_file} t-crtfm"
# No soft float for now because our long double is DF not TF.
md_unwind_header=aarch64/aarch64-unwind.h
+   if test x$off_stack_trampolines = xyes; then
+   extra_parts="$extra_parts heap-trampoline.o"
+   tmake_file="${tmake_file} ${cpu_type}/t-heap-trampoline"
+   fi
;;
 aarch64*-*-freebsd*)
extra_parts="$extra_parts crtfastmath.o"
diff --git a/libgcc/config/aarch64/heap-trampoline.c 
b/libgcc/config/aarch64/heap-trampoline.c
index 721a2bed400..6994602beaf 100644
--- a/libgcc/config/aarch64/heap-trampoline.c
+++ b/libgcc/config/aarch64/heap-trampoline.c
@@ -5,6 +5,9 @@
 #include 
 #include 
 
+/* For pthread_jit_write_protect_np */
+#include <pthread.h>
+
 void *allocate_trampoline_page (void);
 int get_trampolines_per_page (void);
 struct tramp_ctrl_data *allocate_tramp_ctrl (struct tramp_ctrl_data *parent);
@@ -43,8 +46,15 @@ allocate_trampoline_page (void)
 {
   void *page;
 
+#if defined(__gnu_linux__)
   page = mmap (0, getpagesize (), PROT_WRITE | PROT_EXEC,
   MAP_ANON | MAP_PRIVATE, 0, 0);
+#elif defined(__APPLE__)
+  page = mmap (0, getpagesize (), PROT_WRITE | PROT_EXEC,
+  MAP_ANON | MAP_PRIVATE | MAP_JIT, 0, 0);
+#else
+  page = MAP_FAILED;
+#endif
 
   return page;
 }
@@ -67,6 +77,7 @@ allocate_tramp_ctrl (struct tramp_ctrl_data *parent)
   return p;
 }
 
+#if defined(__gnu_linux__)
 static const uint32_t aarch64_trampoline_insns[] = {
   0xd503245f, /* hint 34 */
   0x580000b1, /* ldr x17, .+20 */
@@ -76,6 +87,20 @@ static const uint32_t aarch64_trampoline_insns[] = {
   0xd5033fdf /* isb */
 };
 
+#elif defined(__APPLE__)
+static const uint32_t aarch64_trampoline_insns[] = {
+  0xd503245f, /* hint 34 */
+  0x580000b1, /* ldr x17, .+20 */
+  0x580000d0, /* ldr x16, .+24 */
+  0xd61f0220, /* br  x17 */
+  0xd5033f9f, /* dsb sy */
+  0xd5033fdf /* isb */
+};
+
+#else
+#error "Unsupported AArch64 platform for heap trampolines"
+#endif
+
 void
 __builtin_nested_func_ptr_created (void *chain, void *func, void **dst)
 {
@@ -99,11 +124,22 @@ __builtin_nested_func_ptr_created (void *chain, void *func, void **dst)
     = &tramp_ctrl_curr->trampolines[get_trampolines_per_page ()
                                     - tramp_ctrl_curr->free_trampolines];
 
+#if defined(__APPLE__)
+  /* Disable write protection for the MAP_JIT regions in this thread (see
+ 
https://developer.apple.com/documentation/apple-silicon/porting-just-in-time-compilers-to-apple-silicon)
 */
+  pthread_jit_write_protect_np (0);
+#endif
+
   memcpy (trampoline->insns, aarch64_trampoline_insns,
  sizeof(aarch64_trampoline_insns));
   trampoline->func_ptr = func;

[PATCH 1/4] Generate off-stack nested function trampolines

2021-11-13 Thread Maxim Blinov
Add support for allocating nested function trampolines on an
executable heap rather than on the stack. This is motivated by targets
such as AArch64 Darwin, which globally prohibit executing code on the
stack.

The target-specific routines for allocating and writing trampolines are
to be provided in libgcc, and are by default _not_ compiled in unless
the target specifically requires it, or you manually provide
--enable-off-stack-trampolines when configuring gcc/libgcc.

The gcc flag -foff-stack-trampolines controls whether to generate code
that instantiates trampolines on the stack, or to emit calls to
__builtin_nested_func_ptr_created and
__builtin_nested_func_ptr_deleted. Note that this flag is completely
independent of libgcc: If libgcc is for any reason missing those
symbols, you will get a link failure.

This implementation imposes some implicit restrictions as compared to
stack trampolines. longjmp'ing back to a state before a trampoline was
created will cause us to skip over the corresponding
__builtin_nested_func_ptr_deleted, which will leak trampolines
starting from the beginning of the linked list of allocated
trampolines. There may be scope for instrumenting longjmp/setjmp to
trigger cleanups of trampolines.

Co-authored-by: Andrew Burgess 

gcc/ChangeLog:

* builtins.def (BUILT_IN_NESTED_PTR_CREATED): Define.
(BUILT_IN_NESTED_PTR_DELETED): Ditto.
* common.opt (foff-stack-trampolines): Add flag to control
generation of heap-based trampoline instantiation.
* tree-nested.c (convert_tramp_reference_op): Don't bother calling
__builtin_adjust_trampoline for the off-stack case.
(finalize_nesting_tree_1): Emit calls to
__builtin_nested_...{created,deleted} if we're generating with
-foff-stack-trampolines.
* tree.c (build_common_builtin_nodes): Build
__builtin_nested_...{created,deleted}.
* doc/invoke.texi (-foff-stack-trampolines): Document.

libgcc/ChangeLog:

* configure.ac: Add configure parameter
--enable-off-stack-trampolines, and do error checking if we're
trying to enable off-stack trampolines for a platform that doesn't
provide any such implementation.
* configure: Regenerate.
* libgcc-std.ver.in: Ditto.
* libgcc2.h (__builtin_nested_func_ptr_created): Declare.
(__builtin_nested_func_ptr_deleted): Ditto.
---
 gcc/builtins.def |   2 +
 gcc/common.opt   |   4 ++
 gcc/config.gcc   |   7 +++
 gcc/doc/invoke.texi  |  14 +
 gcc/tree-nested.c| 121 +--
 gcc/tree.c   |  17 ++
 libgcc/configure |  26 +
 libgcc/configure.ac  |  17 ++
 libgcc/libgcc-std.ver.in |   3 +
 libgcc/libgcc2.h |   3 +
 10 files changed, 197 insertions(+), 17 deletions(-)

diff --git a/gcc/builtins.def b/gcc/builtins.def
index 45a09b4d42d..90a94a6dd0f 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -950,6 +950,8 @@ DEF_BUILTIN_STUB (BUILT_IN_ADJUST_TRAMPOLINE, "__builtin_adjust_trampoline")
 DEF_BUILTIN_STUB (BUILT_IN_INIT_DESCRIPTOR, "__builtin_init_descriptor")
 DEF_BUILTIN_STUB (BUILT_IN_ADJUST_DESCRIPTOR, "__builtin_adjust_descriptor")
 DEF_BUILTIN_STUB (BUILT_IN_NONLOCAL_GOTO, "__builtin_nonlocal_goto")
+DEF_BUILTIN_STUB (BUILT_IN_NESTED_PTR_CREATED, "__builtin_nested_func_ptr_created")
+DEF_BUILTIN_STUB (BUILT_IN_NESTED_PTR_DELETED, "__builtin_nested_func_ptr_deleted")
 
 /* Implementing __builtin_setjmp.  */
 DEF_BUILTIN_STUB (BUILT_IN_SETJMP_SETUP, "__builtin_setjmp_setup")
diff --git a/gcc/common.opt b/gcc/common.opt
index de9b848eda5..a97aeeb2165 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2149,6 +2149,10 @@ foffload-abi=
 Common Joined RejectNegative Enum(offload_abi)
 -foffload-abi=[lp64|ilp32] Set the ABI to use in an offload compiler.
 
+foff-stack-trampolines
+Common RejectNegative Var(flag_off_stack_trampolines) Init(OFF_STACK_TRAMPOLINES_INIT)
+Generate trampolines in executable memory rather than executable stack.
+
 Enum
 Name(offload_abi) Type(enum offload_abi) UnknownError(unknown offload ABI %qs)
 
diff --git a/gcc/config.gcc b/gcc/config.gcc
index edd12655c4a..c479aa4cc44 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1070,6 +1070,13 @@ case ${target} in
   ;;
 esac
 
+# Figure out if we need to enable -foff-stack-trampolines by default.
+case ${target} in
+*)
+  tm_defines="$tm_defines OFF_STACK_TRAMPOLINES_INIT=0"
+  ;;
+esac
+
 case ${target} in
 aarch64*-*-elf | aarch64*-*-fuchsia* | aarch64*-*-rtems*)
tm_file="${tm_file} dbxelf.h elfos.h newlib-stdint.h"
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 2aba4c70b44..a5db65f8721 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -660,6 +660,7 @@ Objective-C and Objective-C++ Dialects}.
 @gccoptlist{-fcall-saved-@var{reg}  -fcall-used-@var{reg} @gol
 -ffixed-@var{reg}  -fexceptions @gol
 -fnon-call-exceptions  

[PATCH 2/4] Add x86_64-linux support for off-stack trampolines

2021-11-13 Thread Maxim Blinov
Implement the __builtin_nested_func_ptr_{created,deleted} functions
for the x86_64-linux platform. This serves to exercise the
infrastructure added in libgcc (--enable-off-stack-trampolines) and
gcc (-foff-stack-trampolines) in supporting off-stack trampoline
generation, and is intended primarily for demonstration and debugging
purposes.

Co-authored-by: Andrew Burgess 

libgcc/ChangeLog:

* config/i386/heap-trampoline.c: New file: Implement off-stack
trampolines for x86_64.
* config/i386/t-heap-trampoline: Add rule to build
config/i386/heap-trampoline.c
* config.host (x86_64-*-linux*): Handle
--enable-off-stack-trampolines.
* configure.ac (--enable-off-stack-trampolines): Permit setting
for target x86_64-*-linux*.
* configure: Regenerate.
---
 libgcc/config.host   |   4 +
 libgcc/config/i386/heap-trampoline.c | 143 +++
 libgcc/config/i386/t-heap-trampoline |  21 
 libgcc/configure |   3 +
 libgcc/configure.ac  |   3 +
 5 files changed, 174 insertions(+)
 create mode 100644 libgcc/config/i386/heap-trampoline.c
 create mode 100644 libgcc/config/i386/t-heap-trampoline

diff --git a/libgcc/config.host b/libgcc/config.host
index 168535b1780..163cd4c4161 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -753,6 +753,10 @@ x86_64-*-linux*)
tmake_file="${tmake_file} i386/t-crtpc t-crtfm i386/t-crtstuff 
t-dfprules"
tm_file="${tm_file} i386/elf-lib.h"
md_unwind_header=i386/linux-unwind.h
+   if test x$off_stack_trampolines = xyes; then
+   extra_parts="${extra_parts} heap-trampoline.o"
+   tmake_file="${tmake_file} i386/t-heap-trampoline"
+   fi
;;
 x86_64-*-kfreebsd*-gnu)
extra_parts="$extra_parts crtprec32.o crtprec64.o crtprec80.o 
crtfastmath.o"
diff --git a/libgcc/config/i386/heap-trampoline.c 
b/libgcc/config/i386/heap-trampoline.c
new file mode 100644
index 00000000000..6c202660c35
--- /dev/null
+++ b/libgcc/config/i386/heap-trampoline.c
@@ -0,0 +1,143 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+void *allocate_trampoline_page (void);
+int get_trampolines_per_page (void);
+struct tramp_ctrl_data *allocate_tramp_ctrl (struct tramp_ctrl_data *parent);
+void *allocate_trampoline_page (void);
+
+void __builtin_nested_func_ptr_created (void *chain, void *func, void **dst);
+void __builtin_nested_func_ptr_deleted (void);
+
+struct tramp_ctrl_data;
+struct tramp_ctrl_data
+{
+  struct tramp_ctrl_data *prev;
+
+  int free_trampolines;
+
+  /* This will be pointing to an executable mmap'ed page.  */
+  union ix86_trampoline *trampolines;
+};
+
+static const uint8_t trampoline_insns[] = {
+  /* movabs $,%r11  */
+  0x49, 0xbb,
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+
+  /* movabs $,%r10  */
+  0x49, 0xba,
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+
+  /* rex.WB jmpq *%r11  */
+  0x41, 0xff, 0xe3
+};
+
+union ix86_trampoline {
+  uint8_t insns[sizeof(trampoline_insns)];
+
+  struct __attribute__((packed)) fields {
+uint8_t insn_0[2];
+void *func_ptr;
+uint8_t insn_1[2];
+void *chain_ptr;
+uint8_t insn_2[3];
+  } fields;
+};
+
+int
+get_trampolines_per_page (void)
+{
+  return getpagesize() / sizeof(union ix86_trampoline);
+}
+
+static _Thread_local struct tramp_ctrl_data *tramp_ctrl_curr = NULL;
+
+void *
+allocate_trampoline_page (void)
+{
+  void *page;
+
+  page = mmap (0, getpagesize (), PROT_WRITE | PROT_EXEC,
+  MAP_ANON | MAP_PRIVATE, 0, 0);
+
+  return page;
+}
+
+struct tramp_ctrl_data *
+allocate_tramp_ctrl (struct tramp_ctrl_data *parent)
+{
+  struct tramp_ctrl_data *p = malloc (sizeof (struct tramp_ctrl_data));
+  if (p == NULL)
+return NULL;
+
+  p->trampolines = allocate_trampoline_page ();
+
+  if (p->trampolines == MAP_FAILED)
+return NULL;
+
+  p->prev = parent;
+  p->free_trampolines = get_trampolines_per_page();
+
+  return p;
+}
+
+void
+__builtin_nested_func_ptr_created (void *chain, void *func, void **dst)
+{
+  if (tramp_ctrl_curr == NULL)
+{
+  tramp_ctrl_curr = allocate_tramp_ctrl (NULL);
+  if (tramp_ctrl_curr == NULL)
+   abort ();
+}
+
+  if (tramp_ctrl_curr->free_trampolines == 0)
+{
+  void *tramp_ctrl = allocate_tramp_ctrl (tramp_ctrl_curr);
+  if (!tramp_ctrl)
+   abort ();
+
+  tramp_ctrl_curr = tramp_ctrl;
+}
+
+  union ix86_trampoline *trampoline
+= &tramp_ctrl_curr->trampolines[get_trampolines_per_page ()
+   - tramp_ctrl_curr->free_trampolines];
+
+  memcpy (trampoline->insns, trampoline_insns,
+ sizeof(trampoline_insns));
+  trampoline->fields.func_ptr = func;
+  trampoline->fields.chain_ptr = chain;
+
+  tramp_ctrl_curr->free_trampolines -= 1;
+
+  __builtin___clear_cache ((void *)trampoline->insns,
+  ((void *)trampoline->insns + sizeof(trampoline->insns)));
+
+  

[PATCH 3/4] Add aarch64-linux support for off-stack trampolines

2021-11-13 Thread Maxim Blinov
Implement the __builtin_nested_func_ptr_{created,deleted} functions
for the aarch64-linux platform. This serves to exercise the
infrastructure added in libgcc (--enable-off-stack-trampolines) and
gcc (-foff-stack-trampolines) in supporting off-stack trampoline
generation, and is intended primarily for demonstration and debugging
purposes.
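
The per-page bookkeeping used by the patch below can be sketched in
isolation. This is only an illustration under stated assumptions: the
`demo_*` names are hypothetical, and the slots come from `calloc` rather
than an executable `mmap` page, so nothing here can actually be jumped to.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-in for one trampoline record: six instruction
   words followed by the target function and static chain pointers.  */
struct demo_trampoline
{
  uint32_t insns[6];
  void *func_ptr;
  void *chain_ptr;
};

/* Hypothetical control block: one page worth of slots plus a count of
   the slots that are still unused.  */
struct demo_ctrl
{
  struct demo_trampoline *slots;
  int free_slots;
};

/* Number of trampoline records that fit in one page.  */
static int
demo_slots_per_page (void)
{
  long page = sysconf (_SC_PAGESIZE);
  return (int) (page / (long) sizeof (struct demo_trampoline));
}

/* Assumption: plain heap memory stands in for the mmap'ed page.
   Returns 0 on success, -1 on allocation failure.  */
static int
demo_init (struct demo_ctrl *ctrl)
{
  ctrl->slots = calloc ((size_t) demo_slots_per_page (),
                        sizeof (struct demo_trampoline));
  if (ctrl->slots == NULL)
    return -1;
  ctrl->free_slots = demo_slots_per_page ();
  return 0;
}

/* Slots are handed out front to back, so the next unused index is the
   page capacity minus the number of free slots.  Returns NULL when the
   page is full.  */
static struct demo_trampoline *
demo_take_slot (struct demo_ctrl *ctrl)
{
  if (ctrl->free_slots == 0)
    return NULL;
  int idx = demo_slots_per_page () - ctrl->free_slots;
  ctrl->free_slots -= 1;
  return &ctrl->slots[idx];
}
```

When `demo_take_slot` returns NULL, a caller would allocate a fresh
control block and chain it to the previous one, which is what the `prev`
pointer in the patch is for.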

Co-authored-by: Andrew Burgess 

libgcc/ChangeLog:

* config/aarch64/heap-trampoline.c: New file: Implement off-stack
trampolines for aarch64.
* config/aarch64/t-heap-trampoline: Add rule to build
config/aarch64/heap-trampoline.c
* config.host (aarch64-*-linux*): Handle
--enable-off-stack-trampolines.
* configure.ac (--enable-off-stack-trampolines): Permit setting
for target aarch64-*-linux*.
* configure: Regenerate.
---
 libgcc/config.host  |   4 +
 libgcc/config/aarch64/heap-trampoline.c | 133 
 libgcc/config/aarch64/t-heap-trampoline |  21 
 libgcc/configure|   3 +
 libgcc/configure.ac |   3 +
 5 files changed, 164 insertions(+)
 create mode 100644 libgcc/config/aarch64/heap-trampoline.c
 create mode 100644 libgcc/config/aarch64/t-heap-trampoline

diff --git a/libgcc/config.host b/libgcc/config.host
index 163cd4c4161..912477db7d9 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -388,6 +388,10 @@ aarch64*-*-linux*)
tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
+   if test x$off_stack_trampolines = xyes; then
+   extra_parts="$extra_parts heap-trampoline.o"
+   tmake_file="${tmake_file} ${cpu_type}/t-heap-trampoline"
+   fi
;;
 aarch64*-*-vxworks7*)
extra_parts="$extra_parts crtfastmath.o"
diff --git a/libgcc/config/aarch64/heap-trampoline.c b/libgcc/config/aarch64/heap-trampoline.c
new file mode 100644
index 000..721a2bed400
--- /dev/null
+++ b/libgcc/config/aarch64/heap-trampoline.c
@@ -0,0 +1,133 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+void *allocate_trampoline_page (void);
+int get_trampolines_per_page (void);
+struct tramp_ctrl_data *allocate_tramp_ctrl (struct tramp_ctrl_data *parent);
+void *allocate_trampoline_page (void);
+
+void __builtin_nested_func_ptr_created (void *chain, void *func, void **dst);
+void __builtin_nested_func_ptr_deleted (void);
+
+struct tramp_ctrl_data;
+struct tramp_ctrl_data
+{
+  struct tramp_ctrl_data *prev;
+
+  int free_trampolines;
+
+  /* This will be pointing to an executable mmap'ed page.  */
+  struct aarch64_trampoline *trampolines;
+};
+
+struct aarch64_trampoline {
+  uint32_t insns[6];
+  void *func_ptr;
+  void *chain_ptr;
+};
+
+int
+get_trampolines_per_page (void)
+{
+  return getpagesize() / sizeof(struct aarch64_trampoline);
+}
+
+static _Thread_local struct tramp_ctrl_data *tramp_ctrl_curr = NULL;
+
+void *
+allocate_trampoline_page (void)
+{
+  void *page;
+
+  page = mmap (0, getpagesize (), PROT_WRITE | PROT_EXEC,
+  MAP_ANON | MAP_PRIVATE, 0, 0);
+
+  return page;
+}
+
+struct tramp_ctrl_data *
+allocate_tramp_ctrl (struct tramp_ctrl_data *parent)
+{
+  struct tramp_ctrl_data *p = malloc (sizeof (struct tramp_ctrl_data));
+  if (p == NULL)
+return NULL;
+
+  p->trampolines = allocate_trampoline_page ();
+
+  if (p->trampolines == MAP_FAILED)
+return NULL;
+
+  p->prev = parent;
+  p->free_trampolines = get_trampolines_per_page();
+
+  return p;
+}
+
+static const uint32_t aarch64_trampoline_insns[] = {
+  0xd503245f, /* hint 34 */
+  0x580000b1, /* ldr x17, .+20 */
+  0x580000d2, /* ldr x18, .+24 */
+  0xd61f0220, /* br  x17 */
+  0xd5033f9f, /* dsb sy */
+  0xd5033fdf /* isb */
+};
+
+void
+__builtin_nested_func_ptr_created (void *chain, void *func, void **dst)
+{
+  if (tramp_ctrl_curr == NULL)
+{
+  tramp_ctrl_curr = allocate_tramp_ctrl (NULL);
+  if (tramp_ctrl_curr == NULL)
+   abort ();
+}
+
+  if (tramp_ctrl_curr->free_trampolines == 0)
+{
+  void *tramp_ctrl = allocate_tramp_ctrl (tramp_ctrl_curr);
+  if (!tramp_ctrl)
+   abort ();
+
+  tramp_ctrl_curr = tramp_ctrl;
+}
+
+  struct aarch64_trampoline *trampoline
+= &tramp_ctrl_curr->trampolines[get_trampolines_per_page ()
+   - tramp_ctrl_curr->free_trampolines];
+
+  memcpy (trampoline->insns, aarch64_trampoline_insns,
+ sizeof(aarch64_trampoline_insns));
+  trampoline->func_ptr = func;
+  trampoline->chain_ptr = chain;
+
+  tramp_ctrl_curr->free_trampolines -= 1;
+
+  __builtin___clear_cache ((void *)trampoline->insns,
+  ((void *)trampoline->insns + sizeof(trampoline->insns)));
+
+  *dst = &trampoline->insns;
+}
+
+void
+__builtin_nested_func_ptr_deleted (void)
+{
+  if (tramp_ctrl_curr == NULL)
+abort ();
+
+  

[PATCH 2/2] Implement TARGET_..._CA target hooks for AArch64 Darwin

2021-11-13 Thread Maxim Blinov
Note: This patch is not yet ready for trunk as it is dependent on some
patches that are not yet upstream; however, it serves as motivation for
the previous patch(es), which are independent.



The AArch64 Darwin platform requires that named stack arguments are
passed naturally-aligned, while variadic stack arguments are passed on
word boundaries. Use the TARGET_FUNCTION_ARG_BOUNDARY_CA and
TARGET_FUNCTION_ARG_ROUND_BOUNDARY_CA target hooks to let the backend
lay out stack parameters correctly.
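
To make the difference concrete, here is a minimal sketch of the two
placement rules. It is not code from this patch: `place_stack_arg` and
`round_up` are hypothetical helpers, a "word" is assumed to be 8 bytes,
and types whose natural alignment exceeds 8 bytes are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Round OFFSET up to a multiple of ALIGN, which must be a power of two.  */
static size_t
round_up (size_t offset, size_t align)
{
  return (offset + align - 1) & ~(align - 1);
}

/* Hypothetical helper: place one stack argument of SIZE bytes and
   return its offset, advancing *NEXT past it.  Named arguments use
   their natural alignment; variadic arguments start on an 8-byte word
   boundary (assumed word size).  */
static size_t
place_stack_arg (size_t *next, size_t size, size_t natural_align, int named)
{
  size_t align = named ? natural_align : 8;
  size_t offset = round_up (*next, align);
  *next = offset + size;
  return offset;
}
```

Under these assumptions, three named stack arguments of type char, short
and int land at offsets 0, 2 and 4, while three variadic int arguments
land at offsets 0, 8 and 16. The backend can only tell the two cases apart
if it knows how many named parameters the function has, which is why the
hooks need access to the cumulative args.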

gcc/ChangeLog:

* config.gcc: Enable -fstack-use-cumulative-args by default if the
host platform is MacOS 11.x or 12.x and we're on AArch64.

gcc/config/aarch64/ChangeLog:

* aarch64-protos.h (aarch64_init_cumulative_incoming_args):
Declare.
* aarch64.c (aarch64_init_cumulative_args): Initialize
`darwinpcs_n_named` (Total number of named parameters) and
`darwinpcs_n_args_processed` (Total number of parameters we
have processed, including variadic if any.)
(aarch64_init_cumulative_incoming_args): Implement the
INIT_CUMULATIVE_INCOMING_ARGS macro in order to capture
information on the number of named parameters for the current
function.
(aarch64_function_arg_advance): Increment
`darwinpcs_n_args_processed` each time we layout a function
parameter.
(aarch64_function_arg_boundary_ca): Implement
TARGET_FUNCTION_ARG_BOUNDARY_CA and
TARGET_FUNCTION_ARG_ROUND_BOUNDARY_CA to layout args based on
whether we're a named parameter or not.
(aarch64_function_arg_round_boundary_ca): Ditto.
(TARGET_FUNCTION_ARG_BOUNDARY_CA): Define.
(TARGET_FUNCTION_ARG_ROUND_BOUNDARY_CA): Ditto.
* aarch64.h (CUMULATIVE_ARGS): Add `darwinpcs_n_named` and
`darwinpcs_n_args_processed`.
(INIT_CUMULATIVE_INCOMING_ARGS): Define.
---
 gcc/config.gcc  |  7 
 gcc/config/aarch64/aarch64-protos.h |  1 +
 gcc/config/aarch64/aarch64.c| 56 +
 gcc/config/aarch64/aarch64.h|  5 +++
 4 files changed, 69 insertions(+)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index e12a9f042d0..626ba089c07 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1072,6 +1072,13 @@ esac
 
 # Figure out if we need to enable -foff-stack-trampolines by default.
 case ${target} in
+aarch64*-*darwin* | arm64*-*darwin*)
+  if test ${macos_maj} = 11 || test ${macos_maj} = 12; then
+tm_defines="$tm_defines STACK_USE_CUMULATIVE_ARGS_INIT=1"
+  else
+tm_defines="$tm_defines STACK_USE_CUMULATIVE_ARGS_INIT=0"
+  fi
+  ;;
 *)
   tm_defines="$tm_defines STACK_USE_CUMULATIVE_ARGS_INIT=0"
   ;;
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index a204647241e..cdc51fce906 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -896,6 +896,7 @@ void aarch64_expand_vector_init (rtx, rtx);
 void aarch64_sve_expand_vector_init (rtx, rtx);
 void aarch64_init_cumulative_args (CUMULATIVE_ARGS *, const_tree, rtx,
   const_tree, unsigned, bool = false);
+void aarch64_init_cumulative_incoming_args (CUMULATIVE_ARGS *, const_tree, rtx);
 void aarch64_init_expanders (void);
 void aarch64_init_simd_builtins (void);
 void aarch64_emit_call_insn (rtx);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 38b3f1eab89..70c2336ab3a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7042,6 +7042,8 @@ aarch64_init_cumulative_args (CUMULATIVE_ARGS *pcum,
   pcum->darwinpcs_stack_bytes = 0;
   pcum->darwinpcs_sub_word_offset = 0;
   pcum->darwinpcs_sub_word_pos = 0;
+  pcum->darwinpcs_n_named = n_named;
+  pcum->darwinpcs_n_args_processed = 0;
   pcum->silent_p = silent_p;
   pcum->aapcs_vfp_rmode = VOIDmode;
 
@@ -7072,6 +7074,20 @@ aarch64_init_cumulative_args (CUMULATIVE_ARGS *pcum,
 }
 }
 
+void
+aarch64_init_cumulative_incoming_args (CUMULATIVE_ARGS *pcum,
+  const_tree fntype,
+  rtx libname ATTRIBUTE_UNUSED)
+{
+#if !TARGET_MACHO
+  INIT_CUMULATIVE_ARGS (*pcum, fntype, libname, current_function_decl, -1);
+#else
+  int n_named_args = (list_length (TYPE_ARG_TYPES (fntype)));
+
+  aarch64_init_cumulative_args (pcum, fntype, libname, current_function_decl, n_named_args);
+#endif
+}
+
 static void
 aarch64_function_arg_advance (cumulative_args_t pcum_v,
  const function_arg_info &arg)
@@ -7092,6 +7108,7 @@ aarch64_function_arg_advance (cumulative_args_t pcum_v,
   pcum->aapcs_stack_size += pcum->aapcs_stack_words;
   pcum->aapcs_stack_words = 0;
   pcum->aapcs_reg = NULL_RTX;
+  pcum->darwinpcs_n_args_processed++;
 }
 }
 
@@ -7147,6 +7164,19 @@ aarch64_function_arg_boundary (machine_mode mode, const_tree type)
 #endif
 }
 
+static unsigned int

[PATCH 1/2] Add cumulative_args_t variants of TARGET_FUNCTION_ROUND_BOUNDARY and friends

2021-11-13 Thread Maxim Blinov
The two target hooks responsible for informing GCC about stack
parameter alignment are `TARGET_FUNCTION_ARG_BOUNDARY` and
`TARGET_FUNCTION_ARG_ROUND_BOUNDARY`, which currently only consider
the tree and machine_mode of a specific given argument.

Create two new target hooks suffixed with '_CA', and pass in a third
`cumulative_args_t` parameter. This enables the backend to make
alignment decisions based on the context of the whole function rather
than individual parameters.

The original machine_mode/tree type macros are not removed - they are
called by the default implementations of `TARGET_...BOUNDARY_CA` and
`TARGET_...ROUND_BOUNDARY_CA`. This is done with the intention of
avoiding large mechanical modifications of nearly every backend in
GCC. There is also a new flag, -fstack-use-cumulative-args, which
provides a way to completely bypass the new `..._CA` macros. This
feature is intended for debugging GCC itself.
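
The forwarding scheme described above can be sketched outside of GCC as
follows. All `demo_*` names are hypothetical and the types are reduced to
plain integers; this illustrates the shape of the design and is not the
code in targhooks.c.

```c
#include <assert.h>

/* Hypothetical miniature of the state a backend accumulates while it
   walks a parameter list.  */
struct demo_cumulative_args
{
  int n_named;      /* Number of named parameters.  */
  int n_processed;  /* Parameters laid out so far.  */
};

typedef unsigned (*demo_boundary_ca_fn) (unsigned,
                                         const struct demo_cumulative_args *);

/* Legacy-style hook: sees only the argument's own alignment (in bits)
   and never returns less than one 64-bit word.  */
static unsigned
demo_boundary (unsigned align_bits)
{
  return align_bits < 64 ? 64 : align_bits;
}

/* Default _CA variant: ignores the extra context and forwards to the
   legacy hook, so existing backends need no changes.  */
static unsigned
demo_default_boundary_ca (unsigned align_bits,
                          const struct demo_cumulative_args *ca)
{
  (void) ca;
  return demo_boundary (align_bits);
}

/* A backend override that does use the context: named parameters keep
   their natural alignment, the rest fall back to the legacy rule.  */
static unsigned
demo_named_aware_boundary_ca (unsigned align_bits,
                              const struct demo_cumulative_args *ca)
{
  if (ca->n_processed < ca->n_named)
    return align_bits;
  return demo_boundary (align_bits);
}

/* Caller side: USE_CA models -fstack-use-cumulative-args.  When it is
   off, the _CA hook is bypassed entirely.  */
static unsigned
demo_locate (unsigned align_bits, const struct demo_cumulative_args *ca,
             demo_boundary_ca_fn hook, int use_ca)
{
  return use_ca ? hook (align_bits, ca) : demo_boundary (align_bits);
}
```

With the default hook installed, the result is the same whether the flag
is on or off; only a backend that overrides the `_CA` hook sees a
difference.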

gcc/ChangeLog:

* calls.c (initialize_argument_information): Pass `args_so_far`.
* common.opt: New flag `-fstack-use-cumulative-args`.
* config.gcc: No platforms currently use ..._CA-hooks: Set
-fstack-use-cumulative-args to be off by default.
* target.h (cumulative_args_t): Move declaration from here, to...
* cumulative-args.h (cumulative_args_t): ...this new file. This is
to permit backends to include the declaration of cumulative_args_t
without dragging in circular dependencies.
* function.c (assign_parm_find_entry_rtl): Provide
cumulative_args_t to locate_and_pad_parm.
(gimplify_parameters): Ditto.
(locate_and_pad_parm): Conditionally call new hooks if we're
invoked with -fstack-use-cumulative-args.
* function.h: Include cumulative-args.h.
(locate_and_pad_parm): Add cumulative_args_t parameter.
* target.def (function_arg_boundary_ca): Add.
(function_arg_round_boundary_ca): Ditto.
* targhooks.c (default_function_arg_boundary_ca): Implement.
(default_function_arg_round_boundary_ca): Ditto.
* targhooks.h (default_function_arg_boundary_ca): Declare.
(default_function_arg_round_boundary_ca): Ditto.
* doc/invoke.texi (-fstack-use-cumulative-args): Document.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Ditto.
---
 gcc/calls.c   |  3 +++
 gcc/common.opt|  4 
 gcc/config.gcc|  7 +++
 gcc/cumulative-args.h | 20 
 gcc/doc/invoke.texi   | 12 
 gcc/doc/tm.texi   | 20 
 gcc/doc/tm.texi.in|  4 
 gcc/function.c| 25 +
 gcc/function.h|  2 ++
 gcc/target.def| 24 
 gcc/target.h  | 17 +
 gcc/targhooks.c   | 16 
 gcc/targhooks.h   |  6 ++
 13 files changed, 140 insertions(+), 20 deletions(-)
 create mode 100644 gcc/cumulative-args.h

diff --git a/gcc/calls.c b/gcc/calls.c
index 27b59f26ad3..cef612a6ef4 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -1527,6 +1527,7 @@ initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED,
 #endif
 reg_parm_stack_space,
 args[i].pass_on_stack ? 0 : args[i].partial,
+args_so_far,
 fndecl, args_size, &args[i].locate);
 #ifdef BLOCK_REG_PADDING
   else
@@ -4205,6 +4206,7 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value,
   argvec[count].reg != 0,
 #endif
   reg_parm_stack_space, 0,
+  args_so_far,
   NULL_TREE, &args_size, &argvec[count].locate);
 
   if (argvec[count].reg == 0 || argvec[count].partial != 0
@@ -4296,6 +4298,7 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value,
   argvec[count].reg != 0,
 #endif
   reg_parm_stack_space, argvec[count].partial,
+  args_so_far,
   NULL_TREE, &args_size, &argvec[count].locate);
  args_size.constant += argvec[count].locate.size.constant;
  gcc_assert (!argvec[count].locate.size.var);
diff --git a/gcc/common.opt b/gcc/common.opt
index de9b848eda5..982417c1e39 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2705,6 +2705,10 @@ fstack-usage
 Common RejectNegative Var(flag_stack_usage)
 Output stack usage information on a per-function basis.
 
+fstack-use-cumulative-args
+Common RejectNegative Var(flag_stack_use_cumulative_args) Init(STACK_USE_CUMULATIVE_ARGS_INIT)
+Use cumulative args-based stack layout hooks.
+
 fstrength-reduce
 Common Ignore
 Does nothing.  Preserved for backward compatibility.
diff --git a/gcc/config.gcc b/gcc/config.gcc
index edd12655c4a..046d691af56 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1070,6 +1070,13 @@ case ${target} in
   ;;
 esac
 
+# Figure 

[PATCH v2] analyzer: Define INCLUDE_UNIQUE_PTR

2021-09-14 Thread Maxim Blinov
Un-break the build for AArch64 Darwin, see PR bootstrap/102242.  Build
fails with log below:

```
In file included from 
../../../gcc-master-wip-apple-si/gcc/analyzer/engine.cc:69:
In file included from 
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/memory:678:
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/stdexcept:239:5:
 error: no member named 'fancy_abort' in namespace 'std::__1'; did you mean 
simply 'fancy_abort'?
_VSTD::abort();
^~~
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/__config:852:15:
 note: expanded from macro '_VSTD'

../../../gcc-master-wip-apple-si/gcc/system.h:777:13: note: 'fancy_abort' 
declared here
extern void fancy_abort (const char *, int, const char *)
^
```

Judging from the following comment in gcc/system.h, we just need to
define INCLUDE_UNIQUE_PTR since commit eafa9d96923 added the inclusion
of <memory>:

```
/* Some of the headers included by <memory> can use "abort" within a
   namespace, e.g. "_VSTD::abort();", which fails after we use the
   preprocessor to redefine "abort" as "fancy_abort" below.
   Given that unique-ptr.h can use "free", we need to do this after "free"
   is declared but before "abort" is overridden.  */

```
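
The ordering problem can be reproduced in plain C, without namespaces, as
the sketch below tries to show. Everything in it is a hypothetical
stand-in: `demo_handler` plays the role of the library header, and this
`fancy_abort` only counts calls instead of terminating.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical "library header" code.  It calls a member that happens
   to be named abort, which is legal C because struct members live in
   their own name space.  */
struct demo_handler
{
  void (*abort) (void);
  int calls;
};

static void
demo_noop (void)
{
}

static void
demo_run_handler (struct demo_handler *h)
{
  h->abort ();   /* Must be compiled before the macro below exists.  */
  h->calls++;
}

/* Hypothetical stand-in for GCC's fancy_abort; it only counts calls.  */
static int demo_fancy_abort_calls;

static void
fancy_abort (const char *file, int line, const char *func)
{
  (void) file;
  (void) line;
  (void) func;
  demo_fancy_abort_calls++;
}

/* From here on every "abort ()" is rewritten.  Moving demo_run_handler
   below this line would turn h->abort () into h->fancy_abort (...),
   which fails to compile because there is no such member.  */
#define abort() fancy_abort (__FILE__, __LINE__, __func__)

static void
demo_check (int ok)
{
  if (!ok)
    abort ();
}
```

Defining INCLUDE_UNIQUE_PTR before including system.h has the same
effect as the ordering here: the header that mentions `abort` in a
qualified position is processed before the macro is introduced.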

gcc/analyzer/ChangeLog:
* engine.cc: Define INCLUDE_UNIQUE_PTR.
---
 gcc/analyzer/engine.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index 24f0931197d..f21f8e5b78a 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -19,6 +19,7 @@ along with GCC; see the file COPYING3.  If not see
 .  */
 
 #include "config.h"
+#define INCLUDE_UNIQUE_PTR
 #include "system.h"
 #include "coretypes.h"
 #include "tree.h"
-- 
2.30.1 (Apple Git-130)



[PATCH] analyzer: Define INCLUDE_UNIQUE_PTR

2021-09-10 Thread Maxim Blinov
Un-break the build for AArch64 Darwin. Build currently fails with an
error very similar to pr82091:

```
In file included from 
../../../gcc-master-wip-apple-si/gcc/analyzer/engine.cc:69:
In file included from 
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/memory:678:
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/stdexcept:239:5:
 error: no member named 'fancy_abort' in namespace 'std::__1'; did you mean 
simply 'fancy_abort'?
_VSTD::abort();
^~~
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/__config:852:15:
 note: expanded from macro '_VSTD'

../../../gcc-master-wip-apple-si/gcc/system.h:777:13: note: 'fancy_abort' 
declared here
extern void fancy_abort (const char *, int, const char *)
^
```

Judging from the following comment in gcc/system.h, we just need to
define INCLUDE_UNIQUE_PTR since commit eafa9d96923 added the inclusion
of <memory>:

```
/* Some of the headers included by <memory> can use "abort" within a
   namespace, e.g. "_VSTD::abort();", which fails after we use the
   preprocessor to redefine "abort" as "fancy_abort" below.
   Given that unique-ptr.h can use "free", we need to do this after "free"
   is declared but before "abort" is overridden.  */

```

gcc/analyzer/ChangeLog:
* engine.cc: Define INCLUDE_UNIQUE_PTR.
---
 gcc/analyzer/engine.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index 24f0931197d..f21f8e5b78a 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -19,6 +19,7 @@ along with GCC; see the file COPYING3.  If not see
 .  */
 
 #include "config.h"
+#define INCLUDE_UNIQUE_PTR
 #include "system.h"
 #include "coretypes.h"
 #include "tree.h"
-- 
2.30.1 (Apple Git-130)



[PATCH v2] RISC-V: Raise error on unexpected ISA string at end.

2019-07-31 Thread Maxim Blinov
Hi Martin, thanks for reviewing.

> I would expect the missing quotes around the option to trigger
> a -Wformat-diag warning.  The %<%s%s> should also be flagged by
> the same warning.  Changing the format string as follows should
> avoid the warnings:
> 
>   error_at (loc, "%<-march=%s%>: unexpected ISA string at end: %qs"

I've made the corresponding changes.
Tested with RUNTESTFLAGS="riscv.exp".

Thanks,
Maxim

gcc/ChangeLog:
2019-07-31  Maxim Blinov  

* common/config/riscv/riscv-common.c: Check -march string ends
    with null.

gcc/testsuite/ChangeLog:
2019-07-31  Maxim Blinov  

* gcc.target/riscv/attribute-10.c: New test.

---
 gcc/common/config/riscv/riscv-common.c| 7 +++
 gcc/testsuite/gcc.target/riscv/attribute-10.c | 6 ++
 2 files changed, 13 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/attribute-10.c

diff --git a/gcc/common/config/riscv/riscv-common.c b/gcc/common/config/riscv/riscv-common.c
index eeb75717db0..a16d6c5b448 100644
--- a/gcc/common/config/riscv/riscv-common.c
+++ b/gcc/common/config/riscv/riscv-common.c
@@ -513,6 +513,13 @@ riscv_subset_list::parse (const char *arch, location_t loc)
   if (p == NULL)
 goto fail;
 
+  if (*p != '\0')
+{
+  error_at (loc, "%<-march=%s%>: unexpected ISA string at end: %qs",
+   arch, p);
+  goto fail;
+}
+
   return subset_list;
 
 fail:
diff --git a/gcc/testsuite/gcc.target/riscv/attribute-10.c b/gcc/testsuite/gcc.target/riscv/attribute-10.c
new file mode 100644
index 000..dd817879a67
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/attribute-10.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv32im_s_sx_unexpectedstring -mabi=ilp32" } */
+int foo()
+{
+}
+/* { dg-error "unexpected ISA string at end:" "" { target { "riscv*-*-*" } } 0 } */
-- 
2.20.1



[PATCH] RISC-V: Raise error on unexpected ISA string at end.

2019-07-31 Thread Maxim Blinov
This patch adds the same check that is present in binutils/bfd/elfxx-riscv.c.

Check that we have reached the end of the string after all the
parsing routines have run. Without this check, GCC silently
succeeds on erroneous input and produces assembly with a truncated
arch attribute.
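
The idea can be shown with a toy parser. This is a sketch only: the
`demo_*` functions are hypothetical and the grammar ("rv32" or "rv64"
followed by lowercase letters) is far simpler than the real ISA string
syntax handled in riscv-common.c.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Toy parser: accept "rv32" or "rv64" followed by any number of
   lowercase extension letters.  Returns a pointer to the first
   character it did not consume, or NULL if the prefix is wrong.  */
static const char *
demo_parse_arch (const char *arch)
{
  if (strncmp (arch, "rv32", 4) != 0 && strncmp (arch, "rv64", 4) != 0)
    return NULL;

  const char *p = arch + 4;
  while (islower ((unsigned char) *p))
    p++;
  return p;
}

/* Returns 1 if ARCH is accepted, 0 otherwise.  The second test is the
   check this patch adds: parsing may stop early without failing, so
   anything left over must be rejected explicitly.  */
static int
demo_arch_ok (const char *arch)
{
  const char *p = demo_parse_arch (arch);
  if (p == NULL)
    return 0;
  if (*p != '\0')
    return 0;
  return 1;
}
```

Without the `*p != '\0'` test, an input such as "rv32im_junk" would be
accepted as if it were "rv32im", because the parser simply stops at the
underscore.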

Tested with RUNTESTFLAGS="riscv.exp".

Thanks,
Maxim

gcc/ChangeLog:
2019-07-31  Maxim Blinov  

* common/config/riscv/riscv-common.c: Check -march string ends
with null.

gcc/testsuite/ChangeLog:
2019-07-31  Maxim Blinov  

* gcc.target/riscv/attribute-10.c: New test.

---
 gcc/common/config/riscv/riscv-common.c| 7 +++
 gcc/testsuite/gcc.target/riscv/attribute-10.c | 6 ++
 2 files changed, 13 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/attribute-10.c

diff --git a/gcc/common/config/riscv/riscv-common.c b/gcc/common/config/riscv/riscv-common.c
index eeb75717db0..64a309241da 100644
--- a/gcc/common/config/riscv/riscv-common.c
+++ b/gcc/common/config/riscv/riscv-common.c
@@ -513,6 +513,13 @@ riscv_subset_list::parse (const char *arch, location_t loc)
   if (p == NULL)
 goto fail;
 
+  if (*p != '\0')
+{
+  error_at (loc, "-march=%s: unexpected ISA string at end: %<%s%>",
+   arch, p);
+  goto fail;
+}
+
   return subset_list;
 
 fail:
diff --git a/gcc/testsuite/gcc.target/riscv/attribute-10.c b/gcc/testsuite/gcc.target/riscv/attribute-10.c
new file mode 100644
index 000..dd817879a67
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/attribute-10.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv32im_s_sx_unexpectedstring -mabi=ilp32" } */
+int foo()
+{
+}
+/* { dg-error "unexpected ISA string at end:" "" { target { "riscv*-*-*" } } 0 } */
-- 
2.20.1