Re: [Patch 1/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2019-02-13 Thread Steve Ellcey
On Wed, 2019-02-13 at 16:54 +, Szabolcs Nagy wrote:

> > +/* Table of machine attributes.  */
> > +static const struct attribute_spec aarch64_attribute_table[] =
> > +{
> > +  /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
> > +   affects_type_identity, handler, exclude } */
> > +  { "aarch64_vector_pcs", 0, 0, false, true,  true,  false, NULL, NULL },
> 
> i just noticed that "affects_type_identity" is set to false,
> so gcc accepts
> 
>   __attribute__((aarch64_vector_pcs)) void f(void);
>   void (*g)(void) = f;
> 
> without a warning (treats function types with different
> pcs as compatible)
> 
> i think we don't want to allow calls through the wrong
> pointer type, such assignment should be an error.

I agree.  I will submit a patch to change the affects_type_identity
flag and add a test for it.

Steve Ellcey
sell...@marvell.com


Re: [Patch 1/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2019-02-13 Thread Szabolcs Nagy
On 08/11/2018 17:52, Steve Ellcey wrote:
> This is a resubmission of patch 1 to support the Aarch64 SIMD ABI [1] in
> GCC, it does not have any functional changes from the last submit.
> 
> The significant difference between the standard ARM ABI and the SIMD ABI
> is that in the normal ABI a callee saves only the lower 64 bits of registers
> V8-V15, in the SIMD ABI the callee must save all 128 bits of registers
> V8-V23.
...
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -1088,6 +1088,15 @@ static const struct processor *selected_tune;
>  /* The current tuning set.  */
>  struct tune_params aarch64_tune_params = generic_tunings;
>  
> +/* Table of machine attributes.  */
> +static const struct attribute_spec aarch64_attribute_table[] =
> +{
> +  /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
> +   affects_type_identity, handler, exclude } */
> +  { "aarch64_vector_pcs", 0, 0, false, true,  true,  false, NULL, NULL },

i just noticed that "affects_type_identity" is set to false,
so gcc accepts

  __attribute__((aarch64_vector_pcs)) void f(void);
  void (*g)(void) = f;

without a warning (treats function types with different
pcs as compatible)

i think we don't want to allow calls through the wrong
pointer type, such assignment should be an error.
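
To make the concern concrete, here is a minimal sketch in C. It is only an illustration, not part of the patch: VECTOR_PCS, add_one, vector_pcs_fn and the other names are hypothetical, and the macro expands to nothing on non-AArch64 targets so that the file still compiles there. The point is that a pointer meant to call a vector-PCS function should carry the attribute in its own type, and that assigning such a function to a plain pointer type is the case that should be diagnosed once the attribute affects type identity.

```c
#include <assert.h>

/* Assumption: only AArch64 compilers know this attribute, so the
   hypothetical VECTOR_PCS macro expands to nothing elsewhere.  */
#if defined(__aarch64__) && defined(__GNUC__)
# define VECTOR_PCS __attribute__((aarch64_vector_pcs))
#else
# define VECTOR_PCS
#endif

/* A function that uses the vector procedure call standard.  */
VECTOR_PCS int add_one (int x)
{
  return x + 1;
}

/* Pointer type carrying the same attribute as the callee.  */
typedef VECTOR_PCS int (*vector_pcs_fn) (int);

/* Call through a pointer whose type matches the callee's convention.
   On AArch64, writing "int (*g) (int) = add_one;" instead is the
   assignment the thread argues should not be silently accepted.  */
int call_via_matching_pointer (int x)
{
  vector_pcs_fn g = add_one;
  return g (x);
}

/* Small self-check of the sketch.  */
void check_matching_pointer (void)
{
  assert (call_via_matching_pointer (41) == 42);
}
```

Keeping the attribute in a typedef is just one way to spell the pointer type; the same attribute could be written directly on the pointer declaration.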


Re: [EXT] Re: [Patch 1/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2018-12-12 Thread Richard Sandiford
Steve Ellcey  writes:
> On Wed, 2018-12-12 at 11:39 +, Richard Sandiford wrote:
>> 
>> Steve Ellcey  writes:
>> > On Fri, 2018-12-07 at 17:34 +, Richard Sandiford wrote:
>> > > > +  (match_operand:TX 2 "register_operand" "w"))
>> > > > + (set (mem:TX (plus:P (match_dup 0)
>> > > > +  (match_operand:P 5 "const_int_operand"
>> > > > "n")))
>> > > > +  (match_operand:TX 3 "register_operand" "w"))])]
>> > > 
>> > > Think this last part should be:
>> > > 
>> > >  (set (mem:TX (plus:P (plus:P (match_dup 0)
>> > >   (match_dup 4))
>> > >   (match_operand:P 5 "const_int_operand"
>> > > "n")))
>> > >   (match_operand:TX 3 "register_operand" "w"))])]
>> > 
>> > I think you are right about this.  What I have for
>> > loadwb_pair_ matches what is there for
>> > loadwb_pair_.  If this one is wrong, then I assume
>> > the others are wrong too?  This won't make a practical difference since
>> > we call these with gen_loadwb_pair*_* calls and not via pattern
>> > recognition, but still they should be right.  Should I change them
>> > all?  I did not change this as part of this patch.
>> 
>> I think we should fix the new pattern, but I agree fixing the others
>> should be a separate patch.
>> 
>> Patch LGTM with that change.
>
> I am not sure this is right.  I created a patch (separate from any of
> the SIMD changes) to fix the storewb_pair_ and
> storewb_pair_ and when I try to build GCC with
> that change, gcc aborts while building libgcc.  I didn't think
> this change could affect the build but it appears to do so.

You're right, sorry, I'd misread the code.  Patch LGTM as posted.

Thanks,
Richard
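
The arithmetic behind the two candidate forms can be sketched in C as follows. This takes the RTL purely at face value: "base" stands for whatever value (match_dup 0) denotes, op4 and op5 stand for operands 4 and 5, and all function names are hypothetical. It only shows that, under the insn condition op5 == op4 + mode size, the posted form puts the second store directly after the first, while the nested-plus form generally does not. It makes no claim about why the build failed.

```c
#include <assert.h>
#include <stdint.h>

/* Address of the first store: base + operands[4].  */
int64_t first_store_addr (int64_t base, int64_t op4)
{
  return base + op4;
}

/* Second store as written in the posted pattern: base + operands[5].  */
int64_t second_store_posted (int64_t base, int64_t op5)
{
  return base + op5;
}

/* Second store with the suggested nested plus:
   (base + operands[4]) + operands[5].  */
int64_t second_store_nested (int64_t base, int64_t op4, int64_t op5)
{
  return (base + op4) + op5;
}

/* Worked example with a 16-byte mode and operands[4] == -32, so the
   insn condition requires operands[5] == -32 + 16 == -16.  */
void check_store_addresses (void)
{
  const int64_t base = 1024, size = 16, op4 = -32, op5 = op4 + size;

  /* Posted form: the second slot is adjacent to the first.  */
  assert (second_store_posted (base, op5)
          == first_store_addr (base, op4) + size);
  /* Nested form: op4 is applied twice, so the slots are not adjacent.  */
  assert (second_store_nested (base, op4, op5)
          != first_store_addr (base, op4) + size);
}
```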


Re: [EXT] Re: [Patch 1/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2018-12-12 Thread Steve Ellcey
On Wed, 2018-12-12 at 11:39 +, Richard Sandiford wrote:
> 
> Steve Ellcey  writes:
> > On Fri, 2018-12-07 at 17:34 +, Richard Sandiford wrote:
> > > > +  (match_operand:TX 2 "register_operand" "w"))
> > > > + (set (mem:TX (plus:P (match_dup 0)
> > > > +  (match_operand:P 5 "const_int_operand"
> > > > "n")))
> > > > +  (match_operand:TX 3 "register_operand" "w"))])]
> > > 
> > > Think this last part should be:
> > > 
> > >  (set (mem:TX (plus:P (plus:P (match_dup 0)
> > >   (match_dup 4))
> > >   (match_operand:P 5 "const_int_operand"
> > > "n")))
> > >   (match_operand:TX 3 "register_operand" "w"))])]
> > 
> > I think you are right about this.  What I have for
> > loadwb_pair_ matches what is there for
> > loadwb_pair_.  If this one is wrong, then I assume
> > the others are wrong too?  This won't make a practical difference since
> > we call these with gen_loadwb_pair*_* calls and not via pattern
> > recognition, but still they should be right.  Should I change them
> > all?  I did not change this as part of this patch.
> 
> I think we should fix the new pattern, but I agree fixing the others
> should be a separate patch.
> 
> Patch LGTM with that change.

I am not sure this is right.  I created a patch (separate from any of
the SIMD changes) to fix the storewb_pair_ and
storewb_pair_ and when I try to build GCC with
that change, gcc aborts while building libgcc.  I didn't think
this change could affect the build but it appears to do so.


/home/sellcey/gcc-md-fix/src/gcc/libgcc/static-object.mk:17: recipe for target 'addtf3.o' failed
make[1]: *** [addtf3.o] Error 1
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.
/home/sellcey/gcc-md-fix/src/gcc/libgcc/static-object.mk:17: recipe for target 'unwind-dw2.o' failed
make[1]: *** [unwind-dw2.o] Error 1
0x86bc7b dwarf2out_frame_debug_expr
/home/sellcey/gcc-md-fix/src/gcc/gcc/dwarf2cfi.c:1910
0x86acaf dwarf2out_frame_debug_expr
/home/sellcey/gcc-md-fix/src/gcc/gcc/dwarf2cfi.c:1616
0x86c13b dwarf2out_frame_debug
/home/sellcey/gcc-md-fix/src/gcc/gcc/dwarf2cfi.c:2169
0x86c13b scan_insn_after
/home/sellcey/gcc-md-fix/src/gcc/gcc/dwarf2cfi.c:2511


The patch I was trying was:


diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 6657316..3530dd4 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1464,7 +1464,8 @@
  (set (mem:GPI (plus:P (match_dup 0)
   (match_dup 4)))
  (match_operand:GPI 2 "register_operand" "r"))
- (set (mem:GPI (plus:P (match_dup 0)
+ (set (mem:GPI (plus:P (plus:P (match_dup 0)
+  (match_dup 4))
   (match_operand:P 5 "const_int_operand" "n")))
  (match_operand:GPI 3 "register_operand" "r"))])]
   "INTVAL (operands[5]) == INTVAL (operands[4]) + GET_MODE_SIZE (mode)"
@@ -1480,7 +1481,8 @@
  (set (mem:GPF (plus:P (match_dup 0)
   (match_dup 4)))
  (match_operand:GPF 2 "register_operand" "w"))
- (set (mem:GPF (plus:P (match_dup 0)
+ (set (mem:GPF (plus:P (plus:P (match_dup 0)
+  (match_dup 4))
   (match_operand:P 5 "const_int_operand" "n")))
  (match_operand:GPF 3 "register_operand" "w"))])]
   "INTVAL (operands[5]) == INTVAL (operands[4]) + GET_MODE_SIZE (mode)"





Re: [Patch 1/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2018-12-12 Thread Richard Sandiford
Steve Ellcey  writes:
> On Fri, 2018-12-07 at 17:34 +, Richard Sandiford wrote:
>> > +  (match_operand:TX 2 "register_operand" "w"))
>> > + (set (mem:TX (plus:P (match_dup 0)
>> > +  (match_operand:P 5 "const_int_operand" "n")))
>> > +  (match_operand:TX 3 "register_operand" "w"))])]
>> 
>> Think this last part should be:
>> 
>>  (set (mem:TX (plus:P (plus:P (match_dup 0)
>>   (match_dup 4))
>>   (match_operand:P 5 "const_int_operand"
>> "n")))
>>   (match_operand:TX 3 "register_operand" "w"))])]
>
> I think you are right about this.  What I have for
> loadwb_pair_ matches what is there for
> loadwb_pair_.  If this one is wrong, then I assume
> the others are wrong too?  This won't make a practical difference since
> we call these with gen_loadwb_pair*_* calls and not via pattern
> recognition, but still they should be right.  Should I change them
> all?  I did not change this as part of this patch.

I think we should fix the new pattern, but I agree fixing the others
should be a separate patch.

Patch LGTM with that change.

Thanks,
Richard


Re: [Patch 1/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2018-12-11 Thread Steve Ellcey
On Fri, 2018-12-07 at 17:34 +, Richard Sandiford wrote:
> 
> I'm not an expert on this stuff, but it looks like:
> 
>   struct cgraph_node *node = cgraph_node::get (fndecl);
>   return node && node->simdclone;
> 
> might work.  But in some ways it would be cleaner to add the
> aarch64_vector_pcs attribute for SIMD clones, e.g. via a new hook,
> so that the function type is "correct".

I have changed this to add the aarch64_vector_pcs attribute to clones.
To do this I had to tweak the targetm.simd_clone.adjust target
function, but I did not have to add a new target function.  That change
is part of Patch 2/4 and I will resubmit that after this email.  The
change in this patch is to just check for the aarch64_vector_pcs
attribute and not look at the simd attribute to determine the ABI.

> 
> > @@ -4863,6 +4949,7 @@ aarch64_process_components (sbitmap
> > components, bool prologue_p)
> >mergeable with the current one into a pair.  */
> >if (!satisfies_constraint_Ump (mem)
> > || GP_REGNUM_P (regno) != GP_REGNUM_P (regno2)
> > +   || (aarch64_simd_decl_p (cfun->decl) && (FP_REGNUM_P
> > (regno)))
> 
> Formatting nit: redundant brackets around FP_REGNUM_P (regno).

Fixed.

> > 
> > -(define_insn "simple_return"
> > +(define_expand "simple_return"
> > +  [(simple_return)]
> > +  "aarch64_use_simple_return_insn_p ()"
> > +  ""
> > +)
> > +
> > +(define_insn "*simple_return"
> >[(simple_return)]
> >""
> >"ret"
> 
> Can't you just change the condition on the existing define_insn,
> without turning it into a define_expand?  Worth a comment explaining
> why if not.

Yes, I am not sure why I did it this way (copying some other target
probably) but I got rid of the define_expand and changed the
define_insn and things seem to work fine.

> 
> > @@ -1487,6 +1538,23 @@
> >[(set_attr "type" "neon_store1_2reg")]
> >  )
> > 
> > +(define_insn "storewb_pair_"
> > +  [(parallel
> > +[(set (match_operand:P 0 "register_operand" "=")
> > +  (plus:P (match_operand:P 1 "register_operand" "0")
> > +  (match_operand:P 4 "aarch64_mem_pair_offset"
> > "n")))
> > + (set (mem:TX (plus:P (match_dup 0)
> > +  (match_dup 4)))
> 
> Should be indented under the (match_dup 0).

Fixed.

> 
> > +  (match_operand:TX 2 "register_operand" "w"))
> > + (set (mem:TX (plus:P (match_dup 0)
> > +  (match_operand:P 5 "const_int_operand" "n")))
> > +  (match_operand:TX 3 "register_operand" "w"))])]
> 
> Think this last part should be:
> 
>  (set (mem:TX (plus:P (plus:P (match_dup 0)
>   (match_dup 4))
>   (match_operand:P 5 "const_int_operand"
> "n")))
>   (match_operand:TX 3 "register_operand" "w"))])]

I think you are right about this.  What I have for
loadwb_pair_ matches what is there for
loadwb_pair_.  If this one is wrong, then I assume
the others are wrong too?  This won't make a practical difference since
we call these with gen_loadwb_pair*_* calls and not via pattern
recognition, but still they should be right.  Should I change them
all?  I did not change this as part of this patch.

> 
> > +  "TARGET_SIMD &&
> > +   INTVAL (operands[5]) == INTVAL (operands[4]) + GET_MODE_SIZE
> > (mode)"
> 
> && should be on the second line (which makes the line long enough to
> need breaking).

Fixed.

> 
> > diff --git a/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-2.c
> > 
> 
> Comment doesn't match code: this is testing a normal PCS function.

Fixed.
> 
> > +++ b/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-5.c
> 
> This is a nice test, but I think it would also be good to have versions
> that don't clobber full register pairs.  E.g. one without q9 and another
> without q10 would test individual STR Qs.

I added two new tests for this, simd-abi-6.c and simd-abi-7.c.

Steve Ellcey
sell...@marvell.com



ChangeLog:

2018-12-11  Steve Ellcey  

* config/aarch64/aarch64-protos.h (aarch64_use_simple_return_insn_p):
New prototype.
(aarch64_epilogue_uses): Ditto.
* config/aarch64/aarch64.c (aarch64_attribute_table): New array.
(aarch64_simd_decl_p): New function.
(aarch64_reg_save_mode): New function.
(aarch64_function_ok_for_sibcall): Check for simd calls.
(aarch64_layout_frame): Check for simd function.
(aarch64_gen_storewb_pair): Handle E_TFmode.
(aarch64_push_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_gen_loadwb_pair): Handle E_TFmode.
(aarch64_pop_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_gen_store_pair): Handle E_TFmode.
(aarch64_gen_load_pair): Ditto.
(aarch64_save_callee_saves): Handle different mode sizes.
(aarch64_restore_callee_saves): Ditto.
(aarch64_components_for_bb): Check for simd function.
(aarch64_epilogue_uses): New function.
(aarch64_process_components): Check for 

Re: [Patch 1/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2018-12-07 Thread Richard Sandiford
Sorry for the slow review.

Steve Ellcey  writes:
> @@ -1470,6 +1479,45 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode)
>return false;
>  }
>  
> +/* Return true if this is a definition of a vectorized simd function.  */
> +
> +static bool
> +aarch64_simd_decl_p (tree fndecl)
> +{
> +  tree fntype;
> +
> +  if (fndecl == NULL)
> +return false;
> +  fntype = TREE_TYPE (fndecl);
> +  if (fntype == NULL)
> +return false;
> +
> +  /* All functions with the aarch64_vector_pcs attribute use the simd ABI.  */
> +  if (lookup_attribute ("aarch64_vector_pcs", TYPE_ATTRIBUTES (fntype)) != NULL)
> +return true;
> +  /* Functions without the aarch64_vector_pcs or simd attribute never use the
> + simd ABI.  */
> +  if (lookup_attribute ("simd", DECL_ATTRIBUTES (fndecl)) == NULL)
> +return false;
> +  /* Functions with the simd attribute can generate three versions of a
> + function, a masked vector function, an unmasked vector function,
> + and a scalar version.  Only the vector versions use the simd ABI.  */
> +  return (VECTOR_TYPE_P (TREE_TYPE (fntype)));

Is this enough?  E.g.:

void __attribute__ ((simd)) f (int *x) { *x = 1; }

generates SIMD clones but doesn't have a vector return type.

I'm not an expert on this stuff, but it looks like:

  struct cgraph_node *node = cgraph_node::get (fndecl);
  return node && node->simdclone;

might work.  But in some ways it would be cleaner to add the
aarch64_vector_pcs attribute for SIMD clones, e.g. via a new hook,
so that the function type is "correct".

It might be more efficient to save aarch64_simd_decl_p in cfun->machine.
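
The gap can be modelled in plain C, away from GCC's own data structures. Everything here is hypothetical: struct fn_model and its flags merely stand in for the attribute lookups, the return-type test and the cgraph query, and revised_check is only one reading of the suggestion (keep the explicit attribute test, then ask whether the function is a SIMD clone). The example models a clone of a function such as f above, which has the simd attribute but no vector return type.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the facts the predicate consults.  */
struct fn_model
{
  bool has_vector_pcs_attr;  /* aarch64_vector_pcs on the function type */
  bool has_simd_attr;        /* simd attribute on the declaration */
  bool returns_vector;       /* return type is a vector type */
  bool is_simd_clone;        /* function is a generated SIMD clone */
};

/* Mirrors the shape of the posted aarch64_simd_decl_p.  */
bool posted_check (const struct fn_model *fn)
{
  if (fn->has_vector_pcs_attr)
    return true;
  if (!fn->has_simd_attr)
    return false;
  return fn->returns_vector;
}

/* One reading of the suggested alternative.  */
bool revised_check (const struct fn_model *fn)
{
  return fn->has_vector_pcs_attr || fn->is_simd_clone;
}

/* A clone with a void return type is missed by the posted check but
   caught by the clone-based one.  */
void check_void_clone (void)
{
  struct fn_model clone = { false, true, false, true };

  assert (!posted_check (&clone));
  assert (revised_check (&clone));
}
```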

> @@ -4863,6 +4949,7 @@ aarch64_process_components (sbitmap components, bool prologue_p)
>mergeable with the current one into a pair.  */
>if (!satisfies_constraint_Ump (mem)
> || GP_REGNUM_P (regno) != GP_REGNUM_P (regno2)
> +   || (aarch64_simd_decl_p (cfun->decl) && (FP_REGNUM_P (regno)))

Formatting nit: redundant brackets around FP_REGNUM_P (regno).

> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 82af4d4..44261ee 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -724,7 +724,13 @@
>""
>  )
>  
> -(define_insn "simple_return"
> +(define_expand "simple_return"
> +  [(simple_return)]
> +  "aarch64_use_simple_return_insn_p ()"
> +  ""
> +)
> +
> +(define_insn "*simple_return"
>[(simple_return)]
>""
>"ret"

Can't you just change the condition on the existing define_insn,
without turning it into a define_expand?  Worth a comment explaining
why if not.

> @@ -1487,6 +1538,23 @@
>[(set_attr "type" "neon_store1_2reg")]
>  )
>  
> +(define_insn "storewb_pair_"
> +  [(parallel
> +[(set (match_operand:P 0 "register_operand" "=")
> +  (plus:P (match_operand:P 1 "register_operand" "0")
> +  (match_operand:P 4 "aarch64_mem_pair_offset" "n")))
> + (set (mem:TX (plus:P (match_dup 0)
> +  (match_dup 4)))

Should be indented under the (match_dup 0).

> +  (match_operand:TX 2 "register_operand" "w"))
> + (set (mem:TX (plus:P (match_dup 0)
> +  (match_operand:P 5 "const_int_operand" "n")))
> +  (match_operand:TX 3 "register_operand" "w"))])]

Think this last part should be:

 (set (mem:TX (plus:P (plus:P (match_dup 0)
  (match_dup 4))
  (match_operand:P 5 "const_int_operand" "n")))
  (match_operand:TX 3 "register_operand" "w"))])]

> +  "TARGET_SIMD &&
> +   INTVAL (operands[5]) == INTVAL (operands[4]) + GET_MODE_SIZE (mode)"

&& should be on the second line (which makes the line long enough to
need breaking).

> diff --git a/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-2.c b/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-2.c
> index e69de29..bf6e64a 100644
> --- a/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-2.c
> +++ b/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-2.c
> @@ -0,0 +1,33 @@
> +/* { dg-do compile } */
> +
> +void
> +f (void)
> +{
> +  /* Clobber all fp/simd regs and verify that the correct ones are saved
> + and restored in the prologue and epilogue of a SIMD function. */
> +  __asm__ __volatile__ ("" :::  "q0",  "q1",  "q2",  "q3");
> +  __asm__ __volatile__ ("" :::  "q4",  "q5",  "q6",  "q7");
> +  __asm__ __volatile__ ("" :::  "q8",  "q9", "q10", "q11");
> +  __asm__ __volatile__ ("" ::: "q12", "q13", "q14", "q15");
> +  __asm__ __volatile__ ("" ::: "q16", "q17", "q18", "q19");
> +  __asm__ __volatile__ ("" ::: "q20", "q21", "q22", "q23");
> +  __asm__ __volatile__ ("" ::: "q24", "q25", "q26", "q27");
> +  __asm__ __volatile__ ("" ::: "q28", "q29", "q30", "q31");
> +}
> +
> +/* { dg-final { scan-assembler {\sstp\td8, d9} } } */
> +/* { dg-final { scan-assembler {\sstp\td10, d11} } } */
> +/* { dg-final { scan-assembler {\sstp\td12, d13} } } */
> +/* { dg-final { scan-assembler 

[Patch 1/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2018-11-08 Thread Steve Ellcey
This is a resubmission of patch 1 to support the Aarch64 SIMD ABI [1] in
GCC.  It does not have any functional changes from the last submission.

The significant difference between the standard ARM ABI and the SIMD ABI
is that in the normal ABI a callee saves only the lower 64 bits of registers
V8-V15, whereas in the SIMD ABI the callee must save all 128 bits of
registers V8-V23.
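
As a rough illustration of that difference, the rule can be written as a small C function. This is a standalone model with hypothetical names, not code from the patch. It returns how many bytes of vector register V<regno> a callee has to preserve under each convention, and sums that over V0-V31.

```c
#include <assert.h>
#include <stdbool.h>

/* Bytes of V<regno> that the callee must preserve.
   Normal ABI: low 64 bits (8 bytes) of V8-V15.
   SIMD ABI:   all 128 bits (16 bytes) of V8-V23.  */
unsigned callee_saved_bytes (unsigned regno, bool simd_abi)
{
  if (simd_abi)
    return (regno >= 8 && regno <= 23) ? 16 : 0;
  return (regno >= 8 && regno <= 15) ? 8 : 0;
}

/* Total bytes a callee may have to save across V0-V31.  */
unsigned total_callee_saved_bytes (bool simd_abi)
{
  unsigned total = 0;
  for (unsigned regno = 0; regno < 32; regno++)
    total += callee_saved_bytes (regno, simd_abi);
  return total;
}

/* 8 registers * 8 bytes versus 16 registers * 16 bytes.  */
void check_totals (void)
{
  assert (total_callee_saved_bytes (false) == 64);
  assert (total_callee_saved_bytes (true) == 256);
}
```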

This patch checks for SIMD functions and saves the extra registers when
needed.  It does not change the caller behaviour, so with just this patch
there may be values saved by both the caller and callee.  This is not
efficient, but it is correct code. Patches 3 and 4 will remove the extra
saves from the caller.

Steve Ellcey
sell...@cavium.com


2018-11-08  Steve Ellcey  

* config/aarch64/aarch64-protos.h (aarch64_use_simple_return_insn_p):
New prototype.
(aarch64_epilogue_uses): Ditto.
* config/aarch64/aarch64.c (aarch64_attribute_table): New array.
(aarch64_simd_decl_p): New function.
(aarch64_reg_save_mode): New function.
(aarch64_function_ok_for_sibcall): Check for simd calls.
(aarch64_layout_frame): Check for simd function.
(aarch64_gen_storewb_pair): Handle E_TFmode.
(aarch64_push_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_gen_loadwb_pair): Handle E_TFmode.
(aarch64_pop_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_gen_store_pair): Handle E_TFmode.
(aarch64_gen_load_pair): Ditto.
(aarch64_save_callee_saves): Handle different mode sizes.
(aarch64_restore_callee_saves): Ditto.
(aarch64_components_for_bb): Check for simd function.
(aarch64_epilogue_uses): New function.
(aarch64_process_components): Check for simd function.
(aarch64_expand_prologue): Ditto.
(aarch64_expand_epilogue): Ditto.
(aarch64_expand_call): Ditto.
(aarch64_use_simple_return_insn_p): New function.
(TARGET_ATTRIBUTE_TABLE): New define.
* config/aarch64/aarch64.h (EPILOGUE_USES): Redefine.
(FP_SIMD_SAVED_REGNUM_P): New macro.
* config/aarch64/aarch64.md (simple_return): New define_expand.
(load_pair_dw_tftf): New instruction.
(store_pair_dw_tftf): Ditto.
(loadwb_pair_): Ditto.
(storewb_pair_): Ditto.


2018-11-08  Steve Ellcey  

* gcc.target/aarch64/torture/aarch64-torture.exp: New file.
* gcc.target/aarch64/torture/simd-abi-1.c: New test.
* gcc.target/aarch64/torture/simd-abi-2.c: Ditto.
* gcc.target/aarch64/torture/simd-abi-3.c: Ditto.
* gcc.target/aarch64/torture/simd-abi-4.c: Ditto.
* gcc.target/aarch64/torture/simd-abi-5.c: Ditto.
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 1fe1a50..e1528a4 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -467,6 +467,7 @@ bool aarch64_split_dimode_const_store (rtx, rtx);
 bool aarch64_symbolic_address_p (rtx);
 bool aarch64_uimm12_shift (HOST_WIDE_INT);
 bool aarch64_use_return_insn_p (void);
+bool aarch64_use_simple_return_insn_p (void);
 const char *aarch64_mangle_builtin_type (const_tree);
 const char *aarch64_output_casesi (rtx *);
 
@@ -552,6 +553,8 @@ void aarch64_split_simd_move (rtx, rtx);
 /* Check for a legitimate floating point constant for FMOV.  */
 bool aarch64_float_const_representable_p (rtx);
 
+extern int aarch64_epilogue_uses (int);
+
 #if defined (RTX_CODE)
 void aarch64_gen_unlikely_cbranch (enum rtx_code, machine_mode cc_mode,
    rtx label_ref);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index c82c7b6..b848c2a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1088,6 +1088,15 @@ static const struct processor *selected_tune;
 /* The current tuning set.  */
 struct tune_params aarch64_tune_params = generic_tunings;
 
+/* Table of machine attributes.  */
+static const struct attribute_spec aarch64_attribute_table[] =
+{
+  /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
+   affects_type_identity, handler, exclude } */
+  { "aarch64_vector_pcs", 0, 0, false, true,  true,  false, NULL, NULL },
+  { NULL, 0, 0, false, false, false, false, NULL, NULL }
+};
+
 #define AARCH64_CPU_DEFAULT_FLAGS ((selected_cpu) ? selected_cpu->flags : 0)
 
 /* An ISA extension in the co-processor and main instruction set space.  */
@@ -1470,6 +1479,45 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode)
   return false;
 }
 
+/* Return true if this is a definition of a vectorized simd function.  */
+
+static bool
+aarch64_simd_decl_p (tree fndecl)
+{
+  tree fntype;
+
+  if (fndecl == NULL)
+return false;
+  fntype = TREE_TYPE (fndecl);
+  if (fntype == NULL)
+return false;
+
+  /* All functions with the aarch64_vector_pcs attribute use the simd ABI.  */
+  if (lookup_attribute ("aarch64_vector_pcs", TYPE_ATTRIBUTES